Rules, Hierarchy and Prediction

This is the 4th part of the ‘From Neural Is to Moral Ought’ series of talks. Building on ideas from the previous parts, I at last make some connection between the structure of our brains and how we ought to behave.

15: Angels and Proles

A huge issue I’ve ignored up until now is our inability to predict the consequences of our actions. If we are unable to predict, we cannot estimate the ‘utility’ of an action and compare it against the alternatives to decide what we should do.

For ‘Act Utilitarians’, each act is evaluated individually. But obviously this can only be an ideal, an aspiration. The best achievable is a rough estimate.

In contrast, so-called ‘Rule Utilitarians’ deal with the impossibility of evaluating every individual action by having rules. It is the utility of the generalized rules that is evaluated. Good rules then become pre-calculated directives for action. An action is right if the rule generating that action is the one that leads to the greatest good (out of all the alternative rules).

The rules provide guidance. And this works both ways:

  • Normatively: The rules determine what an agent should do in a situation.
  • Descriptively: Because other agents follow these rules, society becomes more predictable. It becomes easier for agents to predict consequences.

The problem with rules is knowing where to stop in writing them. If we rewrite the rule book with sub-clauses to deal with exceptions, and continue for all exceptions, ‘Rule Utilitarianism’ degenerates into ‘Act Utilitarianism’. The point of the rules, of course, is that they should be reasonably easy to use – easy to apply. The rule is not an absolute – it is only adopted on the pragmatic grounds of enabling decisions to be made in the face of otherwise impossible moral calculations. And so a rule with so many exceptions becomes self-defeating. It ceases to be a rule and so should be abandoned.
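The degeneration can be made concrete with a small Python sketch. Everything in it is hypothetical: the situations, the two actions and the utility numbers are invented for illustration, and real consequences are nowhere near this tidy. An Act Utilitarian picks the best action situation by situation; a Rule Utilitarian scores a whole rule by summing the utility it yields over every situation. Adding an exception for each case where the two disagree turns the rule into a lookup table of act-by-act choices.

```python
# Hypothetical utilities UTILITY[situation][action]; every number here is invented.
UTILITY = {
    "shop":        {"take": -5, "leave": 1},
    "friend_lend": {"take": -2, "leave": 0},
    "warzone":     {"take": 8,  "leave": -10},   # the bandage case discussed below
    "lost_wallet": {"take": -4, "leave": 2},
}


def act_choice(situation):
    """Act Utilitarian: pick the best action for this one situation."""
    options = UTILITY[situation]
    return max(options, key=options.get)


def rule_total(rule):
    """Rule Utilitarian: score a whole rule by the total utility it produces across all situations."""
    return sum(UTILITY[s][rule(s)] for s in UTILITY)


def simple_rule(situation):
    """'Do not steal!' with no exceptions."""
    return "leave"


def with_exceptions(exceptions):
    """The simple rule plus a set of situations in which it is overridden."""
    def rule(situation):
        return "take" if situation in exceptions else "leave"
    return rule


# Add an exception wherever the simple rule disagrees with the act-by-act choice.
exceptions = {s for s in UTILITY if simple_rule(s) != act_choice(s)}
refined = with_exceptions(exceptions)

print(rule_total(simple_rule))                             # -7
print(rule_total(refined))                                 # 11
print(all(refined(s) == act_choice(s) for s in UTILITY))   # True
```

With enough exceptions the refined rule makes exactly the Act Utilitarian's choices, so nothing is left of it as a rule.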

So how do we find the appropriate balance between rules that are not so simple as to lead to absurd counter-examples and act-by-act justification that is not debilitatingly complex?

R. M. Hare used the terms ‘Proles’ and ‘Archangels’ for those positions at the extremes:

  • Proles only ever apply rules.
  • Archangels are superhumanly capable of perfect prediction (cf. Laplace’s Demon).

In practice, people are in the middle of these extremes, applying rules in some circumstances and reason in others.

Note that being a prole is always an option for us, whereas being an archangel is not. But we might at least aspire to be angels.

In Hare’s formulation of ‘Two-Level Utilitarianism’, decisions should be based on ‘intuitive’ moral rules except in rare circumstances where it is more appropriate to apply a higher level ‘critical’ moral deliberation.

This is in line with a general pragmatic position among Utilitarians:

“Follow the rule unless there is a good reason to break it.”

So if you consider an application of a rule that seems absurd:

“In a warzone, you cannot use a sheet hanging to dry outside a deserted house to bandage a wounded soldier.”

…because that would be stealing, then we can use reason to determine that the usual rule:

“Do not steal!”

can, in this case, be ignored.

(The above is similar to, but a clearer-cut example than, Lawrence Kohlberg’s ‘Heinz dilemma’.)

But a genuine problem here is that this always leaves a utilitarian open to using an opt-out to his advantage. If he finds himself in a situation where he covets something and there are no witnesses around, he may well be able to think of a very personal ‘good reason’:

“Of course, that stealing rule doesn’t apply to me – here, in this circumstance!”

(And I do not offer a solution to that here.)

Two-Level Utilitarianism may be an improvement over Rule and Act Utilitarianism but it is still a fudge. It shifts the issue from ‘should’ to ‘when should’. When is it appropriate to act as a Rule Utilitarian and apply the rule (“Do not steal”, or whatever it is) and when is it appropriate to act as an Act Utilitarian and judge the specific situation?

16: Acting Fast and Acting Slow

The problem with aspiring to be a moral angel is that we don’t have the time. By the time we have worked out what we should do (having gone off and done the research, gathered all relevant information, analysed, deliberated, etc.), the moment for action will probably have passed.

R. M. Hare’s Two-Level Utilitarianism is reminiscent of the psychological ‘dual process’ theory, which divides the mind into:

  • an anciently evolved, fast-responding, subconscious, simple reflex ‘default’ associative process trained on past experience, and
  • a more recently evolved, slower-acting, conscious, complex, reasoned process that overrules the first in exceptional situations.

Thinking, Fast and Slow

This concept has become widely known through Daniel Kahneman’s book ‘Thinking, Fast and Slow’, in which he describes two systems of thinking within the brain:

  • ‘System 1’: intuition and
  • ‘System 2’: reasoning.

(This is using the terminology of Keith Stanovich and Richard West.)

17: Rules and Sub-Rules

We have seen some dual-process theories, characterized by a fast, intuitional process and a slow, reasoning process. But we have also seen that ‘Rule Utilitarianism’ can degenerate to ‘Act Utilitarianism’ with enough sub-rules.

Why stop at just two processes? Why not, for the sake of argument and convenient numbering, have six levels of rules? Let’s call them:

  • The ‘System 1.0’ level: rules
  • The ‘System 1.2’ level: sub-rules
  • The ‘System 1.4’ level: sub-sub-rules
  • The ‘System 1.6’ level: sub-sub-sub-rules,
  • The ‘System 1.8’ level: sub-sub-sub-sub-rules, and
  • The ‘System 2.0’ level: sub-sub-sub-sub-sub-rules.

Each higher level refines the lower-level rule further to create a more authentic representation of that ideal and unknowable ‘moral landscape’. ‘System 1.0’ is a brutish ‘prole’ Rule-Utilitarianism. By the time we are applying the sub-sub-sub-sub-sub-rules of ‘System 2.0’, we might have an almost ‘angelic’ level of refinement. That is, for virtually all practical purposes, it is indistinguishable from ‘Act Utilitarianism’.

(Ultimately, as we increase the number of levels between ‘prole’ and ‘archangel’, we approach a continuum, akin to Cleeremans and Jiménez’s ‘dynamic graded continuum’.)

Rather than looking at a 2-dimensional (or higher-dimensional) landscape, let us first consider, for simplicity, a 1-dimensional shape – an arbitrary line on a graph (see the black line in the figure below, for the most part hiding underneath the orange line). We can approximate that line with various alternative polynomials:

  • first-order (‘linear’): y = a + bx (the green line)
  • second-order (‘quadratic’): y = a + bx + cx² (the purple line)
  • third-order (‘cubic’): y = a + bx + cx² + dx³ (the blue line)

and so on. And in the other direction there is also:

  • zeroeth-order (‘constant’): y = a (the brown line)
Calculations thanks to

Fitting polynomials of order N=0, 1, 2, 3 and 7 to an arbitrary data set

The higher the order, the better the match to the original graph shape. Each increase in order adds a higher-order power term at the end, allowing greater refinement of the model landscape. (The ordering of the constants a, b etc. might be different from what you may be used to but the equations are essentially the same.)
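A minimal sketch of such fits, assuming NumPy is available. The curve is an invented test function, not the data set behind the figure. NumPy’s `numpy.polynomial.polynomial.polyfit` returns coefficients in ascending powers, so `coeffs[0]` is a, `coeffs[1]` is b and so on, the same ordering as the equations above. Because each higher-order model contains all the lower-order ones, the least-squares error can only stay the same or shrink as the order increases.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# An arbitrary 1-dimensional 'landscape' (invented; not the data set in the figure).
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.3 * x


def rms_error(order):
    """Least-squares fit of a polynomial of the given order; returns the RMS misfit."""
    coeffs = P.polyfit(x, y, order)        # ascending powers: coeffs[0] = a, coeffs[1] = b, ...
    residual = y - P.polyval(x, coeffs)
    return float(np.sqrt(np.mean(residual ** 2)))


errors = {order: rms_error(order) for order in (0, 1, 2, 3, 7)}
for order, err in errors.items():
    print(f"order {order}: RMS error {err:.4f}")
```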

Moving on from a 1-dimensional curve, a 2-dimensional example is more obviously like a landscape. A picture is a landscape of colour intensities, and most pictures distributed across the internet are compressed, ‘lossy’ approximations of the original, in order to reduce the time it takes to download them. These are in ‘JPEG’ or a similar format. These image files are built up not from higher and higher order polynomials but from higher and higher frequency cosine waves. A ‘Discrete Cosine Transform’ is applied to the image bitmap (in JPEG, to each 8×8 block of pixels) to transform it into frequency components. If only the lower-frequency components are sent, the image reconstituted at the receiving end is not as good as the original but is a good-enough replica.

For example, in the Wikipedia illustration, the ‘first’ image, with only the lowest-frequency components, is just a blob…


A very crude approximation of a letter ‘A’

But as we add in all the combinations of higher-frequency components, the image quality becomes better …

Building up a better approximation of a letter ‘A’

(The images on the right are ‘templates’. These are weighted by a ‘brightness’ number to obtain the image in the middle which is then added to the image on the left which started as a blank sheet. There are 64 weighted templates to be added in this DCT so the complete image can be defined by just 64 numbers.)
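The build-up can be sketched in Python with NumPy. This is a simplification, not the JPEG codec: it uses a hand-drawn 8×8 letter ‘A’ (invented here), an orthonormal DCT-II built directly from its cosine formula, and a simple ‘lowest u + v first’ ordering in place of JPEG’s zig-zag order and quantization. Each (u, v) coefficient is the weight of one 2-D cosine template.

```python
import numpy as np

N = 8  # baseline JPEG works on 8 x 8 blocks, hence 64 coefficients

# A crude, hand-drawn 8 x 8 letter 'A' (1 = ink, 0 = paper); invented for this sketch.
letter_a = np.array([
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
], dtype=float)


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: row k is the k-th cosine wave sampled at n points."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


C = dct_matrix()
coeffs = C @ letter_a @ C.T          # 64 'brightness' weights, one per 2-D template


def reconstruct(keep):
    """Rebuild the image from only the coefficients with u + v < keep (lowest frequencies)."""
    u, v = np.indices((N, N))
    return C.T @ (coeffs * ((u + v) < keep)) @ C


# keep = 1 is the 'blob'; keep = 2N - 1 uses all 64 coefficients.
rms = [float(np.sqrt(np.mean((reconstruct(k) - letter_a) ** 2))) for k in range(1, 2 * N)]
for k, err in zip(range(1, 2 * N), rms):
    print(f"u + v < {k:2d}: RMS error {err:.4f}")
```

Because the transform is orthonormal, the error can only fall as more coefficients are added, and with all 64 the letter is recovered exactly. The loss in real JPEG comes mainly from coarsely quantizing (and often zeroing) the high-frequency weights.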

Eventually we have the ‘final’ image, which lacks the crisp edges (high-frequency components) of a perfect letter ‘A’ but is still discernible as one:


An acceptable approximation of a letter ‘A’

It is common to see ‘Progressive JPEG’ images on websites, where we can watch this gradual build-up of the image (although in fewer steps than above). For example, with satellite images in Google Maps, we first see a coarse patchwork of squares. By the time we have a medium-resolution image, we may decide to pan across to somewhere nearby, so that the final (high-frequency) components never get displayed. But ordinarily the view builds up until the final, highest-resolution landscape is displayed.

18: A Hierarchy of Rules

Whether we have two levels (as with Hare’s Two-Level Utilitarianism), 6 levels or 600, a major issue remains – in a particular situation, which level should we apply? When do we respond as a prole? When do we aspire to be angels? When do we select some level in between?

One solution is as follows:

1: First respond as a prole, applying the lowest-level rule.

2: Then for each level upwards, in turn:

  • If we have time, apply that sub-rule. This may overrule previously initiated actions.
  • If the situation changes, whether because of something external or because of our own actions resulting from lower-level rules, we start again with the lowest-level rule in the new situation.
  • If the situation hasn’t changed, we move up to the next level, eventually applying the highest-level sub-rules that we are able to (acting as an Act Utilitarian).
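The serial strategy can be sketched as a loop. This is an illustrative Python model, not a claim about how people actually deliberate: the `Level` class, the time costs, the `situation_at` function and the warzone actions are all hypothetical, and for simplicity only external events (not our own actions) change the situation. Each level costs some time; we act at the lowest level first, refine while time allows, and restart from the bottom whenever the situation changes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Level:
    name: str
    cost: float                    # hypothetical thinking time this level needs (must be > 0)
    decide: Callable[[str], str]   # maps a situation to an action


def serial_response(levels: List[Level], situation_at: Callable[[float], str],
                    time_budget: float) -> List[Tuple[float, str, str]]:
    """Act at the lowest level, then refine level by level while time allows.
    Each new action overrules the previous one; a changed situation restarts at the bottom."""
    clock = 0.0
    situation = situation_at(clock)
    actions = []
    i = 0
    while i < len(levels) and clock + levels[i].cost <= time_budget:
        level = levels[i]
        clock += level.cost
        actions.append((clock, level.name, level.decide(situation)))
        new_situation = situation_at(clock)
        if new_situation != situation:
            situation = new_situation
            i = 0                  # start again with the lowest-level rule
        else:
            i += 1                 # nothing changed: try the next, more refined level
    return actions


LEVELS = [
    Level("System 1.0", 1.0, lambda s: "do not take the sheet"),
    Level("System 1.2", 2.0,
          lambda s: "take the sheet" if s == "wounded soldier" else "do not take the sheet"),
    Level("System 2.0", 5.0,
          lambda s: "take the sheet and tell the owner later" if s == "wounded soldier"
          else "do not take the sheet"),
]


def situation_at(t: float) -> str:
    # Hypothetical external event: a wounded soldier appears at t = 2.5.
    return "deserted house" if t < 2.5 else "wounded soldier"


for step in serial_response(LEVELS, situation_at, time_budget=12.0):
    print(step)
```

With a budget of 12 time units the loop restarts once (when the soldier appears at t = 2.5) and ends at the ‘System 2.0’ level; with a budget of 7 it stops at ‘System 1.2’, still holding an acceptable action.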

So we build up a series of actions, each a refinement of the earlier responses. The similarity between this strategy to refine our view of the ‘moral landscape’ and the building up of a landscape view in the previous section should hopefully be very apparent.

Another strategy, almost the same as the one above, is:

Use all the levels all of the time!

As soon as a situation presents itself, we start generating our response at every level in parallel. We do not have to wait for level n to complete before we start calculating our response at level n+1.
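The parallel strategy can be modelled in a similar way, again as a hypothetical sketch: the levels, latencies and actions are invented, and a simulated clock stands in for real concurrency so that the result is deterministic. Every level starts at t = 0; whenever one finishes, its answer replaces the current one only if it comes from a more refined level, so level n+1 never waits for level n.

```python
from typing import Callable, List, NamedTuple, Tuple


class Level(NamedTuple):
    name: str
    latency: float                 # hypothetical time this level takes when started at t = 0
    decide: Callable[[str], str]   # maps a situation to an action


def parallel_timeline(levels: List[Level], situation: str,
                      deadline: float) -> List[Tuple[float, str, str]]:
    """All levels compute at once. Returns (time, level, action) each time the
    best available response improves, up to the deadline."""
    timeline = []
    best_rank = -1
    # Visit levels in the order they finish, not in the order they are listed.
    for rank, level in sorted(enumerate(levels), key=lambda pair: pair[1].latency):
        if level.latency > deadline:
            break                  # this level (and every slower one) is too late
        if rank > best_rank:       # only a more refined level overrules the current answer
            best_rank = rank
            timeline.append((level.latency, level.name, level.decide(situation)))
    return timeline


LEVELS = [
    Level("System 1.0", 0.2, lambda s: "do not take the sheet"),
    Level("System 1.2", 1.5, lambda s: "take the sheet"),
    Level("System 2.0", 4.0, lambda s: "take the sheet and tell the owner later"),
]

print(parallel_timeline(LEVELS, "wounded soldier", deadline=2.0))
print(parallel_timeline(LEVELS, "wounded soldier", deadline=10.0))
```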

Why am I proposing these strategies as an alternative to Hare’s Two-Level Utilitarianism? (Two-Level Utilitarianism is basically subsumed by these new strategies.) The new strategies are rather technical and not obviously more useful, pragmatically, than having just 2 levels. The answer is as follows…

19: A Hierarchy of Predictors

The above description of a hierarchy of more and more refined moral rules may seem familiar. It is the same approach as presented in previous talks, based around Karl Friston’s ‘Variational Free Energy’ theory:

  • The ‘Intelligence and the Brain’ talk: a description of how intelligent behaviour arises from what our brains are physically constituted of, and
  • The ‘What I Know and Why I know It’ talk: relating Variational Free Energy’s ‘minimization of surprise through action and perception’ and ‘hierarchical message passing’ to philosophical theories of knowledge, particularly Susan Haack’s ‘Foundherentism’.

(Refer to these talks for more about ‘Variational Free Energy’.)


Similarly to that latter talk, here I am making connections between:

  • the physical constitution of the brain, in terms of Variational Free Energy’s ‘hierarchical message passing’ with the ability for handling greater complexity at higher levels, and
  • the established branch of philosophy that is Ethics, and in particular with the position within that discipline that is Utilitarianism.

In comparing our moral thinking with a hierarchy of predictors, it should be obvious that prediction should play a major role in any consequentialist morality – we need to be able to predict in order to then make a judgement of the best course of action. We need to be able to imagine what would happen if we did something.

I have presented both a serial algorithm for applying rules and a parallel one. Obviously we are not able to do the parallel one consciously but it matches how we are physically constructed. In practice we do not apply a serial algorithm consciously but apply a parallel algorithm unconsciously, which is a skill we have learnt over many years.

In Variational Free Energy’s mechanistic dynamical system of ‘hierarchical message passing with minimization of error’, changes propagate up and down the hierarchy until everything has settled. Once it has, some level will be found to be passing only a negligible error upwards, so that higher levels have essentially no role to play. The problem has found the level that is good enough, based on previous experience.
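The idea of a problem ‘finding its level’ can be illustrated with a toy hierarchy in Python, assuming NumPy. This is only an analogy and does not implement Variational Free Energy: here level k sees nothing but the error passed up from level k-1, explains what it can of that error with a degree-k polynomial, and passes the remainder upwards. The `settle` function and the test signals are hypothetical. A simple situation settles low down; an unfamiliar one (a sudden jump) is never explained well enough and escalates all the way to the top.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def settle(signal, x, max_level=7, tolerance=1e-6):
    """Pass prediction error up a hierarchy of increasingly refined predictors.
    Returns (level at which the outgoing error became negligible, that error),
    or (None, remaining error) if no level was good enough."""
    error = np.asarray(signal, dtype=float)
    rms = float(np.sqrt(np.mean(error ** 2)))
    for level in range(max_level + 1):
        coeffs = P.polyfit(x, error, level)      # this level's best account of the error it received
        error = error - P.polyval(x, coeffs)     # the unexplained part is passed upwards
        rms = float(np.sqrt(np.mean(error ** 2)))
        if rms < tolerance:
            return level, rms
    return None, rms


x = np.linspace(-1.0, 1.0, 41)
print(settle(0.5 + 0.8 * x, x))                  # a simple, familiar situation
print(settle(0.5 + 0.8 * x - 0.3 * x ** 2, x))   # a slightly harder one
print(settle(np.sign(x), x))                     # something outside 'previous experience'
```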

This fits with our personal experience. In phenomenological terms, morality is an art, not an algorithm – a subtle skill learnt over many years. In simple moral cases, the problem is clear-cut; making a decision is easy, based on only the coarsest of rules. As the moral case becomes more problematic, we intuitively sense that:

  • There is less margin for error between right and wrong courses of action.
  • Our margin of error is greater in those problems outside of our normal, learnt experience (such as trolleyology problems).

In both cases, we feel that we need to deal with them with much more attention.

By relating Variational Free Energy to both our moral thinking (here) and our non-moral thinking (in ‘Intelligence and the Brain’), we indirectly make the connection between them. They are done in essentially the same way. This should not be surprising (it is not like we see that the parts of our brain correlated with moral thinking during fMRI scans are made out of different stuff, connected in different ways from what we see in other regions of our cortex!).


7 Responses to Rules, Hierarchy and Prediction


  5. Wyrd Smythe says:

    Very interesting ideas!

    The law has the same problem with rules: Too many are constricting, often contradictory, and ultimately never enough for all situations. Too few leave large gaps between clearly illegal and clearly legal actions. The judiciary exists, in part, to rule on the ambiguities, and the legislature to evolve the rule-set over time — usually later rather than sooner.

    I found myself wondering how different a hierarchy of rules is from a large (flat) set of rules. Moral rules are a lot more complex than a hierarchy of sine waves (per the JPEG example). As you argue, it does seem a better match for how we think anyway, but at the same time, that functionality of the brain seems below our “awareness horizon” so I wonder if the application of a hierarchy ends up being similar to the application of a large set of flat rules?

