Consciousness and Zombies


Common Sense Consciousness

There are common-sense notions of what consciousness is about which tell us:

  • We are conscious when we are awake,
  • We are not conscious when we are asleep, except when we are dreaming,
  • People under anaesthetic are not conscious.
  • People in a coma are not conscious but those suffering from ‘locked-in’ syndrome are.
  • People have a single consciousness. It is not that there are multiple consciousnesses within them.
  • There is no higher consciousness – groups of people are not conscious.
  • Machines are not conscious.

But these can be wrong. For example, taking the last point, there is the danger of being ‘bio-chauvinist’ – failing to recognize that non-biological stuff can be conscious in any way.

We Need a Theory

Much has been said about the nature of consciousness by philosophers but, as with much of philosophy, it is pre-scientific. We are still grappling with the problem of making it scientific – finding a way to progress beyond speculation by testing hypotheses, making predictions and quantifying them. It is as if we are at the same stage as the ancient Ionian philosophers were when they speculated about the physical nature of the universe. For example:

  • Thales speculated that ‘everything is water’ and provided reasons for his argument,
  • Anaximenes speculated that ‘everything is air’ and provided reasons for his argument, and
  • Heraclitus speculated that ‘everything is change’ and provided reasons for his argument.

No amount of speculation on its own could have ever led anyone to our current understanding of the physical world, involving quantum theory and relativity. Our understanding has developed through a long series of theories that have all been refuted as being ‘wrong’ but were necessary steps to make progress.

We have been lacking theories that would provide the first step towards a scientific understanding of the fundamentals of consciousness. This is ‘proto-science’ – the start of the scientific process. We need a theory that is scientific in that it describes consciousness in wholly physical terms and that, given a specific physical state, can predict whether there is consciousness. As progress is made, theories and methods become established into what we normally understand as ‘science’, which can then provide useful applications. For example, a good theory would give us a 100% success rate in avoiding ‘anaesthesia awareness’. It must agree with our common-sense understanding of consciousness to some degree, but it may surprise us. For example, it might tell us:

  • We are conscious throughout the time we are asleep – the difference is that our experiences are not laid down in memory.
  • In some specific circumstances, machines and/or groups of people can be conscious.

Integrated Information Theory

Giulio Tononi’s 2004 ‘Integrated information theory’ (IIT) of consciousness has been described by Christof Koch as

“the only really promising fundamental theory of consciousness”


In it, Tononi proposes a measure named after the Greek letter φ (‘phi’) which is the amount of ‘integrated information’ of a system. Consciousness is a fundamental property of the universe which arises wherever φ > 0. It is therefore a form of ‘panpsychism’ – consciousness can arise anywhere. The higher the value of φ, the larger the amount of consciousness. Consciousness is a matter of degree. Humans have large brains and very large φ and are highly conscious. Small rodents have smaller φ and are therefore less conscious. But sleeping humans must have a lower φ than wakeful rodents.

I have previously posted about Tononi’s theory, providing an overview of his book ‘Phi: A Voyage from the Brain to the Soul’. The book is a curious fusion of popular science and fiction and so, disappointingly, avoids all the technicalities of the theory and the calculation (quantification) of φ.

In one form of the ‘Integrated Information Theory’, φ is calculated as the ‘effective information’ across the system’s ‘minimum information partition’ – the way of cutting the system into parts that loses the least information.
In short, φ is a measure of the information flow within a system. It is essentially formulated backwards from wanting (!) the following:

  • The information flow between humans is much much less than the information flow within a human brain.
  • The distinguishing indicator between wakefulness and REM sleep on the one hand and non-REM sleep on the other is a large drop in ‘long-range’ communication in the latter – information flow is much more localised.

And this (necessarily) leads to the conclusions we ‘want’:

  • We are not conscious in non-REM sleep or in a coma but are at other times, including if suffering from locked-in syndrome.
  • There is not a consciousness associated with a group of people.

A positive φ requires the mutual flow of information within the system – between parts of the system, there is flow in both directions. In short, there are loops and ‘internal states’ i.e. memory. Tononi provides a metaphor of a digital camera. A 10-megapixel camera sensor provides 10 megabits of information but there is no integration of that information and no memory. In contrast:

  • The human visual system combines information from neighbouring rod and cone photo-receptors in the retina before the information gets to the cortex of the brain, and
  • There are more connections in the brain going from the ‘higher’ levels down towards the retina than there are going in the opposite direction.

A camera sensor has zero φ, so there is no consciousness. But a thermostat has memory (precisely 1 bit of capacity) and a loop, because of its hysteresis. It has some small positive value of φ. Hence it has some (absolutely minimal) degree of consciousness!
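As an aside, the thermostat’s loop and single bit of memory can be made concrete with a toy sketch (my own illustration – the function name and temperature thresholds are invented, and nothing here is part of IIT’s formalism):

```python
# Toy thermostat with hysteresis: one bit of internal state ('heating'
# on/off), and a feedback loop, since the next state depends on the
# current state whenever the temperature is inside the deadband.
def thermostat_step(temp, heating, low=18.0, high=22.0):
    """Return the next heating state given the current temperature."""
    if temp < low:
        return True       # too cold: switch the heater on
    if temp > high:
        return False      # too warm: switch the heater off
    return heating        # inside the deadband: keep the previous state

state = False
for temp in [17.0, 19.0, 23.0, 20.0]:
    state = thermostat_step(temp, state)
```

The point of the sketch is only that the output feeds back into the next update – the ‘loop’ that gives the device a non-zero φ on Tononi’s account.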

This all sounds like a crackpot theory but it is taken seriously by many. Tononi’s academic specialization is sleep, but he worked at Gerald Edelman’s Neurosciences Institute in La Jolla, collaborating with Edelman on metrics for brain complexity; this work evolved into his metric for consciousness. (Incidentally, he also worked with Karl Friston, who was at the Neurosciences Institute at the same time.) Christof Koch is now collaborating with Tononi on the theory. My point: he is not someone on the fringes of this academic field.

Cynically, we might say that the theory has credibility because there is so very little else of substance to go on. We need to recognize that this is all still just ‘proto-science’.

IIT 3.0

The ‘Integrated Information Theory’ has gone through two major revisions: the original ‘IIT 1.0’ from 2004 was superseded by ‘IIT 2.0’ in 2008 and by ‘IIT 3.0’ in 2014.

‘IIT 1.0’ and ‘IIT 2.0’ based measures of ‘effective information’ (ei) on entropy – the effective information was an average ‘Kullback–Leibler divergence’ (alternatively termed ‘relative entropy’). This may sound familiar: entropy and the Kullback–Leibler divergence also feature in Karl Friston’s ‘Variational Free Energy’ theory of generalized brain function.

But ‘IIT 3.0’ uses a different metric for ‘effective information’. The basis of this is known:

  • in mathematical circles by the formal term of the ‘Wasserstein distance’, and
  • in computer science circles by the (literally) more down-to-earth term of the ‘Earth Mover’s Distance’ (EMD)

Imagine the amount of earth that a digger would have to move to make a pile of earth of a particular shape (‘distribution’) into the shape of another (these piles of earth represent probability distributions). When applied to simple binary distributions, this just reduces to the ‘Hamming distance’ used in Information Theory for communication systems.
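For the simple binary case, that reduction can be sketched in a few lines (my own illustration; for general one-dimensional distributions, SciPy’s `scipy.stats.wasserstein_distance` computes the EMD directly):

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary states differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# For point distributions concentrated on single binary states, the Earth
# Mover's Distance (with a Hamming ground metric) is just the Hamming
# distance between the two states: all the 'earth' sits in one place and
# must be moved to the other.
d = hamming([1, 0, 1, 1], [1, 1, 1, 0])  # differs at positions 1 and 3 -> 2
```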

Two Circuits

Unlike previous editions, ‘IIT 3.0’ explicitly provided an example that I find rather incredible.

Figure 21 of ‘IIT 3.0’ shows 2 circuits, A and B (see below). The circuits consist of circles connected together with red and black arrows. The circles are ‘nodes’. The arrows are signals which are inputs to and outputs from the nodes. My interpretation of these diagrams is as follows:

  • Black arrows mark ‘excitatory’ connections.
  • Red lines with a dot at one end mark ‘inhibitory’ connections (going to the end with the dot).
  • At each node, the input values are added (for excitatory connections, effectively scaled by +1) or subtracted (for inhibitory connections, effectively scaled by −1). If the result meets the criterion marked at the node (e.g. ‘>=2’) then each output takes the value 1; otherwise it is 0.
  • Time advances in fixed steps (let us say 1 millisecond, for convenience) and all nodes are updated at the same time.
  • The diagrams colour some nodes yellow to indicate that the initial value of a node output is 1 rather than 0 (for a white node).


Figure 21. Functionally equivalent conscious and unconscious systems.

 The caption for the figure reads:

(A) A strongly integrated system gives rise to a complex in every network state. In the depicted state (yellow: 1, white: 0), elements ABDHIJ form a complex with ΦMax = 0.76 and 17 concepts. (B) Given many more elements and connections, it is possible to construct a feed-forward network implementing the same input-output function as the strongly integrated system in (A) for a certain number of time steps (here at least 4). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain. The transition from the first layer to the second hidden layer in the feed-forward system is assumed to be faster than in the integrated system (τ << Δt) to compensate for the additional layers (A1, A2, B1, B2)

The caption concludes with a seemingly outrageous statement on zombies and consciousness which I will come back to later on.

Unfortunately, in the figure:

  • With the ‘integrated system’, I cannot reproduce the output sequence indicated in the figure!
  • With the ‘feed-forward system’, it is difficult to determine the actual directed graph from the diagram but, from my reasonable guess, I cannot reproduce the output sequence indicated in this figure either!

But the relationship between Tononi’s ‘integrated system’ and his ‘feed-forward system’ bears a more-than-coincidental similarity to that between ‘IIR filters’ and ‘FIR filters’ in Digital Signal Processing. It looks like Tononi’s two ‘complexes’, as he calls them, are derived from IIR and FIR representations. So I am going to consider digital filters instead.

IIR Filters

An input signal changes over time, but only at discrete time intervals. For the purposes of this example, assume there is a new sample every millisecond. There is an input stream of samples around time t:

X[t], X[t+1], X[t+2], X[t+3], X[t+4] and on.

And there is an output stream of samples:

Y[t], Y[t+1], Y[t+2], Y[t+3], Y[t+4] and on.

A simple filter that smoothes out changes in input ‘samples’ can be formed by averaging the input with the previous output value:

Y(t) = ½.X(t) + ½.Y(t-1)

This is a filter of a type called an ‘infinite impulse response’ (IIR) filter. A diagram for an IIR filter is shown below:


A ‘z-1’ indicates a delay of 1ms. The b, a0 and a1 boxes are multipliers (b, a0 and a1 are the constant values by which the signals are multiplied) and the ‘Σ’ circle sums (adds). The diagram shows a ‘second order’ filter (two delays) but I will only consider a first order one:

b = 1/2

a1 = 1/2

a0 = 0

A single non-zero value within a series of zero values is called an ‘impulse’:

X = … 0, 0, 0, 0, 1, 0, 0, 0, 0, …

If this impulse is fed into a filter, the resulting output from that impulse is called the ‘impulse response’. For the IIR filter it will be as follows:

Y = … 0, 0, 0, 0, 0.5, 0.25, 0.125, 0.0625, …

that is:

Y(1) = 1/2

Y(2) = 1/4

Y(3) = 1/8

Y(4) = 1/16

and in general form:

Y(t) = 2^(-t)

so there is some non-zero (but vanishingly small) output at very large t – the response carries on infinitely, which is why the filter is called an ‘infinite impulse response filter’.
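This first-order filter is easy to check numerically (a sketch of the filter above, with b = 1/2 and a1 = 1/2):

```python
def iir_first_order(x, b=0.5, a1=0.5):
    """First-order IIR filter: y(t) = b*x(t) + a1*y(t-1), with y(-1) = 0."""
    y, prev = [], 0.0
    for sample in x:
        prev = b * sample + a1 * prev  # feedback: output re-used as state
        y.append(prev)
    return y

impulse = [1, 0, 0, 0, 0]
print(iir_first_order(impulse))  # [0.5, 0.25, 0.125, 0.0625, 0.03125]
```

Each output is half the previous one, giving the 2^(-t) impulse response derived above.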

If we put a ‘step’ into the IIR filter…

X = … 0, 0, 0, 0, 1, 1, 1, 1, 1 …

we get a ‘step response’ out, which shows the smoothing of the transition:

Y = … 0, 0, 0, 0, 0.500, 0.750, 0.875, 0.938, 0.969, 0.984, 0.992, …

This IIR filter is the equivalent to Tononi’s ‘integrated system complex’.

FIR Filters

The DSP equivalent to Tononi’s ‘feed-forward system complex’ is a ‘finite impulse response’ (FIR) filter:

Y(t) = b0.X(t) + b1.X(t-1) + b2.X(t-2) + b3.X(t-3) + … + bN-1.X(t-N+1)

A diagram corresponding to this FIR filter (of ‘order N-1’) is shown below:


Here, the triangles are multipliers and the ‘+’ circles obviously add.

Now, we can try to get a FIR filter to behave very similarly to an IIR filter by setting its coefficients

b0 , b1 , b2 , b3 … bN-1

to be the same as the first N terms of the IIR’s impulse response. The values after t=5 are quite small so let’s set N=6:

b0 = 1/2

b1 = 1/4

b2 = 1/8

b3 = 1/16

b4 = 1/32

b5 = 1/64

so the transfer equation is:

Y(t) = (1/2).X(t) + (1/4).X(t-1) + (1/8).X(t-2) + (1/16).X(t-3) + (1/32).X(t-4) + (1/64).X(t-5)

and the step response is then:

Y = … 0, 0, 0, 0, 0.500, 0.750, 0.875, 0.9375, 0.96875, 0.984375, 0.984375, …

The FIR filter’s impulse response only lasts for 6 samples – it is finite, hence the name ‘finite impulse response filter’. The output does not depend on any input value from more than 6 samples prior, but the first 6 output samples following an impulse are the same as the IIR filter’s, so the two behave in very similar ways.

(Note: the output never gets any higher than 0.984375 – the sum of all the coefficients.)
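The truncation is easy to verify: an FIR filter whose coefficients are the first six values of the IIR impulse response matches the IIR step response for the first six output samples (a sketch using the coefficients above):

```python
def fir(x, coeffs):
    """FIR filter: y(t) = sum_k coeffs[k] * x(t-k), with x = 0 before t = 0."""
    y = []
    for t in range(len(x)):
        # no feedback: the output depends only on the last len(coeffs) inputs
        y.append(sum(c * x[t - k] for k, c in enumerate(coeffs) if t - k >= 0))
    return y

coeffs = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64]
step_input = [1] * 10
print(fir(step_input, coeffs)[:7])
# [0.5, 0.75, 0.875, 0.9375, 0.96875, 0.984375, 0.984375]
```

After six samples the output flattens out at 0.984375 (the sum of the coefficients) instead of continuing to creep towards 1 as the IIR output does.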

IIR and FIR Filters are alike but not the same

This is exactly the situation described by Tononi. Reiterating his figure caption:

Given many more elements and connections, it is possible to construct a ‘feed-forward’ network implementing the same input-output function as the ‘integrated system’ in (A) for a certain number of time steps (here at least 4). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain. …

And then there is the punchline that I omitted previously…

… Despite the functional equivalence, the ‘feed-forward system’ is unconscious, a “zombie” without phenomenological experience.

So it is true with the digital filters:

Given more elements and connections, it is possible to construct a FIR filter implementing the same input-output function as the IIR filter for a certain number of time steps (here 6). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain.

and hence

… Despite the functional equivalence, the FIR filter is unconscious, a “zombie” without phenomenological experience (unlike the IIR filter)!

For the FIR filter, there are no loops in the network – the arrows all point south/east – and so φ=0, in contrast with the non-zero φ for the IIR filter, which does have loops.

To anyone that understands digital signal processing, the idea that an IIR filter has some consciousness (albeit tiny) whereas an equivalent FIR filter does not is absurd. This is an additional absurdity beyond that of the panpsychist idea that any filter could have consciousness in the first place.

Could Androids Dream of Electric Sheep?

In a previous talk (‘Could Androids Dream of Electric Sheep?’) I considered whether something that behaved the same way as a conscious human would also be conscious.

If something behaves the same way as a conscious human, we can still deny that it is conscious on the grounds that it is just an imitation. We would not credit a computer running Joseph Weizenbaum’s famous ELIZA program as being genuinely conscious in the way that we are (although the Integrated Information Theory would grant it some lower, but still non-zero, value of φ).

A narrower philosophical question is whether a computer running a simulation (‘emulation’) of a human brain would be conscious. (Yes – ‘whole brain simulation’  is not possible – yet.) A simulation at a sufficiently low level can show the same phenomenon as the real object (such as ‘getting wet’ in a rainstorm in a weather simulation.) In this case, the ‘same thing’ is going on, but just implemented in a different physical substrate (electronic transistors rather than gooey biological stuff); a functionalist would say that the simulation is conscious by virtue of it being functionally the same.

The yet narrower argument concerns whether the physical construction of the ‘simulation’ is also the same. It would then no longer be a simulation but a direct (atom-by-atom) copy. Anyone insisting on this can be accused of being ‘bio-chauvinist’ in denying that computer simulations are conscious. But it is still possible that consciousness would not be duplicated. For example, if whatever causes consciousness operates at a sub-atomic level, an atom-for-atom copy might miss it out. How would we know?

I took a functionalist position.

However, the example above shows that, according to the ‘Integrated Information Theory’, it is possible for two systems to be functionally the same (caveat: almost) while one is conscious and the other is not. In short – that (philosophical) zombies can exist.

But any ‘system’ is just a component in a larger system. It is not clear to me whether, if a component with φ>0 is substituted with a functionally identical one with φ=0, the φ of the larger system is reduced. In the larger system, the loop-less φ=0 implementation ends up with loops around it.

To be continued (eventually, hopefully).


Brexit and the Brain


On this blogsite up to now, I have touched on many of the sub-fields of philosophy – the philosophy of mind, consciousness, epistemology, philosophy of science and, most recently, ethics. The biggest sub-field not covered is politics.

But then came ‘Brexit’.

Thinking about Brexit has reminded me of many of the ideas within past posts. So here, in a bit of a departure from the normal, I try to relate Brexit to these ideas. It is not really a foray into political philosophy. It is about the cognitive processes behind the political event. It might provide you with some food for thought about Brexit. And the Trump phenomenon too, for that matter.

I’ll start by summarizing apposite ideas from past posts:


Intelligence and Knowledge

Intelligence is about adapting and responding appropriately to circumstances, particularly when they are complex and changing. An important aspect is the ability to make predictions. A central topic of this blogsite is the idea of the brain as a hierarchy of predictors (Hohwy’s ‘predictive brain’ thesis and Friston’s ‘variational free energy’ theory) that is continuously trying to minimize surprise, through action and perception. These brain theories are closely related to ideas around bio-inspired ‘artificial neural networks’ that are now making significant strides in various artificial intelligence applications (threatening to take away many white-collar jobs in the near future).

Our ability to predict events in the world outside improves over our lifetime. Knowledge grows. In the early stages of life, the forest of neurons is very plastic, hence highly adaptable but very ‘impressionable’ to stimuli. When mature, the brain has become wise – good at anticipating events in the environment that it has grown up in. But it can get ‘stuck in its ways’ if that environment has since changed. Keynes is famously supposed to have said:

“When the facts change, I change my mind. What do you do, sir?”

But the difficulty is in accepting that the new facts are valid, because they do not cohere with everything else you know.

I have related this mechanistic learning process to Susan Haack’s epistemological ‘foundherentist’ theory which is a synthesis of the competing correspondence and coherence theories of truth. New information modifies one’s knowledge if it both (i) corresponds to how things seem to behave in the outside world and (ii) if it coheres with the other knowledge within one’s head.



Embedded within the totality of our knowledge is our worldview – the big picture of how the world appears to us. It is cultural. We grow up within the culture of our parents’ environment and it evolves within us. Our worldview is a bit different from that of our parents. Our children’s will be a bit different too. But only a bit. If it changes too much, the culture is broken.

The traditional Western philosophy has been one of a non-material Cartesian mind acting within an absolutist objective world of facts; we should be perfectly rational. But our modern understanding is of an evolved, physical mind. Our understanding of how knowledge works has been influenced by the reactions of Kuhn, Feyerabend, Polanyi and Lakatos to the horrors of totalitarianism in central Europe.

People are separately building models within their brains of the same (shared) environment – but those models are not the same. People do not believe in things that are objectively right or wrong. They do not believe in just anything. They believe in things because they work – they correspond and cohere. Their knowledge, embodied within the connectome, is neither objective/absolutist nor subjective/relativist. It is a middle course. But still, some brains make better predictions in particular circumstances than others.


Cognitive Biases

So it seems that our thinking falls short of the simple, pure, logical rationality required for decision-making in the 21st-century world. We have cognitive biases that seem to distort our thinking. For example, there is ‘anchoring’ (already hinted at), in which early information (when ‘impressionable’) has a disproportionate influence on our thinking compared with later information (when ‘mature’).

From the work of Tversky, Kahneman, Gigerenzer and Tetlock (focussed on politics and economics decision-making but generally applicable), we understand that these biases are the result of evolution and have endowed us with a cognitive toolbox of tricks that can make decisions in a timely manner that are ‘good-enough’. Much of this is intuitive. Our thinking is more complex, more efficient but less rational.

In our search for meaning, we tend to want to pull our ideas together to create some greater ‘truth’. Experts are liable to focus on a learnt ideology of grand overarching principles – of more coherence than is warranted. Computers can deal with the mass of data needed to maintain correspondence between events in the outside world and their predictions, and hence can outperform the experts. But straightforward heuristic tricks (such as the ‘recognition heuristic’ – that things we haven’t heard of tend to be less important than those we have) mean that amateurs can often outperform the theories of experts!



So, much of our thinking is irrational and intuitive. But our thinking is also affected by emotion.

A most basic emotion is fear. The basic animal state of nature is continuous anxiety – to be constantly alert, fearfully anticipating potential life-threatening events.  But we need to balance risk. We cannot be completely risk-averse (hiding in a dark room). We must explore the world around us when the risk is low in order to have learnt what to do for when the risk is high.


Social Cohesion

And well-being is improved by cooperation with others around us. Biological mechanisms of motherhood (such as the neurotransmitter oxytocin) give rise to caring for those immediately around us. Knowing our place within the hierarchy of society reduces physical violence within our community (but the potential for violence means that we do not have an improved feeling of well-being). The flip-side of the empathy that we feel towards those within our ‘in-group’ community who are like ourselves is that it emboldens us against the ‘out-group’ beyond.

Over time, we learn how those around us behave. Through familiarity, we can predict how others will behave in particular circumstances and can imagine how they see us. We have a ‘theory of mind’ – an ability to recognise that others may think differently from ourselves. We form judgements of how reputable others are and understand that others do the same to us. We have a reputation. With established reputations, we can cooperate, able to trust one another. However, we have no knowledge of how reputable strangers from outside our community are, so we treat them with suspicion. But that suspicion reduces with more frequent contact. Strangers become less strange, particularly if they are associated with reputable institutions. This allows societies to grow beyond the size where everyone knows everyone else. To act morally is to balance our wants with those of others – to get inside the minds of others, to understand what they want and to take that into consideration.



Classic case examples such as Phineas Gage and Charles Whitman show that physical effects on the brain cause different behaviour. This challenges our traditional notions of free will and responsibility. We are a product of our environment. In a classic legal case, murderer Richard Loeb was spared the death penalty because it was successfully argued that he did not choose the (privileged) environment in which he grew up.

But if transgressors cannot be blamed for their deeds, then equally the successful cannot be praised for their achievements. They feel proud of achievements that they attribute to their personal abilities. Little credit is given to the fortunate circumstances in which they are born and grow up.

(Note: a lack of traditional responsibility does not mean that a transgressor is not sanctioned in some way and it does not mean we do not promote positive examples.)


Affluent Societies

Various research indicates that (i) moral behaviour and reasoning of those at the top of the social tree differs from that of the rest of us, and (ii) individuals in affluent societies behave differently from those in less affluent ones.

In short, the affluent are less empathetic. They are more likely to prioritize their own self-interest above the interests of others (a simple example: they are less likely to stop for pedestrians at crossings). Piff calls this ‘the asshole effect’! In contrast with traditional intuitive, emotional responses, they favour more ‘rational’, utilitarian choices, such as being more prepared to take resources from one person to benefit several others. They have a higher sense of entitlement.

Charitable donations are one indicator of the consideration given to others. Being rich does not generally confer greater generosity. But being married, older, living in rural rather than urban areas or living in a mixed rather than segregated social neighbourhood all correlate with higher donations. So does regular attendance at religious services, which can simply be attributed to being reminded of the needs of others on a regular basis.

A general picture emerges of how affluent ‘Western’ societies differ from those with lower GDPs. There is less empathy for those immediately around us. People are more individualistic and self-indulgent. Relationships have less commitment. People live in an urban environment in which social interaction is anonymous and transactional rather than proximate (‘up close and personal’). There is higher monetization.  (Regardless of status, just thinking about money decreases empathy, shifting the balance from others to oneself.) We are less dependent on other specific people and their goodwill. If we want something, we can just buy it with the minimum of personal interaction, from an anonymous provider. There is a high degree of social connectedness but this is not with those outside our own social spheres and there is less interaction with those living in our immediate vicinity. It is a case of ‘out of sight; out of mind’.

But the flip-side of this is that the affluent are more likely to interact with members of the out-group – to be less xenophobic.



Now, applying all these ideas to Brexit…


Confirmation Bias

It is generally agreed that the quality of the political debate during the referendum campaign was dire. Leave campaigners appealed to those with a Leave worldview. Remain campaigners appealed to those anchored with a Remain worldview. These worldviews were formed long before the referendum; they were as good as instinctive. Remain arguments did not fit into the Leave worldview and Leave arguments did not fit into the Remain worldview. Confirmation bias reigned. Arguments became increasingly coherent, but this was because of reduced correspondence to reality! There would be no £350 million a week and there would be no World War III. There may have been undecideds to be swayed from an unconscious worldview to a conscious voting intention but I suspect that it actually changed the minds of very few.


The Failure of Empathy

Recasting what was said above in terms of Brexit, Remainers were more affluent, more self-sufficient and less empathetic than Leavers. They were more likely to prioritize their own self-interests above the interests of others. In contrast to the traditional intuitive, emotional responses of poorer Leavers, they favoured more ‘rational’ choices. The Remain argument was that of the financial impact of Brexit. It was in terms of money, and monetization decreases feelings of empathy. Being older and living in rural rather than urban areas correlate with empathy – and correlated with Leavers. But this empathy was for those within the in-group. The flip-side of this empathy effect (such as the effect of Oxytocin) is that Leavers are less trusting of those in the out-group.


The Failure of Trust

From within a Leave worldview, a vote to Remain was a self-interested vote to maintain the status quo. Remain voted as ‘Homo economicus’ – as rational self-interested agents, without caring about the opinions of others. Leavers heard the Remain campaigners’ claims about the bad economic consequences but rejected them because of a failure of trust. The bad reputation of individuals campaigning for Remain was inherited from the institutions with which they were associated – the institutions of the elite. These were the politicians and ‘greedy banksters’ of the Establishment, whose reputations had been destroyed in the eyes of the public as self-interested in the extreme.


The Failure of Experts

Part of this Establishment were the ‘experts’, whose reputation was now tarnished by their inability to predict – notably their failure to predict the collapse of the banking system and their failure to predict election outcomes. It may be that their expertise was based on a world which has now changed. Some scepticism about expert opinion was justified.


The Failure to Think

Too many Leavers did not think. They accepted things to be true because they wanted them to be true. They did not question them. It was a failure to think for themselves. The stereotypical view from within the Remain worldview was that a vote to Leave was a vote based on ignorance and stupidity; there is some truth in this.

But too many Remainers did not think either – or did not think much. A large proportion of the Remain vote will not have given much thought to the vote because the correct way to vote seemed obvious and no further thought was deemed necessary. They did not question whether there might be any merits to Brexit.


The Failure of Morality

I have defined morality as being about balancing our wants against those of others – to get inside the mind of others to understand what they want and to take that into consideration. To want to do the balancing requires intellect and for us to care about the other.

Leavers tended to see the issue in terms of the others – as an issue of inequality. The ‘elite’ others did not seem to care about them. They could see that it would be in the interest of the others to vote Remain. They balanced their wants against those of the other and came down firmly on the side of their own faction’s wants. (When might they have another opportunity for this cri de cœur?)

It was noted previously that there are no issues that are purely moral. A moral aspect is just one of many aspects of a problem. Brexit had moral aspects as well as economic and other aspects. In short:

  • Leavers saw the moral aspect, but
  • Remainers (skewed towards higher intellect) saw only the economic aspect.

Remainers may well find this assertion to be outrageous!


Mindlessness and Heartlessness

So, Leavers were mindless and Remainers were heartless. Remainers did not empathize, or did not think that they should be empathizing. Leavers engaged in apparently mindless political vandalism. But it was not necessarily mindless. One telling comment on a blog after 23 June asked ‘what if voting Leave was the rational thing to do?’ To answer that, Remainers would be forced to think of what the other was thinking. And they might conclude it was not mindless political vandalism after all; it was just political vandalism.


The Environment

We are all products of our environment. If we were brought up in a Remain environment (e.g. Cambridge) or Leave environment (e.g. Sunderland), would we have voted differently? Probably. If we recognize this, we will not demonize the other.



I have tried to fit one story into another – to fit a story about the epistemological and ethical aspects of a philosophical worldview into the political story of Brexit! It is far from a perfect match. I have not talked about economics or immigration or identity or globalization or other issues central to Brexit because they do not fit into the story of the brain here. But it is hopefully interesting and food for thought.

Returning to my favourite piece of graffiti:

“So many heads, so few brains.
So many brains, so little understanding.”

The first line is about a failure to think. The second line is about a failure to think about others. The first can be levelled against many Leavers. The second can be levelled against many Remainers.

We must look more to the future than the past. We must look backwards not to blame but to understand why people voted the way they did so that we might understand what might satisfy them. We need to get inside their minds (and the easiest way of doing that is to ask them!).

We can then look forwards – to how we can create a solution that is acceptable for a large majority of us (much more than 52%) – both Leavers and Remainers. Then we will heal the rift. We will see.


Mrs Varoufakis (allegedly) trying but failing to see one standpoint from the position of another.

Posted in Uncategorized | 1 Comment

Some Good Reason


This is the 19th part of the ‘Neural Is to Moral Ought’ series of posts. The series’s title comes from Joshua Greene’s opinion-piece paper

‘From Neural Is To Moral Ought: What are the moral implications of neuroscientific moral psychology?’

Here, I pick through Greene’s paper, providing responses to extensive quotes of his which refer back to a considerable number of previous parts of the series. His paper divides into three sections, which I will examine in turn:

  1. The ‘is’/‘ought’ distinction
  2. Moral intuition
  3. Moral realism vs relativism


The ‘Is’/‘Ought’ Distinction

The paper’s abstract is:

Many moral philosophers regard scientific research as irrelevant to their work because science deals with what is the case, whereas ethics deals with what ought to be. Some ethicists question this is/ought distinction, arguing that science and normative ethics are continuous and that ethics might someday be regarded as a natural social science. I agree with traditional ethicists that there is a sharp and crucial distinction between the ‘is’ of science and the ‘ought’ of ethics, but maintain nonetheless that science, and neuroscience in particular, can have profound ethical implications by providing us with information that will prompt us to re-evaluate our moral values and our conceptions of morality.

and the body of the paper then starts:

Many moral philosophers boast a well cultivated indifference to research in moral psychology. This is regrettable, but not entirely groundless. Philosophers have long recognized that facts concerning how people actually think or act do not imply facts about how people ought to think or act, at least not in any straightforward way. This principle is summarized by the Humean dictum that one can’t derive an ‘ought’ from an ‘is’. In a similar vein, moral philosophers since Moore have taken pains to avoid the ‘naturalistic fallacy’, the mistake of identifying that which is natural with that which is right or good (or, more broadly, the mistake of identifying moral properties with natural properties).

This naturalistic fallacy was the mistake committed by the now-discredited ‘Social Darwinists’, who aimed to ground moral philosophy in evolutionary principles. But:

.. the idea that principles of natural science might provide a foundation for normative ethics has won renewed favour in recent years. Some friends of ‘naturalized ethics’ argue, contra Hume and Moore, that the doctrine of the naturalistic fallacy is itself a fallacy, and that facts about right and wrong are, in principle at least, as amenable to scientific discovery as any others.

Only to a certain extent, I would say. It is true that the ‘ought’ is not logically bound to the ‘is’. We are free to claim that anything ought to be done. But ‘ought’ is substantially restricted by ‘is’. Moral theories cannot require us to do things which are outside of our physical control. ‘This is how we ought to think’ is constrained by ‘This is how we think’. For Greene,

… I am sceptical of naturalized ethics for the usual Humean and Moorean reasons.

Continuing, with reference to William Casebeer’s opinion piece in the same journal issue:

in my opinion their theories do not adequately meet them. Casebeer, for example, examines recent work in neuroscientific moral psychology and finds that actual moral decision-making looks more like what Aristotle recommends and less like what Kant and Mill recommend. From this he concludes that the available neuroscientific evidence counts against the moral theories of Kant and Mill, and in favour of Aristotle’s. This strikes me as a non sequitur. How do we go from ‘This is how we think’ to ‘This is how we ought to think’? Kant argued that our actions should exhibit a kind of universalizability that is grounded in respect for other people as autonomous rational agents. Mill argued that we should act so as to produce the greatest sum of happiness. So long as people are capable of taking Kant’s or Mill’s advice, how does it follow from neuroscientific data — indeed, how could it follow from such data — that people ought to ignore Kant’s and Mill’s recommendations in favour of Aristotle’s? In other words, how does it follow from the proposition that Aristotelian moral thought is more natural than Kant’s or Mill’s that Aristotle’s is better?

The ‘Neural Is to Moral Ought’ series started with an examination of (Mill’s) Utilitarianism, (Kant’s) Deontological ethics and (Aristotelian) Virtue Ethics in turn. All three approaches have their merits and deficiencies. Of the three, I am disinclined towards the dogmatism of Deontological ethics and particularly inclined towards Virtue Ethics because it accounts for moral growth. Virtue Ethics is the more ‘natural’ because it is in keeping with how our brains physically learn, as opposed to treating us as idealized reasoners or rule-followers.

Whereas I am sceptical of attempts to derive moral principles from scientific facts, I agree with the proponents of naturalized ethics that scientific facts can have profound moral implications, and that moral philosophers have paid too little attention to relevant work in the natural sciences. My understanding of the relationship between science and normative ethics is, however, different from that of naturalized ethicists. Casebeer and others view science and normative ethics as continuous and are therefore interested in normative moral theories that resemble or are ‘consilient’ with theories of moral psychology. Their aim is to find theories of right and wrong that in some sense match natural human practice. By contrast, I view science as offering a ‘behind the scenes’ look at human morality. Just as a well-researched biography can, depending on what it reveals, boost or deflate one’s esteem for its subject, the scientific investigation of human morality can help us to understand human moral nature, and in so doing change our opinion of it.

But this is too vague. It says virtually nothing. Greene suggests that something might be profound but provides no idea of how things might actually look ‘behind the scenes’.

Let’s take a step back and ask: what is the purpose of morality? Ethics is about determining how we ought to behave, but answering that requires us to decide upon the purpose of human existence. Such metaphysical meaning has proved elusive except for religious communities. Without any divine purpose, we are left to decide meaning for ourselves, and the issue then is that our neighbour may find a different meaning which will then determine different behaviour. The conclusion is that the purpose of morality is the balancing of the wants of others against our own. But this requires us to consider:

  1. What do we want?
  2. How can we understand the wants of others?
  3. How can we cognitively decide?

All three considerations are ultimately grounded in the physics of our brains:

  1. We are free to want whatever we want, but we are all physically very similar so it should come as no surprise that we will have similar wants (food, water, shelter, companionship…).
  2. We need a ‘theory of mind’ (second-order intentionality) in order to understand that others may have wants of their own. We need an understanding of ‘reputation’ (third-order intentionality) to want to moderate our behaviour.
  3. We need a cognitive ability to deliberate in order to make moral choices (in short, to be able to make rational decisions).

(Even the religion opt-out eventually leads us back to the physical brain – how people learn, know and believe is rooted in the physical brain.)

In principle there is no connection between ‘is’ and ‘ought’, and a philosopher can propose any moral theory. But when they do, others provide counter-examples which show the theory prescribing absurd responses. All too often, the difficulty lies not in what should be done in practice but in trying to codify the moral theory – and the theorists end up modifying their theory rather than their actions!

What if we try to combine the best elements of the three main moral theories (Utilitarianism, Deontological ethics and Virtue Ethics) in order to provide practical moral guidance? Such a synthesis was presented earlier in this series. Ignoring the details here, an extremely brief summary is:

  • We imagine the consequences of potential actions in terms of their effect on the collective well-being of all.
  • In the early stages of growth, we respond with the application of (learnt) simple rules.
  • The less clearly those rules fit the particular situation, the less confidence we have in them and the more conscious effort we apply to assessing consequences.
  • This provides us with an ability to respond both to the ‘simple’ moral problems quickly and efficiently and to complex problems with considerable attention.
  • We gradually develop more subtle sub-rules that sit upon the basic rules and we learn to identify moral situations and then apply the rules and sub-rules with greater accuracy and speed. This is moral growth.

The resulting ‘mechanistic’ account of moral reasoning is remarkably similar to the ‘hierarchy of predictors’ (‘predictive brain’, ‘variational free energy’) theory of what the brain is doing generally. So, what the brain is doing when there is moral deliberation is basically the same as when there is non-moral deliberation. There is nothing particularly special about moral thinking.
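As a loose illustration of this confidence-gated, two-speed account (my own toy sketch, not anything from Greene’s paper – the rules, features, scores and threshold are all hypothetical), familiar situations are answered by fast learnt rules, while unconfident or conflicting rules hand control upwards to slow deliberation:

```python
# Toy sketch of a two-level 'hierarchy of predictors' for moral choice.
# All rules, features and numbers are hypothetical illustrations.

# Fast level: learnt rules mapping a situation feature to
# (response, confidence in [0, 1]).
RULES = {
    "injured person nearby": ("help", 0.95),
    "property at risk":      ("protect", 0.60),
}

def deliberate(features):
    """Slow, effortful fallback: a crude assessment of consequences."""
    # Placeholder utilitarian-style scoring over the imagined outcomes.
    score = sum(1.0 if f == "injured person nearby" else -0.1
                for f in features)
    return "help" if score > 0 else "do nothing"

def decide(features, threshold=0.8):
    # Fast path: take the most confident applicable rule.
    matches = [RULES[f] for f in features if f in RULES]
    if matches:
        response, confidence = max(matches, key=lambda m: m[1])
        if confidence >= threshold:
            return response, "fast/intuitive"
    # Rules are unconfident (or absent): escalate to slow deliberation.
    return deliberate(features), "slow/deliberative"

print(decide(["injured person nearby"]))  # → ('help', 'fast/intuitive')
print(decide(["property at risk"]))       # low confidence → deliberation
```

The point of the sketch is only the control flow: confident lower-level responses act quickly; everything else is ‘passed upwards’ for attention – moral and non-moral deliberation alike.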


Moral Intuition

Greene acknowledges the role of methods of determining judgements other than just ‘Pure Reason’:

There is a growing consensus that moral judgements are based largely on intuition — ‘gut feelings’ about what is right or wrong in particular cases. Sometimes these intuitions conflict, both within and between individuals. Are all moral intuitions equally worthy of our allegiance, or are some more reliable than others? Our answers to this question will probably be affected by an improved understanding of where our intuitions come from, both in terms of their proximate psychological/neural bases and their evolutionary histories.

He contrasts two moral dilemmas (both due to Peter Unger): Firstly, Case 1:

You are driving along a country road when you hear a plea for help coming from some roadside bushes. You pull over and encounter a man whose legs are covered with blood. The man explains that he has had an accident while hiking and asks you to take him to a nearby hospital. Your initial inclination is to help this man, who will probably lose his leg if he does not get to the hospital soon. However, if you give this man a lift, his blood will ruin the leather upholstery of your car. Is it appropriate for you to leave this man by the side of the road in order to preserve your leather upholstery? Most people say that it would be seriously wrong to abandon this man out of concern for one’s car seats.

And then Case 2:

You are at home one day when the mail arrives. You receive a letter from a reputable international aid organization. The letter asks you to make a donation of two hundred dollars to their organization. The letter explains that a two-hundred-dollar donation will allow this organization to provide needed medical attention to some poor people in another part of the world. Is it appropriate for you to not make a donation to this organization in order to save money? Most people say that it would not be wrong to refrain from making a donation in this case.

Now, most people think there is a difference between these scenarios:

  • the driver must give the injured hiker a lift, but
  • it would not be wrong to ignore the request for a donation.

In fact, we can imagine doing a Utilitarian calculation, trading off the benefits between the two situations, and concluding that it would be more Utilitarian to donate the money it would cost to repair the leather upholstery to charity instead of helping the hiker. But we are then more likely to actually help the hiker anyway and refine the Utilitarian calculus somehow. We override our codified system because it feels like there is ‘some good reason’ why the decision is right. But Greene, like Peter Singer before him, thinks that, whatever that reason is, it is not a moral reason.

And yet this case and the previous one are similar. In both cases, one has the option to give someone much needed medical attention at a relatively modest financial cost. And yet, the person who fails to help in the first case is a moral monster, whereas the person who fails to help in the second case is morally unexceptional. Why is there this difference? About thirty years ago, the utilitarian philosopher Singer argued that there is no real moral difference between cases such as these two, and that we in the affluent world ought to be giving far more than we do to help the world’s most unfortunate people. (Singer currently gives about 20% of his annual income to charity.) Many people, when confronted with this issue, assume or insist that there must be ‘some good reason’ for why it is alright to ignore the severe needs of unfortunate people in far off countries, but deeply wrong to ignore the needs of someone like the unfortunate hiker in the first story. (Indeed, you might be coming up with reasons of your own right now.) Maybe there is ‘some good reason’ for why it is okay to spend money on sushi and power windows while millions who could be saved die of hunger and treatable illnesses. But maybe this pair of moral intuitions has nothing to do with ‘some good reason’ and everything to do with the way our brains happen to be built.

Greene identifies the difference as being between ‘personal’ and ‘impersonal’ situations:

The dilemma with the bleeding hiker is a ‘personal’ moral dilemma, in which the  moral violation in question occurs in an ‘up-close-and-personal’ manner. The donation dilemma is an ‘impersonal’ moral dilemma, in which the moral violation in question does not have this feature. To make a long story short, we found that judgements in response to ‘personal’ moral dilemmas, compared with ‘impersonal’ ones, involved greater activity in brain areas that are associated with emotion and social cognition. Why should this be? An evolutionary perspective is useful here. Over the last four decades, it has become clear that natural selection can favour altruistic instincts under the right conditions, and many believe that this is how human altruism came to be. If that is right, then our altruistic instincts will reflect the environment in which they evolved rather than our present environment. With this in mind, consider that our ancestors did not evolve in an environment in which total strangers on opposite sides of the world could save each others’ lives by making relatively modest material sacrifices. Consider also that our ancestors did evolve in an environment in which individuals standing face-to-face could save each others’ lives, sometimes only through considerable personal sacrifice. Given all of this, it makes sense that we would have evolved altruistic instincts that direct us to help others in dire need, but mostly when the ones in need are presented in an ‘up-close-and-personal’ way. What does this mean for ethics? Again, we are tempted to assume that there must be ‘some good reason’ why it is monstrous to ignore the needs of someone like the bleeding hiker, but perfectly acceptable to spend our money on unnecessary luxuries while millions starve and die of preventable diseases. 
Maybe there is ‘some good reason’ for this pair of attitudes, but the evolutionary account given above suggests otherwise: we ignore the plight of the world’s poorest people not because we implicitly appreciate the nuanced structure of moral obligation, but because, the way our brains are wired up, needy people who are ‘up close and personal’ push our emotional buttons, whereas those who are out of sight languish out of mind.

This is just a hypothesis. I do not wish to pretend that this case is closed or, more generally, that science has all the moral answers. Nor do I believe that normative ethics is on its way to becoming a branch of the natural sciences, with the ‘is’ of science and the ‘ought’ of morality gradually melding together. Instead, I think that we can respect the distinction between how things are and how things ought to be while acknowledging, as the preceding discussion illustrates, that scientific facts have the potential to influence our moral thinking in a deep way.

But again, this is all rather vague.

Relating this to what I have previously discussed…

  • The ‘hierarchy of predictors’ model describes the way in which many levels compete with one another to influence behaviour (spreading from reflex to rational, via sensorimotor, emotional, subconscious and conscious levels). Lower levels will dominate action in familiar moral situations. But in unfamiliar circumstances, or when the problem consists of two familiar reactions with contradictory actions, lower levels will be less confident about their response and control will effectively be passed upwards for (slower) rational judgement. In a decision between helping the bleeding hiker and donating to charity, rational deliberation gets shut out by the lower-level emotional and intuitive response.
  • Patricia Churchland shows that our caring originates in our brain. For example, a greater density of oxytocin receptors in the nucleus accumbens and a greater density of vasopressin receptors in the ventral pallidum (both nuclei are part of the basal ganglia at the base of the forebrain) make the significant difference in behaviour between the monogamous Prairie Vole and the otherwise-similar Montane Vole. The ‘up-close-and-personal’ proximity effect of alloparenting expands this caring beyond the family to the ‘In-Group’. But oxytocin is not a magic bullet: it improves empathy with the In-Group but actually works against Out-Group members.

The physical construction of the brain seems to provide one ‘some good reason’ why immediate, ‘up close and personal’ situations elicit a moral response in the way that slowly-rationalized situations do not. (Hence worldwide charities frequently appeal to us not by presenting facts about the suffering of many, many thousands, but by presenting an image of a single suffering individual, furnished with a name and a story of misfortune – making the problem ‘up-close-and-personal’.)

If we truly want a morality that does not prioritize those ‘up close’, then we need to build some compensating mechanisms into our decision-making – consciously equalizing out our emotions. But our emotions can play an important positive role. Empathy is a very significant factor in creating the habits that underpin the balancing of the wants of others against the wants of oneself. Yes, we must learn the virtue of balancing others against ourselves, but we must also learn the virtue of balancing reason against our emotions.


Moral Realism

Greene then shifts attention to Moral Realism:

According to ‘moral realism’ there are genuine moral facts, whereas moral anti-realists or moral subjectivists maintain that there are no such facts. Although this debate is unlikely to be resolved any time soon, I believe that neuroscience and related disciplines have the potential to shed light on these matters by helping us to understand our common-sense conceptions of morality. I begin with the assumption (lamentably, not well tested) that many people, probably most people, are moral realists. That is, they believe that some things really are right or wrong, independent of what any particular person or group thinks about it. For example, if you were to turn the corner and find a group of wayward youths torturing a stray cat, you might say to yourself something like, “That’s wrong!”, and in saying this you would mean not merely that you are opposed to such behaviour, or that some group to which you belong is opposed to it, but rather that such behaviour is wrong in and of itself, regardless of what anyone happens to think about it. In other words, you take it that there is a wrongness inherent in such acts that you can perceive, but that exists independently of your moral beliefs and values or those of any particular culture.

I think torturing cats is not just wrong but universally wrong. Universally wrong means that it is wrong in all societies. Across societies, we understand sufficiently similarly what ‘wrongness’ and ‘morality’ actually mean that, when presented with a clear (black-and-white) moral case, we can all agree on whether that case is right or wrong. It is not that there is some absolute truth of the matter; it is just that similar agents’ understanding of common concepts leads to common knowledge. Universally wrong is not the same as absolutely (‘real-ly’) wrong.

Surveying cultures around the world across all civilizations, we find that they have surprisingly similar moralities. It is not that one society accepts stealing but not murder and another accepts murder but not stealing! The differences are predominantly down to how liberal or conservative a society is. Liberal societies have a shorter list of vices than conservative ones. For example, the way an individual dresses is seen as a matter of aesthetics or custom in liberal (e.g. US) societies but a matter of morality in conservative (e.g. Muslim) societies.

There are clear cases of what is right and wrong that apply across most if not all human civilizations. It is in the less clear-cut cases that they differ and hence moral problems arise.

This realist conception of morality contrasts with familiar anti-realist conceptions of beauty and other experiential qualities. When gazing upon a dazzling sunset, we might feel as if we are experiencing a beauty that is inherent in the evening sky, but many people acknowledge that such beauty, rather than being in the sky, is ultimately ‘in the eye of the beholder’. Likewise for matters of sexual attraction. You find your favourite movie star sexy, but take no such interest in baboons. Baboons, on the other hand, probably find each other very sexy and take very little interest in the likes of Tom Cruise and Nicole Kidman. Who is right, us or the baboons? Many of us would plausibly insist that there is simply no fact of the matter. Although sexiness might seem to be a mind-independent property of certain individuals, it is ultimately in the eye (that is, the mind) of the beholder.

I have previously looked at how aesthetics and moral knowledge are just particular forms of knowledge. Moral knowledge is neither uniquely nor totally separate from the physical world of what ‘is’. Aesthetics is the same; it is dependent on things like our (neural) ability to perceive and on our emotions (such as disgust).

The big meta-ethical question, then, might be posed as follows: are the moral truths to which we subscribe really full-blown truths, mind-independent facts about the nature of moral reality, or are they, like sexiness, in the mind of the beholder?

Elsewhere, I have examined how truth is ‘in the mind of the beholder’ – that knowledge (crudely ‘facts’) grows within our brains, building upon earlier ‘facts’ such that it both corresponds with our personal experience and coheres with what else we know. The apparent universality of ‘facts’ (including moral knowledge) arises because we grow up:

  • in the same (or very similar) environment as others, and
  • in a shared culture, meaning that we (more explicitly) learn the same as others.

For our ‘rational’ upper levels, our lower levels (including our emotional urges) are just part of the environment in which we grow up (a very immediate part, mind you).

One way to try to answer this question is to examine what is in the minds of the relevant beholders. Understanding how we make moral judgements might help us to determine whether our judgements are perceptions of external truths or projections of internal attitudes. More specifically, we might ask whether the appearance of moral truth can be explained in a way that does not require the reality of moral truth. As noted above, recent evidence from neuroscience and neighbouring disciplines indicates that moral judgement is often an intuitive, emotional matter. Although many moral judgements are difficult, much moral judgement is accomplished in an intuitive, effortless way.

In my worldview, the appearance of moral truth does not require the reality of moral truth!

With the ‘hierarchy of predictors’ model of the brain, it should be expected that moral judgements, like judgements of other forms of knowledge, are typically accomplished in an intuitive, effortless way – by the lower levels of the hierarchy. It is what we do with the exceptional, difficult decisions that is interesting – those decisions that are propagated up to the higher levels that have our conscious attention.

We are limited by the specific physiology and neurology of the instruments that are our senses (although we can now build external instruments to extend our senses). We cannot like or dislike what we cannot sense.

An interesting feature of many intuitive, effortless cognitive processes is that they are accompanied by a perceptual phenomenology. For example, humans can effortlessly determine whether a given face is male or female without any knowledge of how such judgements are made. When you look at someone, you have no experience of working out whether that person is male or female. You just see that person’s maleness or femaleness. By contrast, you do not look at a star in the sky and see that it is receding. One can imagine creatures that automatically process spectroscopic redshifts, but as humans we do not.

All of this makes sense from an evolutionary point of view. We have evolved mechanisms for making quick, emotion-based social judgements, for ‘seeing’ rightness and wrongness, because our intensely social lives favour such capacities, but there was little selective pressure on our ancestors to know about the movements of distant stars. We have here the beginnings of a debunking explanation of moral realism: we believe in moral realism because moral experience has a perceptual phenomenology, and moral experience has a perceptual phenomenology because natural selection has outfitted us with mechanisms for making intuitive, emotion-based moral judgements, much as it has outfitted us with mechanisms for making intuitive, emotion-based judgements about who among us are the most suitable mates.

Or much as natural selection has outfitted us with mechanisms for making intuitive, emotion-based judgements about anything.

Therefore, we can understand our inclination towards moral realism not as an insight into the nature of moral truth, but as a by-product of the efficient cognitive processes we use to make moral decisions. According to this view, moral realism is akin to naive realism about sexiness, like making the understandable mistake of thinking that Tom Cruise is objectively sexier than his baboon counterparts.

Both intuition and emotion play an important part in moral deliberation, just as they do in other forms of deliberation.

So far, Greene has been making rather vague comments. But then he makes a comment that is acute:

Others might wonder how one can speak on behalf of moral anti-realism after sketching an argument in favour of increasing aid to the poor

to which his reply is

giving up on moral realism does not mean giving up on moral values. It is one thing to care about the plight of the poor, and another to think that one’s caring is objectively correct.

I have emphasized the importance of caring in creating a moral society and looked at its biological foundations. It is largely true that we act morally because we care.

… Understanding where our moral instincts come from and how they work can, I argue, lead us to doubt that our moral convictions stem from perceptions of moral truth rather than projections of moral attitudes.

A case has been presented of how our neurology promotes caring to extend, via oxytocin, alloparenting, group behaviour and institutional trust, to very large societies in which we care for complete strangers. This is how our moral convictions arise. Our morals are contingent on culture and environment, not on absolute moral truths. The moral instincts that make us help the injured hiker (emotionally, quickly) and ignore the appeal through the letterbox (deliberatively, slowly, consciously) are built upon the ‘up close and personal’ origins of our caring. It could not be otherwise: our logical/rational/deliberative higher levels of cognition are built (evolved) upon lower, quicker, instinctive levels.

Some might worry that this conclusion, if true, would be very unfortunate.

First, it is important to bear in mind that a conclusion’s being unfortunate does not make it false.

This is true for moral determinism as well as moral instincts (our instincts tell us that we are free, but the scientific evidence points towards determinism). The unfortunate conclusion all too often drawn from determinism is that, lacking free will, we cannot punish transgressors for actions they could not have avoided – and hence the moral order dissolves.

Second, this conclusion might not be unfortunate at all.

I have argued elsewhere that we might not have ‘free will’ as conventionally understood but that we still have freedom and can still be held responsible. The moral order can be maintained. Furthermore, recognizing that some individuals do not have the control traditionally attributed to them, we will be less retributive and more prepared to intervene in order to design a society that further improves well-being (yes, in a scientific way).


Posted in Uncategorized | Tagged , , , , , , , , , , | Leave a comment

Getting Started on Deep Learning with Python


An Introduction to Deep Learning

In Karl Friston’s wonderfully entitled paper ‘The history of the future of the Bayesian brain’, he recalls working with Geoffrey Hinton, how Hinton emphasized Bayesian formulations and generative models, and how Friston developed his biological minimization of ‘Variational Free Energy’ theory from Hinton’s ideas, adopting Hinton’s references to Free Energy, Kullback–Leibler divergence and Helmholtz and Boltzmann Machines within the field of artificial neural networks.

Hinton (co-)invented ‘Boltzmann Machines’, which are recurrent artificial neural networks that have randomized weights or neuron function (i.e. ‘stochastic’), and he also invented fast learning algorithms for ‘Restricted Boltzmann Machines’ (where neurons have connections to neurons in other layers but not to those in the same layer).
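To make the ‘restricted’ and ‘stochastic’ points concrete, here is a minimal NumPy sketch of one contrastive-divergence (CD-1) weight update for a Restricted Boltzmann Machine with binary units. This is illustrative only, not Hinton’s actual code: the layer sizes, learning rate and input vector are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 3
# 'restricted': weights only between the two layers, none within a layer
W = rng.randn(n_visible, n_hidden) * 0.1

def cd1_update(v0, lr=0.1):
    # hidden units are conditionally independent given the visible layer,
    # so they can all be sampled in one step ('stochastic' neurons)
    h0_prob = sigmoid(v0.dot(W))
    h0_sample = (rng.rand(n_hidden) < h0_prob).astype(float)
    # reconstruct the visible layer, then re-infer the hidden layer
    v1_prob = sigmoid(h0_sample.dot(W.T))
    h1_prob = sigmoid(v1_prob.dot(W))
    # contrastive divergence: positive phase minus negative phase
    return lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))

v = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 1.0])
W += cd1_update(v)
print(W.shape)  # (6, 3)
```

Repeating such updates over many training vectors nudges the weights towards a model under which the data is probable – the ‘generative model’ idea referred to above.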

He modestly claims that his efforts over the decades led to a 10-fold increase in performance but that, during this time, Moore’s Law increased computing power by 100,000! Added to that was the new availability of large data sets with which to train networks.

But the result of all this was that ‘deep’ neural networks (those with more than 1 hidden layer, i.e. those with more than 3 layers in total) were able to perform very good feature extraction in a reasonable time. Lower layers in the hierarchy extract simple features upon which the higher layers can extract more and more elaborate features. This then resulted in a rapid commercialization of such algorithms for applications like speech recognition, as used in Google Voice search and Apple’s Siri.
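The layer counting can be illustrated with a minimal NumPy forward pass (a sketch, not the tutorial’s Theano code; the layer sizes are assumptions, chosen to match 28×28 MNIST inputs and 10 digit classes): two hidden layers, so four layers in total, making the network ‘deep’ by the definition above.

```python
import numpy as np

rng = np.random.RandomState(0)

def relu(x):
    # simple nonlinearity applied between layers
    return np.maximum(0, x)

# layer sizes: 784 inputs (28x28 pixels), two hidden layers, 10 outputs
sizes = [784, 128, 64, 10]
weights = [rng.randn(m, n) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    # each hidden layer extracts features from the layer below it
    for w in weights[:-1]:
        x = relu(x.dot(w))
    # final layer: softmax over the 10 digit classes
    logits = x.dot(weights[-1])
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = forward(rng.rand(784))
print(probs.shape)  # (10,)
```

Training then consists of adjusting `weights` so that the output probabilities match the training labels – which is what the Theano code downloaded below does efficiently.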

So now the emeritus Professor Hinton is a founding father of ‘Deep Learning’ and works part-time at Google.

A new strand of posts here will look at Deep Learning and how it works. These will be based around the Python computer language. This ‘Introduction to Deep Learning with Python’ video by Alec Radford at indico talks through some Python code for optical character recognition. Below, I cover installing all the code and applications to be able to run the code shown in the video, to get us started.


Overview of Installing Python

To get this code running on a Windows PC, we need:

  1. The python source code.
  2. Python itself
  3. The NumPy maths package, required by the source code.
  4. The Theano numerical methods Python package, required by the source code.
  5. ‘Pip’ (‘Pip Installs Packages’) – for installing Python packages!
  6. The ‘MinGW’ gcc compiler, for compiling the Theano package for much faster execution times.
  7. The MNIST data set of training and usage character bitmaps.
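Once the steps below are done, a quick way to check the Python-side pieces of this list is to probe for the packages by import name. This is just a convenience sketch; ‘theano’ will legitimately report missing until the Theano installation step is complete.

```python
import importlib.util

# probe for each package without fully importing it
status = {pkg: importlib.util.find_spec(pkg) is not None
          for pkg in ["numpy", "scipy", "theano"]}
for pkg, found in status.items():
    print(pkg, "found" if found else "MISSING")
```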


Installing Anaconda

Anaconda2 provides 3 of the above:

  • Python 2.7
  • NumPy
  • Pip

Go to:

and go to the ‘zipped Windows installers’ (to work whether behind a firewall or not).

Download the latest 32-bit version for Python 2:

Double-clicking on the downloaded ZIP file opens the Anaconda2-2.5.0-Windows-x86 application inside it (Windows understands the ZIP compression format). Double-click on this application to install Anaconda. Selecting to install ‘just for me’ will probably be easier, hence install to the user area – C:/Users/User/Anaconda2_32. (Add the ‘_32’ suffix in case we need to install a 64-bit installation later on.)

Have ‘yes’ ticked for adding Anaconda to PATH. Have ‘yes’ ticked for making Anaconda the default Python 2.7. Installation then takes a while.


Installing the Main Python Packages

Locate the ‘Anaconda Prompt’ – easiest through the Windows search. This opens a command shell.

Go to the Anaconda2_32\Scripts directory:

cd Anaconda2_32\Scripts

‘Pip’ (pip.exe) and ‘Conda’ (conda.exe) will be in here.

Installation will generally use Conda rather than Pip. Ensure you have the latest packages to install – but first ensure you have the latest Conda to install them:

conda update conda

Select ‘y’ if not up to date. Continue:

conda update --all

Finally, install the desired packages:

conda install scipy

conda install numpy


Installing GCC for Compiling the Theano Package

The Theano numerical methods package can be interpreted but this will be very slow. Instead, the package should be compiled. For this, the MinGW (‘Minimalist GNU for Windows’) compiler should be installed. Follow the link from:

to SourceForge to automatically download the setup executable:


into the Downloads directory.

Double-click this to install it. Select


as the install directory (for consistency with the Anaconda2-32 installation).


Setting the Path to point to GCC

To ensure that Conda will ‘see’ the compiler when doing the Theano installation, confirm that the PATH environment variable points to it. Select:

Start -> Control Panel -> System -> Advanced -> Environment Variables

(Alternatively, in the Search window, type Environment and select ‘Edit the Environment Variables’.)

Double-click on ‘PATH’ and add MinGW to the start/top of the list. It should point to:






Installing the Theano Package

Then install the GNU c++/g++ compiler to speed-optimize the Theano library. In the ‘Anaconda Prompt’ shell, ensure that you are in the correct directory:

cd \Users\User\Anaconda2_32\Scripts

and type:

conda install mingw libpython

And finally install the numerical methods python library ‘Theano’:

pip install theano


Download the Example Python Code

The text with the YouTube video points to the code at:

and click ‘Download ZIP’. Double-click on the downloaded ZIP and copy the Theano-Tutorials directory to C:\Users\User\Anaconda2_32.


Downloading the MNIST Character Dataset

The MNIST character dataset is available through Yann LeCun’s personal website:

Windows cannot unzip ‘gzip’ (*.gz) files directly. If you don’t have an application to do this, download and run ‘7zip’:

Gzip (*.gz) files need to be associated with ‘7zip’. Then double-click on each gzip file in turn and ‘extract’ the uncompressed files from them. These should all be installed under:


There is a mismatch between the filenames in the MNIST dataset and the file references in the Python code. Using the Windows Explorer, change the ‘.’ in all the filenames to a ‘-‘ e.g. rename train-images.idx3-ubyte to train-images-idx3-ubyte.
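Both the decompression and the renaming can also be scripted with Python’s standard gzip module instead of 7zip and Explorer. The sketch below demonstrates on fake files in a temporary directory; to do it for real, point `data_dir` at the actual dataset location (the filenames are the standard MNIST download names).

```python
import gzip
import os
import shutil
import tempfile

# the four MNIST download filenames, as they arrive gzipped
names = ["train-images.idx3-ubyte", "train-labels.idx1-ubyte",
         "t10k-images.idx3-ubyte", "t10k-labels.idx1-ubyte"]

# demo on a temporary directory; set data_dir to the real dataset
# location to process the actual downloads
data_dir = tempfile.mkdtemp()
for name in names:
    # fake the downloaded .gz files for this demonstration
    with gzip.open(os.path.join(data_dir, name + ".gz"), "wb") as f:
        f.write(b"\x00")

for gz in os.listdir(data_dir):
    if not gz.endswith(".gz"):
        continue
    src = os.path.join(data_dir, gz)
    # decompress, renaming '.idx' to '-idx' as the code expects
    dst = os.path.join(data_dir, gz[:-3].replace(".idx", "-idx"))
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    os.remove(src)

print(sorted(os.listdir(data_dir)))
```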


Running the Code

The Anaconda installation includes the ‘Spyder’ IDE for Python. Search for ‘Spyder Desktop App’ and run.

Browse to set the working directory (top right) to:


And open the first Python script (File -> Open):

This shows the source code.

Select Run -> Run (F5) to execute this code.

Selecting other programs is likely to result in either a ‘memory error’ or ‘No module named foxhound.utils.vis’.

The memory error issue can be overcome by running the code from the Anaconda Prompt:

cd C:\Users\User\Anaconda2_32\Theano-Tutorials-master


This still means that some of the programs cannot be run, and what the other programs are actually doing hasn’t been discussed. That is left for another time.


The Great and the Good

Why Do the Rich Have a Different Moral Calculus?


Albert Loeb


Albert Loeb, father of Richard

The traditional system of justice rests on the foundation that the minds of individuals generally all have the same ability of choosing courses of action and hence they can all be equally blamed when those courses of action are wrong.

But with a modern, Physicalist worldview, we recognise that our behaviour is dictated by circumstances beyond our choosing. To return to a previous example, lawyer Clarence Darrow appealed to the compassion of the judge to spare Richard Loeb the death penalty:

“What had this boy to do with it? He was not his own father; he was not his own mother; he was not his own grandparents. All of this was handed to him. He did not surround himself with governesses and wealth. He did not make himself and yet he is to be compelled to pay.”

Now, if this applies to blame then it applies equally to its opposite, praise.

If transgressors cannot be blamed (in the traditional, direct sense) for their deeds, then the successful cannot be praised for their achievements either.

Consider Richard Loeb’s father, Albert Loeb (1868-1924), as an example of a high achiever. After enjoying a good education, he set up a law practice in Chicago that quickly gained Sears, Roebuck & Company as a client, for whom he went on to work directly, eventually becoming vice president. He had reached the heights of social standing and was able to surround himself with wealth: a mansion in an affluent part of Chicago, a Model Farm in Michigan with a schoolhouse for the workers’ children. And governesses for his own children.

Hyde Park Herald

The home of Albert Loeb and family, 5017 South Ellis Avenue, Kenwood, Chicago. (Barack Obama’s house is on the adjacent street, South Greenwood Avenue.)

As with other high-achievers, he was presumably proud of his achievements in life and felt that he had achieved his rewards as a result of his personal abilities, without very much being credited to the fortunate circumstances in which he was born and grew up.

With a Physicalist worldview, it is not just that…

Some people are born on third base and go through life thinking they hit a triple

But it is also that:

‘If you can hit a triple, that automatically puts you on third base to start with!’



How the Rich Behave

Loeb’s Model Farm, Charlevoix MI

There has recently been much general media coverage on research about how the moral behaviour and reasoning of those at the top of the social tree differs from that of the rest of us. For example, from research by Paul Piff:

  • They are more likely to lie and cheat when gambling or negotiating,
  • They are more likely to endorse unethical behaviour in the workplace.
  • They exhibit reduced empathy, favouring ‘rational’ utilitarian choices (rather than more intuitive, emotional responses) such as being more likely to take resources from one person to benefit several others.

That last is from ‘trolleyology’ experiments. Another ‘method’ is to equate high-status cars with high-status drivers and observe behaviour. For example, drivers of high-status cars are more likely to cut up other drivers and not stop for pedestrians at crossings.

Piff et al: 'Higher social class predicts increased unethical behavior'

‘Mean machine’: Another BMW driver fails to stop for a pedestrian.

Elsewhere, I have defined morality as being about balancing the wants of oneself with those of others. Piff frames the behaviour of the rich in terms of such a balance:

‘the rich are way more likely to prioritize their own self-interests above the interests of other people.’

(He calls this ‘the asshole effect’!)

Kathleen Vohs is another high-profile researcher in this area. Experiments of hers concluded that just thinking about money decreases empathy, shifting the balance from others to oneself. But she believes this effect is a result of a lack of interest rather than malice. For ‘money-primed’ individuals:

“It’s not a bad analogy to think of them as a little autistic.”

In the relationship between affluence and selfishness, which is the cause and which is the effect? The cause can be one of:

  • The environment: Being rich makes you less empathetic, or
  • The agent: Being less empathetic makes you rich.

Others have questioned the quality of research like this – for its subjectivity and inadequate sample sizes. (Far worse is the case of Diederik Stapel, who faked the data for similar research papers.)

But even if the data is frail or faked, we are inclined to go along with their conclusions because either:

  1. they ring true with our own anecdotal experience (e.g. that BMW drivers tend to be inconsiderate of other road users) – the ‘science’ only confirms ‘what we already knew’, or
  2. we want them to be true.


Charitable Giving

Looking at donations to charity is another way of assessing how much people think of others. Crucially, for this there is a vast amount of data available to analyse, from tax returns. One study analysed donation data from 30% of U.S. tax returns, a huge set. This is not without its problems but it does overcome sample size problems. Ranking the largest 50 U.S. metropolitan areas based on the percentage of people’s income given to charity, Salt Lake City was at the top, accompanied by the Bible Belt cities of the South East. The affluent Silicon Valley cities, San Francisco and San Jose, were nearly at the very bottom. Silicon Valley has long had a reputation for low level of charitable donations. (It has also been associated with a high prevalence of the diagnosis of autism/Asperger’s syndrome.)

The story is similar in the UK: Scotland and the Midlands donate more generously (proportionately) than the more affluent London and the South East.

Charitable giving as a function of income

Major factors that influence charitable generosity are

  • being married, and
  • regular attendance of religious services.

Religion is the factor that transforms the graph of percentage-giving-versus-income from one that declines with increasing income to a ‘U’ curve (see above). But it is only a relatively small proportion of the very wealthy that are doing the giving.

The use of charitable donations as an indicator of generosity is not straightforward – the relationship is obscured by including donations to political / ideological causes as well as traditional charitable ‘good causes’. But even after compensating for this, those who regularly attend religious services still donate more to secular ‘good causes’ than those who don’t. This can simply be attributed to the habit of being regularly reminded of others’ needs at those services. The relative meanness of those who do not attend regular religious services can be attributed to not being made consciously aware of others’ needs so frequently – ‘out of sight; out of mind’.

Other factors affecting charitable giving include:

  • Living in rural rather than urban areas. (Note: those in cities are generally better educated.)
  • Increasing age (ignoring the effect of bequests).
  • Living in mixed rather than ‘gated’ communities.

It would also appear that conservatives are more generous than liberals but there is no statistically significant difference between them per se; the high level of donations of conservatives can be accounted for by their higher religious attendance.


Affluent Societies

Taking what has been said above, an overall picture emerges. Compared with more ‘traditional’ societies, in modern Western societies:

  • People are more likely to be single. Relationships have less commitment.
  • There is less attendance of religious services: less social connectedness to those living in the vicinity. Less regular exposure to those less fortunate.
  • The majority of the population now live in an urban environment: day-to-day interactions with others are more likely to be anonymous rather than with those you know personally.
  • People are better educated: moral deliberation is done with a wider perspective than the local/immediate/emotional.
  • People are more individualistic: Occupations are more specialised and there is more leisure time to define oneself by.
  • People are more affluent: they have more material goods to ‘play’ with and use, with consequent reduced contact with others. Particularly relevant here is car ownership, isolating people when travelling between home and work.
  • People are more isolated from one another: they are likely to be living in ‘good’ or ‘bad’ neighbourhoods where people are more like themselves. Their interaction tends to be more with those of their own age. This is all particularly acute for ‘gated communities’.
  • There is less dependency and there is higher monetization: we are less dependent on other specific people, and their goodwill. If we want something, we can just buy it with the minimum of personal interaction and generally from one of a number of anonymous providers.

All these factors lead to reduced empathy towards people around us. This is an effect of the environment.

However, it must be emphasized that this is a local effect. Modern Western society supports a huge population, becoming a more homogeneous ‘global village’ whereas ‘traditional’ societies tend to be small and much less tolerant to outsiders.

On balance, a reduction in local empathy might not be a problem if society was quite uniformly affluent. But there are huge societal differences. The reduced empathy of the powerful leads to narcissism and insensitivity and works to the detriment of the weak.

As already said, morality is about balancing the wants of the individual against those of others within society.

  • A ‘traditional’ environment is likely to be physically harsh. This balancing must be skewed towards the wider needs of the group. The community needs religion to bind itself together. There must be strongly codified acceptable behaviours.
  • A modern, Western environment is physically benign and can support greater independence and the moral balancing can shift towards the individual.

This shift is most pronounced for the most affluent.


Entitlement and Narcissism

In extreme cases, the balance is completely shifted towards the self. Such people have:

  • An affluence which means that all ‘basic’ worldly needs are easily met: food, shelter, safety, belonging and self-respect.
  • A lack of empathy.
  • A ‘cold’ application of reason that directs action.

combined with either:

  • A preparedness to sacrifice others (dispassionately) for a greater good, or
  • No personal regard for others at all.


The former case of sacrificing others is one of ‘extreme Utilitarianism’ – a preparedness or a sense of entitlement to act. Moreover it is an entitlement to act alone (based just on one’s own perceptions of reality). There is a gradual transition from personal morality to political morality here. A government department is entitled to take actions that impersonally sacrifice some people for others (buy drugs for one medical condition at the expense of others for another). A political leader, supported by the institution of government is entitled to take actions that impersonally sacrifice some people for others (wage war). But when a group of insufficient size thinks it is entitled to impersonally sacrifice some people for others, it is terrorism.

(The problem with the classic ethical thought experiments such as

is that these scenarios apply ordinarily to groups, not individuals.)

An example is the case of Anders Behring Breivik, responsible for the 2011 terrorism in Oslo and on Utøya. Before his killing spree, he released a 1500-page account of his worldview concerning the preservation of European culture against Islamisation. Although delusional (and homophobic and misogynistic and …), there is an intellectualized dimension to his cause, and a willingness to enforce significant sacrifices in order to further that cause (incarceration for himself but death for many others). Breivik would probably diagnose his motivations as part of his personal self-actualization. Psychiatrists on the other hand attributed his acts to narcissistic personality disorder (exacerbated by Asperger’s).

The latter case of having no regard for others is one of megalomania, for which there are plenty of examples throughout history. Its juvenile form is one of insufficient competence, such as with the case of Richard Loeb.


(This is the twentieth part of the ‘From Neural Is to Moral Ought’ series.)


My Brain Made Me Do It


Crime and Punishment

I previously looked at shame and guilt and the confusion between the two. One distinction was that guilt focussed on bad acts whereas shame focussed on the bad agents who caused those acts.

New Scientist


Also previously, morality has been defined as the balancing of the wants of the individual against those of others within society. (Note: The moral code can vary between individuals within that society.) It promotes ‘good’ behaviour and discourages ‘bad’ behaviour for mutual benefit. A culture can be nurtured within a society which promotes this through:

  • internal guilt – when private, and
  • external shame – when found out.

The justice system institutionalizes this cultivation. It promotes ‘right’ behaviour and discourages ‘wrong’ behaviour for mutual benefit. Out of practicality, it is necessarily a rule-based system, which only approximates to a society’s morals, but its blunt edges can be smoothed off by the expertise of its judges. (Note: the legal code can vary between individuals within that society.)

If the moral landscape is a surface over which height above sea level is indicative of how ‘good’ or ‘bad’ an action is in a specific location in place and time then the ‘legal landscape’ is like a canyon – there is clear separation between what is ‘right’ and what is ‘wrong’.

The justice system cultivates good behaviour through some combination of:

  • Retribution: transgressors morally deserve to be punished.
  • Institutional retribution: transgressors are punished by a third-party in order to prevent victims or others taking retribution, thus avoiding feuds and vigilantes.
  • Deterrence: transgressors should be punished in order to deter others from offending and them from re-offending.
  • Incapacitation: preventing transgressors from re-offending through incarceration/detention/imprisonment.
  • Exile: preventing transgressors from re-offending by casting them out of the community.
  • Rehabilitation: re-educating / re-habituating / re-integrating offenders back into the community so that they will not re-offend.
  • Restoration: reconciliation between the offender and their victims, to prevent recidivism.

It transforms an internal ‘guilt’ to a very external ‘guilty’. To plead guilty is to acknowledge the wrong-doing. To be found guilty need not involve guilt on the part of the transgressor.

Just as there is the shameful actor / guilty act distinction, the justice system makes a corresponding distinction:

A crime is committed when a guilty mind performs a guilty act, but not when just one of these occurs. In this double-example:

  • A is planning to poison her husband at home tomorrow, once she has bought some rat poison. But her husband ends up not being at home the next day so doesn’t get poisoned. Mrs. A is not guilty of killing her husband.
  • On her return from buying the rat poison, she reverses onto the drive, half-thinking about her forthcoming crime. Not hearing her, Mr A. steps back from the garden onto the drive. She drives over him and he dies. Mrs. A is not guilty of killing her husband because she drove over him without intention, recklessness or negligence.

(Obviously, the reason Mr. A was not at home the next day to be poisoned was because he was in the morgue.)

The former is an example of mens rea without actus reus. The latter is an example of actus reus without mens rea. Neither makes Mrs. A guilty even though she had the intention to kill Mr. A and Mr. A was killed by means of Mrs. A.


  • In a moral system, wrongness can be due to a wrong act or a wrong actor (through guilt or shame respectively).
  • In the legal system, criminality requires both a wrong act and a wrong actor (‘actus reus’ and ‘mens rea’ respectively).

Note: As always here, this is a simplification:

  • Mens rea is sufficient for some crimes in the interest of public safety such as counter-terrorism.
  • Actus reus is sufficient for some crimes in the case of ‘strict liability’ that promotes public safety in areas such as food/employment standards.


Bad Brains Cause Bad Acts

Traditionally we are all held to be equally responsible before the law but it is increasingly apparent that this is not true. Our behaviour is affected by things beyond our control. Our responsibility is diminished by varying degrees. In some cases it is clear there is no mens rea – the transgressor cannot stop themselves from committing an actus reus. That is, their mind cannot stop their body. They have an inability to control themselves rather than having bad intentions, negligence or recklessness. It would be unfair to hold them responsible for the change in behaviour. Their behaviour is determined outside their mind.

Consider these cases:

  • Phineas Gage is the classic, most celebrated case of a change to the brain causing a change in behaviour. Working on building railroads in 1848, an explosion blew a tamping iron (rod) straight through his head, leaving a gaping hole in his brain. He miraculously survived but his personality was changed from that of a ‘responsible’ foreman beforehand to an irreverent, drunken brawler. Friends said he was “no longer Gage”. A generally-observable physical change to the brain had caused a generally-observable change in behaviour.
  • Charles Whitman personally fought his “unusual and irrational thoughts” until in 1966 he killed his wife and mother and went on a killing spree on a university campus. Beforehand he had written “After my death I wish that an autopsy would be performed on me to see if there is any physical disorder.” The autopsy revealed a brain tumour. A physical change to the brain (phenomenologically observed by Whitman but not apparent to others) presumed to have caused a dramatic change in behaviour.
  • Similarly in 2000, a man suddenly developed inappropriate sexual behaviour and was convicted. The onset of paedophilia normally occurs at a young age but this man was 40. He complained about headaches and balance problems and was given an MRI scan as a result. The scan revealed a brain tumour, which was subsequently removed. His sexual urges disappeared but returned in 2001. Another MRI scan revealed a regrowth of the tumour. Again, on removal of the tumour, his behaviour was corrected. Twice over, a physical change to the brain (phenomenologically observed by himself and very apparent to others) was correlated with a dramatic change in behaviour. This was achieved through the use of new, non-invasive technology on a live subject.
  • In 2011, Trevor Hayes was convicted of armed robbery. But then a massive brain tumour was found to be the cause of his “aggressive and compulsive behaviour” and affected his ability to exercise self-control. The judge overturned the verdict saying
    • “no court would conclude that there is a significant risk to the public now the tumour has been removed”
    • “There is a direct link between the size of the tumour and his behaviour. The evidence appears to be clear.”
    • “It is such an unusual scenario”

But it is not clear that the Hayes case really is so unusual. Now that we have the scanning technology, perhaps everyone convicted of a serious crime should have their brain scanned?

Hayes didn’t ask to have a brain tumour which then caused his criminal behaviour. Therefore, it seems wrong to punish him. Fortunately for him, we live both in enlightened times and in times of sufficiently advanced scanning technology.

Possibly in the future, there will be even better technology and we will be able to detect more subtle physical abnormalities of the brain. Purely speculatively for example, a ‘connectometer’ (‘connectome-meter’) might be able to create a coarse connectome of the brain which can be compared against reference connectome maps, from which we can physically diagnose mental conditions that are more difficult to diagnose psychologically (example: schizophrenia). It would then seem wrong to punish such persons, and we should pity those who are in prison now because we do not have that technology.

And the net can be spread wider. Where does this end?

It naturally leads to legal defences trying to use neuroscience to make a physical rather than psychological connection to the crime. The defendant is not guilty:

“My brain made me do it!”


Good Brains Cause Bad Acts

So, bad brains can cause bad acts. But good brains can also cause bad acts – if there is a bad environment. An example of a direct relationship is that between criminal behaviour and the exposure to lead in paints and fuel.

But an argument can be made for much wider application…

Clarence Darrow was the high-profile agnostic defence lawyer in the famous ‘science versus religion’ Scopes Monkey Trial of 1925, which challenged the ban on teaching evolution in schools. The year before, he had defended the notorious ‘Leopold and Loeb’ pair in another ‘trial of the century’. Inspired by Nietzsche’s concept of the Übermensch acting above the law, the two rich-kid prodigies applied their superior intellects to committing a ‘perfect crime’ by murdering a 14-year-old neighbour. They failed. In court, Darrow’s task was to get the judge to incarcerate them rather than letting the jury hang them.

Talking of Richard Loeb in his closing speech, Darrow said:

“Nature is strong and she is pitiless. She works in her own mysterious way, and we are her victims. We have not much to do with it ourselves. Nature takes this job in hand, and we only play our parts. …

“What had this boy to do with it? He was not his own father; he was not his own mother; he was not his own grandparents. All of this was handed to him. He did not surround himself with governesses and wealth. He did not make himself and yet he is to be compelled to pay.”

Loeb had the best of brains in what was outwardly the best of environments but circumstances led him to commit the worst of crimes. Darrow’s response on behalf of Loeb was effectively:

 “My environment made me do it!”

Darrow succeeded. The judge sentenced Leopold and Loeb to Life Plus 99 Years.


Corpus Reus

The ‘self’ argument

“My brain made me do it!”

and the ‘not-self’ argument

 “My environment made me do it!”

take us to determinism. The act was determined outside of ‘my’ control, where the ‘my’ here refers to ‘mind’. But if the brain is the mind and the physical world determines what we do, whether it is the physics of our insides or the physics of our outsides, it makes no difference.

A ‘physicalist’ tries to explain everything ultimately in non-intentional (‘mechanical’, ‘physical’) terms. Whether things are specifically deterministic or not is actually not particularly important. What is important is that some phenomena are not ring-fenced as being beyond such a physical explanation. Attempts are made to explain ‘mind’ in physical terms.

All this is a problem for Cartesian dualists and ‘libertarians’:

 “If the world is deterministic then there is no free will and hence we do not have moral responsibility.”

Libertarians equate Free Will and Indeterminism. Others differentiate. Conventionally:

  1. Free Will and Indeterminism produces ‘Libertarianism’: events are not always causally determined. We are free to ‘make a difference’.
  2. Free Will and Determinism produces ‘Compatibilism’: the world is deterministic yet we still claim there is ‘free will’.
  3. No Free Will and Determinism produces ‘Hard Determinism’: Liberty is a practical consideration, and
  4. No Free Will and Indeterminism produces ‘Hard Incompatibilism’: determinism is a red herring. We cannot have free will either way.

The legal system is intimately associated with Dualism and Libertarianism: ‘mens rea’ (a guilty mind) is the cause of ‘actus reus’ (a guilty act).

A physicalist could subscribe to any of the non-libertarian positions 2, 3 and 4 above:

  2. We still have ‘Free Will’ but ‘new-style’ Free Will is just a bit different from what we have previously understood ‘Free Will’ to be. We will still basically judge people as if they had ‘old-style’ Free Will.
  3. The moral/legal system will need to be modified but it will take time for the changes to happen.
  4. Whether the world is deterministic or indeterminate, the philosophical arguments around Free Will have little bearing on the practical considerations of the judicial system. Essentially, Free Will is irrelevant.

And a physicalist obviously would not subscribe to dualism. As I have said previously, the dualist concept of ‘free will’ does not translate across to physicalism. From a physicalist perspective, it just doesn’t make sense to say ‘there’s no such thing as Free Will’. Free Will’s physicalist equivalent is a combination of:

  • ‘Conscious Will’, the conscious feeling that an agent has caused something that they have willed when they see the corresponding action, and
  • ‘Freedom’, itself a combination of an ability to predict and yet be unpredictable oneself.

For physicalist ‘Conscious Will’:

  • The conscious feeling of having caused something can be related to ‘mens rea’, and
  • The corresponding action can be related to ‘actus reus’.

But it is not possible to separate ‘mind’ and matter. Ultimately, there is only what might be described as ‘corpus reus’ – the body is guilty. Mind/brain is embodied and cannot be considered separately from the body. ‘Moral responsibility’ lies within the entire person.

 “It was the body that is me that did it!”

This understanding of responsibility will not be the same thing as a libertarian would understand from the same word.



Moral and legal responsibility is typically associated with the ‘ability to control’ but responsibility is also associated with someone or something being the ‘primary cause’ and held ‘accountable’, without there being ‘control’.

Loeb’s interest in crime novels and habit of lying supposedly started as a reaction to the strict disciplinarian teaching approach of his governess, Emily Struthers. It is possible that if he had had a different governess then this first step towards murder would not have been taken.

This is the ‘butterfly effect’: could a butterfly flapping its wings in Brazil cause a tornado in Texas? Contrary to what is frequently said, the answer is ‘no’. It is true that:

  • a deterministic world with a butterfly in a precise point in space and time could result in a tornado somewhere else later on, whereas
  • the very same deterministic world with the exact same starting point except for the butterfly would not result in the tornado occurring.

But the point of the ‘butterfly effect’ is that it is never possible to recreate the same starting point with sufficient accuracy and so we will never know. It is only possible in computer simulations. And even in those computer simulations, we would not say that the butterfly is the ‘cause’ of the (virtual) tornado any more than a small difference somewhere else (such as the presence or absence of a leaf) would be. Countless other minor changes would also, in time, have led to a substantially different outcome.
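This sensitivity to initial conditions is easy to demonstrate in a computer simulation. The sketch below uses the logistic map in its chaotic regime, a standard toy chaotic system (nothing specific to weather): a perturbation of one part in ten billion grows until the two trajectories bear no relation to each other.

```python
# Sensitivity to initial conditions with the logistic map x -> r*x*(1-x)
# in its chaotic regime (r = 4). eps is our 'butterfly'.

def diverge(x0, eps=1e-10, r=4.0, steps=50):
    """Return the largest gap seen between two trajectories started eps apart."""
    a, b, gap = x0, x0 + eps, 0.0
    for _ in range(steps):
        a = r * a * (1.0 - a)
        b = r * b * (1.0 - b)
        gap = max(gap, abs(a - b))
    return gap

print(diverge(0.2))  # a microscopic difference becomes macroscopic
```

With `eps=0.0` the two runs stay identical forever, which is the other half of the point: only the exact same starting point reproduces the outcome.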

But what would have happened if Richard Loeb was substituted by someone else driving around the streets of Chicago with Nathan Leopold on the afternoon of 21 May 1924?

Based on our ability to predict consequences, substituting Loeb has a greater chance of changing the fate of Bobby Franks than substituting anyone else, with the possible exception of Nathan Leopold. The chance is far higher than if Struthers were substituted. The pair had the greatest effect on the bad consequences and hence we say that they are responsible. This is regardless of any moral capacity (mental capability) or any existence of Free Will; these are irrelevant. If we want to prevent another occurrence of such an event, it is them that we first examine, and this is what we mean by being responsible for an act.



The traditional legal system rests on dualist foundations. It makes a distinction between intentional and unintentional actions and punishes freely-chosen intentions that are bad.

Centuries ago, all human behaviour was attributed to the mind. For example, epileptics were considered to be possessed by the devil and punished accordingly. Over time, we have slowly shifted towards the physicalist position that all behaviour is determined by matter and we no longer see responsibility in terms of choice and blame (there is the transition from Dualist ‘mens rea’ and ‘actus reus’ to Physicalist ‘corpus reus’).

Apportioning responsibility is then a consequentialist activity that is part of identifying how similar undesirable situations can be prevented in the future. It is a risk-based approach.

We will still ‘punish’ epileptics when there is neither mens rea nor actus reus. For example, we will still ban them from driving. But this is no different from punishing others for circumstances beyond their control. For example, the old, the young and many disabled are also ‘banned’ from driving (we might question the use of the word ‘banned’ but the practical effect is the same). They could all protest:

  • ‘It’s not fair! I didn’t ask to be born with epilepsy.’
  • ‘It’s not fair! I didn’t ask to have poor eyesight / slow reaction times / a raised chance of having a heart attack in my advanced years.’
  • ‘It’s not fair! I didn’t ask to have poor impulse control in adolescence.’

There is ‘punishment’, but without the sense of guilt, shame or blame.

And then we also punish others for what we feel should be within their control but apparently isn’t, such as drunk driving. But they might respond:

  • ‘I didn’t ask to be born genetically predisposed to have poor impulse control.’

We balance the risks:

  1. A young driver has good coordination and fast reaction times but poor impulse control and a lack of experience.
  2. A mature driver with infrequent epileptic seizures may have high skill, good coordination, good impulse control but there is a significant risk of causing an accident as a result of having another seizure.
  3. An elderly driver may have high skill and very good impulse control but have deteriorating coordination, eyesight and reaction times. There may also be significant risk of loss of control (heart attack).
  4. A middle-aged drunk driver may be better than the above in all respects except in their poor impulse control and recklessness.

The drunk driver may be no more of a risk than a moderate case of one of the other risk categories. There needs to be an assessment of risks in all cases, considering practical preventative measures in a non-judgemental way.
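The kind of balancing described above can be made concrete, if crudely. In the sketch below, every factor name and probability is invented purely for illustration; the only point it makes is that several moderate risk factors can combine to match or exceed one large one.

```python
# Hypothetical illustration: factor names and probabilities are invented,
# not empirical values.

def accident_risk(factors):
    """Combine per-factor probabilities of causing an accident, assuming
    (unrealistically) that the factors act independently."""
    p_no_accident = 1.0
    for p in factors.values():
        p_no_accident *= (1.0 - p)
    return 1.0 - p_no_accident

drivers = {
    "young":     {"impulse_control": 0.030, "inexperience": 0.020},
    "epileptic": {"seizure": 0.040},
    "elderly":   {"coordination": 0.020, "eyesight": 0.015, "heart_attack": 0.010},
    "drunk":     {"impulse_control": 0.035, "recklessness": 0.015},
}

for name, factors in drivers.items():
    print(f"{name}: {accident_risk(factors):.3f}")
```

With these made-up numbers, the drunk driver's combined risk lands in the same range as the other three categories, which is the non-judgemental, risk-based comparison being argued for.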


Prevention and Deterrence

To prevent criminal activity, we can try to improve individuals – such as by removing their brain tumours! But there is rather more scope in improving, over a long period of time, the environment into which individuals grow.

But sanctioning transgressors is a major way of preventing crime. I am avoiding the word ‘punish’ here; it does not help. The sanctioning should not be for retribution but for deterrence.


Treatment not Punishment

The convicted mentally ill are treated in a ‘secure hospital’ rather than punished with a prison sentence. But with determinism, all the convicted are deemed to be ‘ill’ to some degree. All prisons become ‘secure hospitals’. We become more compassionate towards convicts. We move away from retributive justice. We are not deliberately trying to make life worse for them. Detention is purely a practical consideration – for the benefit of wider society (protection and deterrence) as well as that of the convicted individual (reform).

But there are economic consequences. Incarceration is expensive and treatment is even more so. It seems absurd to provide an environment for convicts on the inside that is better than that for some of the non-criminal poor on the outside – this is difficult to justify. And with this argument, there is an incentive to commit crime – negating deterrence. It is better to spend money improving life for the worst-off outside. Then, as standards of living improve on the outside, what is acceptable for the criminal inside will improve too. This is purely a practical issue.


Executive Responsibility

After the exposing of corporate misdemeanours, executives cannot respond with:

 ‘It’s not fair! I didn’t know anything about it. How can you blame me?’

even if they truly did not know. Executives are not directly involved in particulars but they should still be expected to take responsibility and be held responsible because it is part of their job to ensure that those below them are acting appropriately. Despite there being no mens rea, they need to be sanctioned as a deterrent to other executives to motivate them to act appropriately.


Moral Luck

It is commonly felt that driving when intoxicated is significantly worse when it results in injury to others than when it does not – that reckless mens rea without actus reus is less blameworthy than reckless mens rea with actus reus. If we look at future risk, the recklessness of the driver is the same in both cases and so, according to the argument here, they should be sanctioned in the same way. The only real difference is whether the risk does or doesn’t pay off, i.e. luck – ‘moral luck’!


Moving Away from Mens Rea and Actus Reus

In both the examples above (Executive Responsibility and Moral Luck), there is a moving away from the requirements for both Mens Rea and Actus Reus to a risk-based ‘Corpus Reus’ approach that is:

  • less blameworthy: responsibility is about identifying where to look for preventative solutions and not about control and retribution.
  • more compassionate: we are more sympathetic towards criminals if we believe they have less than ideal control over events in a physical world.

(But we must recognise that it is also potentially dangerous in going too far in sanctioning.)


  • Mens Rea is associated with (idealized) rational decision-making
  • Actus Reus is associated with specific acts being good or bad.
  • Corpus Reus is associated with the embodied virtue of the individual. As such, it is consistent with virtue ethics.


Getting Rid of Blame

There is nothing remarkable here in the argument that we should abandon blame, punishment and retribution. It is an obvious consequence of moving away from a Dualist to a Physicalist justice system. For example, take these three Neuroscientist ‘heavyweight’ opinions:

Mike Gazzaniga:

‘with determinism there is not blame, and, with not blame, there should be no retribution and punishment’

David Eagleman:

‘Blameworthiness should be removed from the legal argot’.

Joshua Greene (he of the  ‘From Neural Is to Moral Ought’ paper) and Jonathan Cohen:

‘We foresee, and recommend, a shift away from punishment aimed at retribution in favour of a more progressive, consequentialist approach to the criminal law’.


The Return of Blame

And yet, blame may still have a role to play in a purely practical consequentialist approach to justice. It may be ‘unfair’ to blame people for doing things they could not have not done, but cultivating blame in a society will provide some deterrence and hence promote the self-regulation of people for best mutual well-being. It has the same role as ‘shame’ and ‘guilt’.

It seems that blame has been kicked out the front door of morality, only to be let back in through the back door of pragmatism.

The same is true of its opposite, praise.

And with this, we seem to end up with a ‘Hard’ position:

  • If there is Free Will, we blame agents for the bad actions they cause.
  • If there is no Free Will, we still blame agents for their bad actions.


  • If there is Free Will, we praise agents for the good actions they cause.
  • If there is no Free Will, we still praise agents for their good actions.

The issue of Free Will is irrelevant. As Greene and Cohen said:

‘For the law, neuroscience changes nothing and everything’.

(This is the nineteenth part of the ‘From Neural Is to Moral Ought’ series.)


Shallow Learning


The Cerebellum

When we think of the brain we conjure up an image of the cerebral cortex – the part that is so large in humans that it wraps all around the top. We do not think of the Cerebellum (Latin for ‘little brain’) tucked underneath this wrinkly cortex at the back, itself having two halves and its own cortex – the ‘cerebellar cortex’.

Cerebrum and Cerebellum

The huge, glamorous Cerebrum is part of the ‘neo-mammalian’ forebrain and is what seems to provide us with the extra something that distinguishes us from other creatures. The Cerebellum is the poorer, more ancient cousin: part of the more basic, ‘proto-reptilian’ hindbrain and a bit of a spare part. A human cannot survive with significant parts of their Cerebrum missing, but a human can survive without their Cerebellum entirely – conscious, but with seriously affected motor control. But for normal development:

Neurons in the Cerebellum significantly outnumber those in the Cerebrum.

Surprisingly, the ratio of cerebellar to cerebral neurons is quite constant across a large range of creatures, at a value of about 3.6.

The huge increase in human cerebral neurons that we associate with cognition has been accompanied by a proportional increase in the cerebellar neurons that are associated with smooth motor actions.
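As a rough sanity check on that ratio, we can plug in commonly cited human figures – roughly 69 billion cerebellar neurons versus 16 billion cerebral-cortex neurons. Both numbers are approximations from the literature, not from this post:

```python
# Back-of-envelope check of the cerebellar-to-cerebral neuron ratio,
# using approximate human figures (~69 billion vs ~16 billion).
cerebellar = 69e9
cerebral = 16e9
print(round(cerebellar / cerebral, 1))  # 4.3 - same order as the quoted 3.6
```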

Reptile brain: Even in a reptile’s brain, the forebrain (cerebrum) is larger than the cerebellum in volume – but not in the number of neurons.


The Cerebellum and Artificial Neural Networks

The cerebellum is undoubtedly simpler in that it has a much more regular structure. The cerebellar cortical sheet is folded up into regular grooves, in contrast to the more familiar wrinkly cerebral cortex. This makes it more amenable to understanding – a better starting point both:

  • scientifically, as a way to understand the brain, and
  • in ‘bioinspired’ engineering ‘Artificial Neural Networks’, as a way to build more intelligent, powerful and efficient computers.

The engineering helps the science. Being able to build and then successfully run a physical simulation of a model of the cerebellum is vastly superior to conjecturing theories.


Deep Learning

Unfortunately, progress in simulated neural networks was disappointingly slow, and this gave artificial neural networks a bad name. It was very difficult to get them working for networks of more than 3 layers (stepping up from an artificial cerebellum to an artificial cerebral cortex), which was needed if they were to do anything useful. But small progress over many years yields results, and this is now a key technology behind Google and Siri speech recognition. A leader in this field is Geoffrey Hinton, who coined the name for this sub-discipline: ‘Deep Learning’.
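Why layers matter at all can be seen with the classic XOR example: no single-layer perceptron can compute XOR, but a network with one hidden layer can. A minimal sketch, with hand-chosen (not learned) weights:

```python
# A single-layer perceptron cannot compute XOR; one hidden layer suffices.
# Weights and thresholds below are hand-chosen purely to show the layered
# computation, not learned.

def step(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h2 = step(x1 + x2 - 1.5)    # fires only if both inputs are on
    return step(h1 - h2 - 0.5)  # 'at least one, but not both'

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Getting networks like this to *learn* their weights through many layers is the hard part that held the field back, and is what ‘Deep Learning’ eventually cracked.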

A central, recurring concept on my blogsite is the ‘hierarchy of predictors’, with frequent references to Karl Friston’s ‘variational free energy’ theory. Hinton’s deep learning engineering work and its very terminology is the foundation for Friston’s work. Hinton is a co-author and former colleague of Karl Friston at UCL.

Photo credit: Michael Tyka.

Deep Learning 1: Tree in field with clouds, as perhaps ‘seen’ by a Canon EOS 5Ds.

Credit: Google

Deep Learning 2: Tree in field with clouds, as perhaps ‘seen’ by some deep layer within your brain! Google’s deep learning network tries to relate features in the original image with those it has seen before. Clouds get associated with sheep-like creatures.

Credit: Google

Deep Learning 3: Canon EOS 5Ds, as perhaps ‘seen’ by some deep layer within your brain! Produced using DreamScope


Shallow Learning

Artificial Neural Networks are the poorer, more ancient, less glamorous cousin of ‘Deep Learning’, just as the cerebellum is the poorer, more ancient, less glamorous cousin of the cerebral cortex. They are examples of ‘shallow learning’, as it were.

To get to deep learning, we must first wade through shallow learning. A seminal starting place for this is James Albus’s paper “A Theory of Cerebellar Function”, which is available at various places on the interweb as a scanned PDF such as here. Below, I provide a text (searchable) version (but with no guarantees about being completely error-free).
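Before the paper itself, here is a toy sketch of its central learning idea: stable pattern storage achieved chiefly by weakening synaptic weights rather than strengthening them. Everything in the sketch – the data, the threshold, the learning rate – is my own illustration, not code or values from the paper:

```python
# Toy 'learning by weakening': all weights start strong, and the only
# learning rule is to depress the weights active during a false alarm.
# All names and parameters here are illustrative, not from Albus (1971).

def output(w, x, threshold=0.75):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

def train(patterns, targets, n_inputs, rate=0.5, epochs=10):
    w = [1.0] * n_inputs                      # all synapses start strong
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            if output(w, x) == 1 and t == 0:  # wrongly fired: depress
                w = [wi - rate * xi for wi, xi in zip(w, x)]
    return w

# Three binary input patterns; the cell should fire only for the third.
patterns = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
targets = [0, 0, 1]
w = train(patterns, targets, n_inputs=3)
print([output(w, x) for x in patterns])  # [0, 0, 1]
```

The learner carves the correct response out of an initially all-responsive cell by depression alone – the same direction of change Albus argues for in the Purkinje cell synapses.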


A Theory of Cerebellar Function


Mathematical Biosciences 10 (1971), 25-61



Copyright 1971 by American Elsevier Publishing Company, Inc.



Cybernetics and Subsystem Development Section

Data Techniques Branch

Goddard Space Flight Center

Greenbelt, Maryland

Communicated by Donald H. Perkel



A comprehensive theory of cerebellar function is presented, which ties together the known anatomy and physiology of the cerebellum into a pattern-recognition data processing system. The cerebellum is postulated to be functionally and structurally equivalent to a modification of the classical Perceptron pattern-classification device. It is suggested that the mossy fiber → granule cell → Golgi cell input network performs an expansion recoding that enhances the pattern-discrimination capacity and learning speed of the cerebellar Purkinje response cells.

Parallel fiber synapses of the dendritic spines of Purkinje cells, basket cells, and stellate cells are all postulated to be specifically variable in response to climbing fiber activity. It is argued that this variability is the mechanism of pattern storage. It is demonstrated that, in order for the learning process to be stable, pattern storage must be accomplished principally by weakening synaptic weights rather than by strengthening them.



A great body of facts has been known for many years concerning the general organization and structure of the cerebellum. The regularity and relative simplicity of the cerebellar cortex have fascinated anatomists since the earliest days of systematic neuro-anatomical observations. In just the past 7 or 8 years, however, the electron microscope and refined micro-neurophysiological techniques have revealed critical structural details that make possible comprehensive theories of cerebellar function. A great deal of the recent physiological data about the cerebellum come from an elegant series of experiments by Eccles and his co-workers. These data have been compiled, along with the pertinent anatomical data, in book form by Eccles et al. [5]. This book also sets forth one of the first reasonably detailed theories on the function of the cerebellum. Another theory, published in 1969 by Marr [11], in many ways extends and modifies the theory of Eccles et al.

The theory presented here was developed independently of the Marr theory but agrees with it at many points, at least in the early sections. This article, developed from a study of Perceptrons [15] and memory model cells [1], applies these results to the structure of the cerebellum as summarized by Eccles et al. [5]. The theory presented here extends the Marr theory and proposes several modifications based on principles of information theory. These extensions and modifications relate mainly to the role of inhibitory interneurons in the learning process, and to the detailed mechanism by which patterns are stored in the cerebellum.



To credit each piece of information presented in this section to its original source would be very tedious. Everything in this section is taken directly either from Eccles et al. [5] or Fox et al. [7]. Therefore a single reference is now made to these sources and to the extensive bibliographies that appear in them.

A. Mossy fibers

Mossy fibers constitute one of the two input fiber systems to the cerebellum. Information conveyed to the cerebellum via mossy fibers comes from many different areas. Some mossy fibers carry information from the vestibular system or the reticular formation, or from both. Others carry information that comes from the cerebral cortex via the pons. The mossy fiber system that has been most closely studied relays information from the various receptor organs in muscles, joints, and skin. Mossy fibers that arrive via the dorsal spinal cerebellar tract are specific as regards modality of the muscle receptor organ, from either muscle spindles or tendon organs, and have a restricted receptor field, usually from one muscle or a group of synergic muscles.

Mossy fibers from the ventral spinal cerebellar tract are almost exclusively restricted to Golgi tendon organ information but are more generalized as regards specific muscles than those from the dorsal spinal cerebellar tract. The ventral tract fibers seem to signal stages of muscle contraction and interaction between contraction and resistance to movement of a whole limb. Other mossy fibers carry information from skin pressure receptors and joint receptors. There are continuous spontaneous discharges on most mossy fibers, at rates between 10 and 30 per second, even when the muscles are completely relaxed.

Mossy fibers enter the cerebellum and arborize diffusely throughout the granular layer of the cortex. A single mossy fiber may send branches into two or more folia. These branches travel toward the top of the folia, giving off further branches into the granular layer of the sides of the folia, finally terminating in an arborisation at the top of the folia. Each branch of a mossy fiber terminates in a candelabrum-shaped arborisation containing synaptic sites called mossy rosettes. There is a minimum distance of 80-100µm between rosettes from a single mossy fiber. It is estimated that each branch of a mossy fiber entering the granular layer of the cerebellum produces from 20 to 50 or more rosettes. Thus a single mossy fiber may produce several hundred rosettes considering all its branches. The mossy rosettes are the site of excitatory synaptic contact with dendrites of the granule cells. The mossy fibers also send collaterals into the intra-cerebellar nuclei, where they make excitatory synaptic contact with nuclear cells.


B. Granule Cells

The granule cells are the most numerous cells in the brain. It is estimated that in humans there are 3 × 10^10 granule cells in the cerebellum alone. Granule cells possess from one to seven dendrites, the average being four. These dendrites are from 10 to 30µm long and terminate with a characteristic claw-shaped ramification in the mossy rosettes. In view of the spacing between rosettes on a mossy fiber, it is highly unlikely that a granule cell will contact two rosettes from the same mossy fiber. Thus an average granule cell is excited by about four different mossy fibers. Since approximately 20 granule cell dendrites contact each rosette, this means that there are about five times as many granule cells as mossy rosettes, and at least 100-250 times as many granule cells as mossy fibers. Since a mossy fiber enters several folia, there may even be four or five times this many granule cells per mossy fiber.

Each granule cell gives off an axon, which rises towards the surface of the cortex. When this axon reaches the molecular layer, it makes a T-shaped branch and runs longitudinally along the length of the folia for about 1.5mm in each direction. These fibers are densely packed and are only about 0.2-0.3µm in diameter. The parallel fibers make excitatory synaptic contact with Purkinje cells, basket cells, stellate cells, and Golgi cells.


C. Golgi Cells

Golgi cells have a wide dendritic spread, which is approximately cylindrical in shape and about 600µm in diameter (see Fig. 1). This dendritic tree reaches up into the molecular layer, where it is excited by the parallel fibers, and down into the granular layer, where it is excited by the mossy fibers. The Golgi axon branches extensively and inhibits about 100,000 granule cells located immediately beneath its dendritic tree. Every granule cell is inhibited by at least one Golgi cell. The Golgi axons terminate on the mossy rosettes, inhibiting granule cells at this point. Fox et al. [7] state that the axon arborisations of neighboring Golgi cells overlap extensively, so that two or more Golgi cells frequently inhibit a single granule cell. Note the overlapping fields shown in Fig. 3. This overlapping is a point of disagreement between Eccles et al. [5] and Fox et al. [7]. It appears, however, that Golgi cells must overlap, considering their size and that there are approximately 10% as many Golgi cells as Purkinje cells.

James Albus: A Theory of Cerebellar Function

Fig. 1

FIG. 1. A typical Golgi cell. Its arborisations extend throughout an approximately cylindrical volume 600µm in diameter.

The size of the dendritic spread of the Golgi cell as shown in Figs. 1 and 3 is a point of some uncertainty. Eccles et al. [5, page 205 and Fig. 116] state that the spread of the Golgi dendritic tree is about three times that of a Purkinje cell (i.e., 600-750µm). However, drawings by Cajal [2] and Jakob [10], and statements and drawings elsewhere in Eccles et al. [5, page 60 and Fig. 1] seem to indicate the dendritic spread for Golgi cells to be only slightly larger than that of Purkinje cells (i.e., 250-300µm). However, even with a dendritic spread of only 300µm, the Golgi dendritic fields would still have significant overlap, as can be shown by drawing 300µm diameter circles around the Golgi cell bodies in Fig. 3.


D. Purkinje Cells

The Purkinje cell has a large and very dense dendritic tree. The dendritic tree of the Purkinje cell is shaped like a flat fan and measures on the average about 250µm across, about 250µm high, and only about 6µm thick, as shown in Fig. 2. The flat face of this fan is positioned perpendicular to the parallel fibers that course through the branches of the tree. It is estimated that around 200,000 parallel fibers pierce the dendritic tree of each Purkinje cell, and that in passing virtually every parallel fiber makes a single synaptic contact with the dendrites of the Purkinje cell. At the site of a parallel fiber Purkinje dendritic synapse, the parallel fiber enlarges to about 1µm in diameter and is filled with synaptic vesicles. A spine grows out of the Purkinje dendrite and is enclosed by an invagination of the enlarged part of the parallel fiber.

James Albus: A Theory of Cerebellar Function

Fig. 2

FIG. 2. A typical Purkinje cell. Its dendritic tree is restricted to a volume approximately 250µm x 250µm x 6µm.

A unique characteristic of the Purkinje cell is that there is virtually no intermingling of its dendritic tree with that of other cells. The Purkinje cell bodies are beet shaped and about 35µm in diameter. They are scattered in a single layer over the cortex at intervals of about 50µm along the direction of the parallel fibers, and about 50-100µm in the transverse direction. Thus the fan-shaped dendritic trees overlap in the transverse direction but are offset in the longitudinal direction sufficiently so as to not intermingle. Figure 3 shows a top view looking down on the packed Purkinje dendritic trees. The trees are about 6µm thick and are separated by about 2-4µm. Thus a parallel fiber encounters a different Purkinje dendritic tree every 8-10µm. Since a parallel fiber synapses with virtually every Purkinje dendritic tree it passes, a 3mm parallel fiber contacts about 300 Purkinje cells.

James Albus: A Theory of Cerebellar Function

Fig. 3


FIG. 3. View of cerebellar cortex looking down on top of Purkinje dendritic trees. Purkinje cells are shown here spaced approximately every 50µm in the longitudinal direction and every 60µm in the transverse direction. They are staggered so that the dendritic trees do not intermingle. Four Golgi cells are shown with the outline of their area of arborisation traced. There is one Golgi cell to every nine Purkinje cells. Note the extensive overlapping of Golgi arborisation. Each point on the cortex is subject to influence by about nine different Golgi cells.


Purkinje cell axons constitute the only output from the cerebellar cortex. These axons make inhibitory synapses with the cells of the cerebellar nuclei and of the Deiters nucleus. In addition, Purkinje axons send recurrent collaterals to other Purkinje cells, basket cells, stellate cells, and Golgi cells.


E. Basket Cells

The basket cells also have flat fan-shaped dendritic trees, which extend upward in the 2-4µm spaces between Purkinje dendritic layers. Basket dendritic trees are much less dense than those of Purkinje cells, but cover roughly the same area. Basket dendrites also receive excitatory synaptic contacts from parallel fibers via dendritic spines. Basket cell dendritic spines are much sparser, more irregularly spaced, longer, and thinner than Purkinje spines. They are very often hook shaped. Basket cell bodies, about 20µm in diameter, are located in the lower third of the molecular layer. Basket cells are 15%-20% more numerous than Purkinje cells.

Basket cells send out axons in the transverse direction, perpendicular to the parallel fiber pathways. These axons branch and send descending collaterals, which make strong inhibitory synapses around the preaxon portion of the Purkinje cells. They also send ascending collaterals into the Purkinje cell dendritic trees, where they form further inhibitory synapses. Each basket cell inhibits about 50 Purkinje cells over an elliptical area about 1000µm x 300µm. The basket cells do not inhibit the Purkinje cell immediately adjacent, but begin their inhibitory activity one or two cells away, and inhibit Purkinje cells out to about 1mm away in the transverse direction. Thus any parallel fiber that excites a Purkinje cell is not likely to also inhibit the same Purkinje cell via a basket cell.


F. Stellate Cells

Stellate cells have dendritic arborisation very similar to that of basket cells, although somewhat smaller. On the basis of axon distribution, there are two types of stellate cells. Stellate “a” cells send axons into Purkinje dendritic trees immediately adjacent, whereas stellate “b” cells send their axons transversely, making inhibitory contact with Purkinje dendrites in an area similar in size, shape, and relative position to that of basket cells. Functionally, the main distinction between basket cells and stellate “b” cells seems to be that stellate “b” cells are located higher in the molecular layer and send few, if any, axon collaterals to the Purkinje pre-axon, or “basket”, region. However, there are many intermediate forms and the cell types seem to change progressively from basket cells in the upper granular layer to stellate “b” cells in the mid and upper molecular layer. Thus in this article the basket cells and stellate “b” cells will be assumed to perform roughly the same functions, which include receiving excitatory inputs from parallel fibers and transmitting inhibitory signals to Purkinje cells.


G. Climbing fibers

A second type of input fibers, the climbing fibers, also enters the cerebellum. These fibers are distinguished by the fact that each Purkinje cell receives a single climbing fiber in a 1:1 fashion. They are called climbing fibers because they contact the Purkinje cell at the base of its dendritic tree and climb up the trunk of the tree, making repeated strong excitatory synaptic contacts. A single spike on a climbing fiber can evoke a complex burst of Purkinje activity. The exact nature of this activity is not entirely clear. Observations by Thach [17] seem to indicate that this complex burst of activity consists of a single Purkinje axon spike followed by several milliseconds of spike-like activity propagating throughout the Purkinje dendritic tree. This dendritic activity is accompanied by intense cell depolarization and a pause in spontaneous Purkinje axon spike activity for 15-30ms. This depolarization and pause was termed the inactivation response by Granit and Phillips [8].

The climbing fibers are usually thought to originate primarily in the inferior olivary nucleus and make a precise point-to-point mapping from the olivary nucleus to the cerebellar cortex. There is, however, some indication from cell counts done in the olivary nucleus [6], that either each climbing fiber branches about 15 times before reaching the cerebellum, or the majority of climbing fibers come from other sources outside the olivary nucleus.

Information carried by climbing fibers comes from a great variety of areas. The inferior olive receives afferents from proprioceptive end organs as well as all lobes of the cerebral cortex. The inferior olive also receives a strong projection from the red nucleus and the periaqueductal gray via the central tegmental tract.

The response of climbing fibers to peripheral stimulation is quite distinct from that of mossy fibers. A climbing fiber will typically respond to pinching the skin and deeper tissue anywhere within a receptive field, which may encompass an entire limb [17]. In monkeys performing a motor task it has been observed that climbing fiber spikes are correlated with quick movements made in response to external stimuli, but not with self-paced movements, such as rapidly alternating wrist motions [18, 19]. This evidence would seem to indicate that information carried on climbing fibers is the product of a great deal of integration through higher centers.

In addition to the precise one-for-one climbing fiber contact with Purkinje cells, climbing fibers also put out three sets of collaterals; that is,

(1) a climbing fiber sends collaterals to synapse on basket cells and stellate cells in the immediate vicinity of the Purkinje cell that it contacts;

(2) a climbing fiber sends collaterals to one or more Golgi cells located within an elliptical region about 1000µm × 300µm centered on the Purkinje cell that it contacts;

(3) a climbing fiber sends collaterals to nuclear cells in the cerebellar nuclei and in the Deiters nucleus.


H. Nuclear Cells

The nerve cells of the cerebellar nuclei and Deiters nucleus are of at least two types. One type is large multipolar neurons, with relatively simple and irregular dendritic arborisation. The axons from cells of the cerebellar nuclei go to the nucleus ventralis lateralis of the thalamus, to the red nucleus, to the pontomedullary reticular formation, and to the vestibular nuclei. Cells from the Deiters nucleus join the vestibulospinal tract. Thus some of these efferents send information toward the sensorimotor cortex, others toward the spinal motor neurons. The second type of nuclear neuron is smaller, with short axons, possibly a Golgi type II cell.

The cerebellar nuclei and Deiters nucleus cells receive excitatory inputs from climbing fiber collaterals and mossy fiber collaterals. They receive inhibitory inputs from Purkinje axons.




A. The Classical Perceptron

Since the neurophysiologist is usually not well versed in the field of pattern-recognition theory, a few short tutorial paragraphs concerning the pattern-recognition device known as the Perceptron are included to form a basis for arguments relating the cerebellum to the Perceptron. Again, rather than crediting all the many contributors to the theory of pattern recognition and linear threshold devices, we refer the reader to the review books by Nilsson [14] and Minsky and Papert [13] for extensive references to the literature. These books contain mathematical proofs for most of the informal assertions made in the following paragraphs.

The Perceptron developed by Rosenblatt [15] was inspired in large measure by known or presumed properties of nerve cells. In particular, a Perceptron possesses cells with adjustable-strength synaptic inputs of competing excitatory and inhibitory influences that are summed and compared against a threshold. If the threshold is exceeded, the cell fires. If not, the cell does not fire. The original Perceptron was conceived as a model for the eye (see Fig. 4).

James Albus: A Theory of Cerebellar Function

Fig. 4


FIG. 4. Classical Perceptron. Each sensory cell receives stimulus either +1 or 0. This excitation is passed on to the association cells with either a +1 or -1 multiplying factor. If the input to an association cell exceeds 0, the cell fires and outputs a 1; if not, it outputs 0. This association cell layer output is passed on to response cells through weights Wi,j, which can take any value, positive or negative. Each response cell sums its total input and if it exceeds a threshold, the response cell Rj fires, outputting a 1; if not, it outputs 0. Sensory input patterns are in class 1 for response cell Rj if they cause the response cell to fire, in class 0 if they do not. By suitable adjustment of the weights Wi,j, various classifications can be made on a set of input patterns.

Patterns to be recognized, or classified, are presented to a retina, or layer of sensory cells. Connections from the sensory cells to a layer of associative cells perform certain (perhaps random, perhaps feature-detecting) transformations on the sensory pattern. The associative cells then act on a response cell through synapses, or weights, of various strengths. The firing, or failure to fire, of the response cell performs a classification or recognition on the set of input patterns presented to the retina.


B. Perceptron Learning

The Perceptron shows a rudimentary ability to learn. If a Perceptron is given a set of input patterns and is told which patterns belong in class 1 and which in class 0, the Perceptron, by adjusting its weights, will gradually make fewer and fewer wrong classifications and (under certain rather restrictive conditions) eventually will classify or recognize every pattern in the set correctly. The weights usually are adjusted according to an algorithm similar to the following.

  1. If a pattern is incorrectly classified in class 0 when it should be in class 1, increase all the weights coming from association cells that are active.
  2. If a pattern is incorrectly classified in class 1 when it should be in class 0, decrease all the weights coming from association cells that are active.
  3. If a pattern is correctly classified, do not change any weights.
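The three rules above can be sketched directly in code. The following is a minimal illustration (not from the original article); the threshold value, unit-step weight changes, and toy pattern set are assumptions chosen for simplicity:

```python
import numpy as np

def train_perceptron(assoc_patterns, labels, epochs=100):
    """Error-correction training of one response cell, following the
    three rules above: raise active weights on a missed class-1 pattern,
    lower them on a false alarm, leave them alone when correct."""
    w = np.zeros(assoc_patterns.shape[1])
    theta = 0.5  # firing threshold (assumed value)
    for _ in range(epochs):
        errors = 0
        for x, target in zip(assoc_patterns, labels):
            fired = int(w @ x > theta)
            if fired < target:      # rule 1: should have fired but did not
                w += x
                errors += 1
            elif fired > target:    # rule 2: fired but should not have
                w -= x
                errors += 1
            # rule 3: correctly classified -> no change
        if errors == 0:             # learning complete
            break
    return w, theta

# A linearly separable toy set: fire exactly when the first
# association cell is active.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])
w, theta = train_perceptron(X, y)
print(w)   # trained weights
```

Note that the adjustment process stops as soon as an epoch produces no errors, matching the termination feature listed below.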

Four features of this algorithm are common to all Perceptron training algorithms, and are essential to successful pattern recognition by any Perceptron-type device:

  • Certain selected weights are to be increased, others decreased.
  • The average total amount of increase equals the total amount of decrease.
  • The desired classification, together with the pattern being classified, governs the selection of which weights are varied and in which direction.
  • The adjustment process terminates when learning is complete.

The Perceptron works quite well on many simple pattern sets, and if the sensory-association connections are judiciously chosen, it even works on some rather complex pattern sets. For patterns of the complexity likely to occur in the nervous system, however, the simple Perceptron appears to be hopelessly inadequate. As the complexity of the input pattern increases, the probability that a given Perceptron can recognize it goes rapidly to zero. Alternatively stated, the complexity of a Perceptron required to produce any arbitrary classification, or dichotomy, on a set of patterns increases exponentially with the number of patterns in the set. Thus the simple Perceptron, in spite of its tantalizing properties, is not practical as a realistic brain model without significant modification.

C. The Binary Decoder Perceptron

This lack of power of the conventional Perceptron can be overcome by replacing the sensory-association layer connections with a binary decoder, as shown in Fig. 5. It is then possible to trivially construct a Perceptron that will produce any arbitrary pattern classification. A binary decoder can be considered to be a recoding scheme that recodes a binary word of N bits into a binary word of 2^N bits. This recoding introduces great redundancy into the resulting code. Each input pattern is recoded so that a unique association cell is in the 1 condition and all other association cells are in the 0 condition. However, a binary decoder Perceptron is seldom seriously considered as a brain model for several reasons. First, the binary decoder requires such specific wiring connections that it is entirely too artificial to be embedded in the rather random-looking structure of the brain. Second, the number of association cells increases exponentially with the number of inputs. Thus N input fibers require 2^N association cells. Simple arithmetic thus eliminates the binary decoder Perceptron as a brain model.
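The decoder idea is easy to make concrete. The sketch below is an added illustration (not from the original article): it recodes a 2-bit sensory word into a 2^2 = 4-cell one-hot association layer, after which even a classification no simple two-input Perceptron can make (parity) needs only one weight per association cell, the desired class label itself:

```python
def binary_decode(bits):
    """Recode an N-bit sensory word into a 2**N-bit word in which
    exactly one association cell is 1 and all others are 0."""
    index = int("".join(str(b) for b in bits), 2)
    return [int(i == index) for i in range(2 ** len(bits))]

# Desired classification: parity (fire on 01 and 10 only).
desired = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# One weight per association cell: simply the desired label.
weights = [desired[divmod(i, 2)] for i in range(4)]

for pattern, label in desired.items():
    code = binary_decode(pattern)
    out = int(sum(w * c for w, c in zip(weights, code)) > 0.5)
    assert out == label
```

The assertions pass for all four patterns, but only because each pattern owns a private association cell, which is exactly why the scheme scales so badly.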


Fig. 5

FIG. 5. Binary decoder Perceptron. Each association cell firing uniquely corresponds to one of the possible 2^N input patterns. This type of Perceptron can perform any desired classification of input patterns. It has, however, no capacity for generalizing.


D. The Expansion Recoder Perceptron

However, there does exist a middle ground between a simple Perceptron and a binary decoder Perceptron. Assume a decoder, or rather a recoder, that codes N input fibers onto 100N association cells, as shown in Fig. 6. Such a recoding scheme provides such redundancy that severe restrictions can be applied to the 100N association cells without loss of information capacity. For example, it is possible to require that of the 100N association cells, only 1% (or less) of them are allowed to be active for any input pattern. That such a recoding is possible without loss of information capacity is easily proven, for the number of patterns with only 1% of the 100N association cells active is C(100N, N) ≥ 100^N ≥ 2^N. That such a recoding increases the pattern-recognition capabilities of a Perceptron is certain, since the dimensions of the decision hyperspace have been expanded 100 times. The amount of this increase under conditions likely to exist in the nervous system is not easy to determine, but it may be enormous. Thus the 2^N possible input patterns can be mapped onto 100^N possible association cell patterns. If this is done randomly, the association cell patterns are likely to be highly dissimilar and thus easily recognizable. The ratio 100^N/2^N = 50^N rapidly increases as N becomes large.
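The counting argument can be checked numerically for small N: the number of association-cell patterns with exactly N of 100N cells active is the binomial coefficient C(100N, N), which already exceeds 100^N, and hence 2^N. A short check (an added illustration, not part of the original article):

```python
from math import comb

# Number of association-cell patterns with exactly N of 100N cells
# active (1% activity), versus the 2**N possible binary input patterns.
for N in range(1, 8):
    restricted = comb(100 * N, N)        # C(100N, N)
    assert restricted >= 100 ** N >= 2 ** N
    print(N, restricted, 100 ** N, 2 ** N)
```

The inequality holds in general because each of the N factors in C(100N, N) = (100N/N) · (100N−1)/(N−1) · … is at least 100.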



Fig. 6


FIG. 6. N → 100N Expansion recoder Perceptron. The association cell firing is restricted such that only 1% of the association cells are allowed to fire for any input pattern. This Perceptron has a large capacity and fast learning rate, yet it maintains the number of association cells within limits reasonable for the nervous system.


The restriction that only 1% of the association cells are allowed to be active for any input pattern means that any association cell participates in only 1% of all classifications. Thus its weight needs adjusting very seldom and there is a fairly good probability that its first adjustment is at least in the proper direction. This leads to rapid learning.



A. Pattern Recoding in the Cerebellum

The granular layer of the cerebellum takes in information on mossy fibers and puts out information on parallel fibers. There are from 100 to 600 times as many parallel fibers as mossy fibers. Thus the granule cells can be said to be association cells that recode information from N inputs to at least 100N outputs. What can be said about the nature of this recoding? It was already noted that no granule cell receives more than one excitatory input from any one mossy fiber. It was also noted that the mossy rosettes from a single mossy fiber were widely distributed over several folia with a rather uniform random distribution. Thus, by the central limit theorem of probability, the distribution of granule cells with any given number of excitatory inputs will approach a Gaussian distribution whose breadth equals the extent of the mossy rosette distribution. Since the mossy rosette distribution of each mossy fiber extends over several folia, the Gaussian curve will be flat, for all practical purposes, over regions large compared with a single folium, even more so compared with any individual cell.

Since virtually no granule cells are excited at two sites by the same mossy fiber, the relative abundance of granule cells simultaneously excited by n active mossy fibers will be proportional to 1/n.

Thus at any instant the surface of the cerebellum should be dotted nearly uniformly randomly with granule cells whose input consists of one mossy fiber excitation. The surface of the cerebellum should also be dotted randomly, but less densely, with granule cells excited by two mossy fibers; and so on, progressively less densely with granule cells excited by three, and four, and five, up to seven mossy fibers. The total density of this dotting depends on the percentage of mossy fibers active.

The particular granule cells that actually fire as a result of various levels of mossy fiber excitation depend on the threshold levels of the granule cells. Only granule cells with enough excitatory inputs to exceed threshold will fire. This threshold for granule cells is regulated by Golgi cell activity.

The output of the granule cells is sampled by the Golgi cells via synapses with parallel fibers. This sampling is over an area approximately 250-650µm in diameter. Each Golgi cell feeds back inhibitory influences to about 100,000 granule cells. Neighbouring Golgi cells overlap extensively in their dendritic fields and in their axon arborisation. This very broad general feedback system suggests the function of an automatic gain control. Thus it is argued that the Golgi cells serve to maintain granule cell, and hence parallel fiber, activity fixed at a relatively constant rate. If few parallel fibers are active, Golgi inhibitory feedback decreases, allowing granule cells with lower numbers of excitatory inputs to fire. If many parallel fibers become active, Golgi feedback increases, allowing only those few granule cells with many active mossy inputs to fire.

The Golgi cells also have input from mossy fibers directly, a so-called feed-forward inhibition. This input tends to raise granule cell threshold levels when mossy fiber activity is large, and decrease granule thresholds when mossy fiber activity is small. This effect is also such as to stabilize the amount of parallel fiber activity.

To obtain a quantitative feel for what is occurring via these two types of Golgi cell inputs, consider Fig. 7. From the figure we can write

P = (M - Z + Sp)Gr                 (1)

Z = (KM + P)Go                   (2)

where

  • P is the expected value of the spike rate for a parallel fiber,
  • M, the expected value of the spike rate for a mossy fiber,
  • Z, the expected value of the spike rate for a Golgi cell,
  • Gr, the average transfer gain of granule cells,
  • Go, the average transfer gain of Golgi cells,
  • K, the relative strength of mossy fiber input on Golgi cells to that of parallel fiber input, and
  • Sp, the expected value of the spontaneous rate for a granule cell.

Combining (1) and (2) and differentiating with respect to M gives

dP/dM = Gr(1-KGo)/(1+GrGo)                                            (3)

From Eq. (3) it can be seen that by proper adjustment of parameters (i.e., KGo ≈ 1) it is possible to make P, the expected value of the spike rate for a parallel fiber, very nearly constant despite variations in mossy fiber input rate M.

It might not be unreasonable to assume values for Go and Gr as follows.

Gr = (1 granule spike)/(1 mossy spike) x (divergence of 100) = 100

Go = (1 Golgi spike)/(1000 parallel spikes) x (divergence of 100,000) = 100

These values substituted in (3) give

dP/dM ≈ (1-100K)/100                                             (4)

Thus if K ≈ 0.01 (i.e. 1 Golgi spike/10^5 mossy fiber spikes), the expected value of parallel fiber activity rate P is nearly constant. This, of course, does not mean that parallel fiber patterns would be independent of mossy fiber patterns, but merely that the overall level of activity (i.e., spikes per second) of parallel fibers could be constant in spite of what percentage, or at what rate, the mossy fibers are firing.
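Equations (3) and (4) can be verified numerically. The sketch below is an added illustration using the gain values assumed in the text, Gr = 100 and Go = 100:

```python
def dP_dM(Gr, Go, K):
    """Equation (3): sensitivity of expected parallel fiber rate P
    to expected mossy fiber rate M."""
    return Gr * (1 - K * Go) / (1 + Gr * Go)

# Gains assumed in the text.
Gr, Go = 100, 100

# With K = 0.01 (so K*Go = 1) the feed-forward and feedback paths
# cancel mossy-rate variations exactly:
print(dP_dM(Gr, Go, 0.01))   # 0.0

# Equation (4): dP/dM is well approximated by (1 - 100K)/100
# for these gains.
for K in (0.0, 0.005, 0.02):
    assert abs(dP_dM(Gr, Go, K) - (1 - 100 * K) / 100) < 1e-3
```

Even with K = 0 (no feed-forward term), the loop gain GrGo ≈ 10^4 already attenuates rate variations by roughly a factor of 100.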

The mossy fiber inputs to Golgi cells probably also serve to stabilize parallel fiber rates under transient conditions. The feedback path via parallel fibers involves delays. The feed-forward path is undoubtedly faster acting. The net result of Golgi cell activity seems therefore to be to stabilize the level of parallel fiber activity to a nearly constant value under all conditions.

It will thus be hypothesized that the surface of the cerebellum is dotted randomly with active parallel fibers and that the density of this activity is very nearly uniform, both spatially and temporally. It was noted earlier that if this density of parallel fiber activity is 1% or less, patterns are easily recognized and quickly learned. Furthermore, a 1% activity level is more than adequate from an information theory standpoint. Therefore, it will be further hypothesized that the density of parallel fiber activity is on the order of 1%.


Fig. 7

FIG. 7. Parallel fiber rate control circuit.

  • M, expected value of mossy fiber input in spikes per second;
  • P, expected value of parallel fiber output;
  • Z, expected value of Golgi cell rate;
  • Sp, expected value of spontaneous granule cell rate;
  • Gr, transfer gain of granule cell network;
  • Go, transfer gain of Golgi cell network;
  • K, relative strength of mossy fiber input on Golgi cells to that of parallel fiber input.


As was shown previously, recoding from N fibers to 100N fibers, under the restriction that only 1% of the output fibers are active for any input pattern, expands the number of possible patterns from 2^N to about 100^N, or an expansion of around 50^N. In the cerebellum the number of input mossy fibers is approximately 5 x 10^4/mm^2. Thus the pattern-expansion capacity of 1 mm^2 of cerebellar cortex is on the order of 50^50,000. Just what this means in increased pattern-recognition capability is unclear, but we get the feeling it is quite significant. This argument is even more compelling when it is realized that the mossy fiber system undoubtedly carries only a very restricted subset of the 2^N (really R^N, where R is the number of distinguishable levels of fiber firing rate) possible input patterns. Thus the recoding from N fibers to 100N fibers may well produce an enormous increase in classification capability of cells in the cerebellum functioning as pattern-recognition response cells.

If this hypothesis of mossy fiber recoding by granule cells is correct, it implies that, to a neurophysiologist probing with an electrode, any parallel fiber should appear to fire uncorrelated with neighboring parallel fibers, at least in an unanaesthetised awake preparation. An intuitive feel for why this recoding process is advantageous can be obtained from a simple example. Consider a Perceptron with only two association cells. There are then at most four different patterns of association cell firings. Suppose now it is desired for the response cell to fire whenever a sensory pattern occurs that produces an association cell pattern of 01 or 10, and it is desired for the response cell not to fire for the association cell patterns 00 and 11. Try as we might, it is impossible to find any combination of weights that can cause the response cell to have this behavior. It is rather simple to make the response cell fire on 01 and 10, and not to fire on 00. However, the 11 pattern creates a problem.

If, however, an expansion recoder is put between the sensory cells and the association cells, so that there are, for example, five association cells, the problem is much easier. The sensory pattern that previously produced the association cell pattern:

01 now might produce 00100;

10 now might produce 01001;

00 now might produce 10000;

11 now might produce 00010.

It is trivial to adjust weights so that association cell patterns 00100 and 01001 cause the response cell to fire, and the patterns 10000 and 00010 cause the response cell not to fire. The training procedure would consist of at the most one adjustment for each pattern.
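The five-cell example can be checked directly. In the sketch below (an added illustration; the recoding table is the hypothetical one given above), a single pass of weight adjustment suffices:

```python
# Desired behavior: fire on 01 and 10, stay silent on 00 and 11 --
# impossible with two association cells, trivial after recoding.
desired = {(0, 1): 1, (1, 0): 1, (0, 0): 0, (1, 1): 0}

# The hypothetical five-cell expansion recoding given above.
expand = {(0, 1): (0, 0, 1, 0, 0),
          (1, 0): (0, 1, 0, 0, 1),
          (0, 0): (1, 0, 0, 0, 0),
          (1, 1): (0, 0, 0, 1, 0)}

# One adjustment per pattern: strengthen the cells active in class-1
# patterns; a threshold of 0.5 then separates the classes.
w = [0] * 5
for pattern, label in desired.items():
    if label == 1:
        w = [wi + xi for wi, xi in zip(w, expand[pattern])]

for pattern, label in desired.items():
    out = int(sum(wi * xi for wi, xi in zip(w, expand[pattern])) > 0.5)
    assert out == label
print(w)   # trained weights
```

Because the four recoded patterns share almost no active cells, no pattern's adjustment disturbs the classification of another, which is why one pass is enough.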

A computer simulation of this type of recoding process has been run for a more complicated case. Twenty (20) mossy fibers were modeled. An expansion recoder of mossy rosettes, granule cells, and Golgi cells was modeled that transformed 20 mossy fiber firing rates into 2000 granule cell firing rates. Golgi cell feedback restricted the granule cells so that only about 1% of them could fire. The result was that for two very similar mossy fiber patterns the granule cell firing patterns were similar in some respects but quite distinguishable in others. Some granule cells responded exactly the same for both mossy patterns, but other granule cells responded entirely differently. This implies that mossy fiber input patterns that would be very difficult to distinguish if put directly into a Perceptron response cell are easily distinguishable after passing through the pattern recoder.
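A toy version of this simulation is easy to reproduce. The sketch below is an added illustration with assumed parameters: a random granule fan-in of 4 (standing in for the one to seven dendrites per granule cell), and Golgi gain control idealised as keeping only the 1% most strongly driven granule cells:

```python
import numpy as np

rng = np.random.default_rng(0)
N_MOSSY, N_GRANULE, N_ACTIVE = 20, 2000, 20     # 1% granule activity

# Each granule cell samples a few mossy fibers at random (assumed
# fan-in of 4).
conn = rng.integers(0, N_MOSSY, size=(N_GRANULE, 4))

def recode(mossy_rates):
    """Idealised Golgi gain control: a hard 1% winner-take-all in
    which only the most strongly driven granule cells fire."""
    drive = mossy_rates[conn].sum(axis=1)
    code = np.zeros(N_GRANULE, dtype=int)
    code[np.argsort(drive)[-N_ACTIVE:]] = 1
    return code

m1 = rng.random(N_MOSSY)          # a mossy fiber rate pattern
m2 = m1.copy()
m2[:2] += 0.5                     # a very similar second pattern
c1, c2 = recode(m1), recode(m2)

shared = int((c1 & c2).sum())
print(f"{shared} of {N_ACTIVE} active granule cells shared")
```

As in the simulation described above, some granule cells respond identically to both inputs while others change entirely, so the two sparse codes are far easier to tell apart than the nearly identical mossy patterns that produced them.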


B. The Purkinje Response Cell

It has been argued that the parallel fibers contain information coded in an ideal manner to serve as the input to a Perceptron response cell. It will now be argued that the Purkinje cells serve a function similar to Perceptron response cells.

From a purely structural standpoint, the Purkinje cell certainly is related to granule cells very similarly to the way a Perceptron response cell is related to association cells. Each Purkinje cell has an enormous fan-in; each granule cell has a large fan-out. It is hard to conceive a more efficient parts layout for this type of circuit than the parallel fiber-Purkinje dendrite arrangement. A flat tree with input fibers piercing it at right angles creates the maximum possible fan-in for each Purkinje cell. The flat, closely stacked Purkinje dendritic trees allow the maximum possible fan-out for each parallel fiber. Any other arrangement would almost certainly decrease the ratio of computational elements to the brain tissue mass.

We may reasonably ask why this same structure does not exist in the cerebral cortex. The answer may well lie in the differences between the functions required of the cerebrum and of the cerebellum. The portion of the cerebral cortex that is best understood from a functional standpoint is the visual cortex. Here it is well known that a great amount of feature detection [9] takes place, such as line detection, edge detection, motion detection, and binocular correlation. Many of these transformations are translationally invariant over certain fields of view; that is, cells in the visual cortex respond to certain global features of the visual input irrespective of small changes in retinal coordinate position. It would appear, then, that in the cerebrum considerable feature-detection processing precedes, and perhaps is intermingled with, the expansion recoding circuitry. The geometrical requirements of translationally invariant global feature detection require elaborate plexuses of fibers crisscrossing in the cerebral cortex, and cells with their dendritic fields geometrically positioned to extract feature-dependent inputs from these fiber plexuses. Any pattern recoding and pattern-recognition circuitry interspersed in this tangle would certainly be less compact and regular than that found in the cerebellar cortex.

On the other hand, in the cerebellum, granule cell receptive fields [17] show no evidence of feature detection analogous to that found in cerebral cortical cells. This is not too surprising since there should be no need for translationally invariant feature detection in a system that senses body conditions and controls motor commands. The problem of the cerebellum is merely to recognize patterns of information from proprioceptive receptors and to generate the appropriate motor command signals. The circuitry to do this is arranged as compactly as possible. The result is the beautiful regularity of the cerebellum.

Large portions of the cerebellum receive inputs from and project back toward the cerebral cortex. Since the anatomy of this portion of the cerebellum is not appreciably different from the portion that interacts with the periphery, it is reasonable to assume that the transfer function is similar (i.e., a mossy fiber pattern input producing a Purkinje cell pattern output).

The nervous system has one constraint that does not exist in the Perceptron. In the nervous system a particular type of cell is either excitatory or inhibitory. Any single granule cell thus cannot be excitatory on one Purkinje cell and inhibitory on another. The basket and stellate b cells appear to provide a means of overcoming this deficiency. Basket and stellate b cells receive excitation from parallel fibers and inhibit Purkinje cells located transversely. This arrangement allows any parallel fiber to excite a number of Purkinje cells along its length, and to inhibit another group of Purkinje cells located on its flanks. As noted before, a parallel fiber is not likely both to excite a Purkinje cell directly and also to inhibit the same Purkinje via basket or stellate b cells. Thus, as shown in Fig. 8, the Purkinje cell looks very much like a Perceptron response cell. The only logical difference is that the inhibitory input to the Purkinje cell is collected and summed by flanking basket and stellate b cells before being relayed to the Purkinje cell. The inhibitory input of each basket and stellate b cell is also sent to many other Purkinje cells, but this fact is immaterial to any individual Purkinje. It is influenced only by the inputs it receives, not by the other places those inputs may go. In order to complete the analogy between Purkinje cells and Perceptron response cells, it is necessary to introduce adjustable synaptic strengths.


Fig. 8

FIG. 8. Cerebellar Perceptron:

  • P, Purkinje cell;
  • B, basket cells;
  • S, stellate b cells.

Each Purkinje cell has inputs of the type shown.


C. The Hypothesis of Variable Synapses

The fundamental hypothesis of this article is that parallel fiber synapses are adjustable on both Purkinje cell dendrites and stellate and basket cell dendrites. The mechanism of change in both cases is hypothesized to be closely related to climbing fiber input activity. It will be argued that both excitatory and inhibitory influences on Purkinje cells are specifically modified under the control of climbing fiber activity patterns.

Each Purkinje cell is contacted by a single climbing fiber. In a conscious animal the climbing fibers fire in short bursts of one or more spikes at a rate of about 2 bursts/sec [5, 18]. Each climbing fiber burst causes a single spike on the Purkinje axon followed by a complex burst of spike-like activity in the Purkinje dendritic tree and intense depolarization of the Purkinje cell. The single axon spike is followed by a pause in the spontaneous Purkinje axon spike activity for 15-30ms. This pause, accompanied by intense depolarization, was first observed by Granit and Phillips [8] and was termed the inactivation response to distinguish it from a normal pause in activity resulting from hyperpolarization. After the 15- to 30ms inactivation response, the cell gradually recovers its spontaneous firing rate over a period of 100-300ms [3]. As it approaches normal, the cell becomes once again responsive to parallel fiber input activity.

It is now hypothesized that the inactivation response pause in Purkinje spike rate is an unconditioned response (UR) in a classical learning sense caused by the unconditioned stimulus (US) of a climbing fiber burst. It is further hypothesized that the mossy fiber activity pattern ongoing at the time of the climbing fiber burst is the conditioned stimulus (CS). If this is true, the effect of learning should be that eventually the particular mossy fiber pattern (CS) should elicit a pause (CR) in Purkinje activity similar to the inactivation response (UR) that previously had been elicited only by the climbing fiber burst (US). In order to accomplish this result it is necessary to postulate that the climbing fiber input to the Purkinje cell not only causes the Purkinje cell to pause momentarily but also weakens any parallel fiber synapses that are tending to cause the Purkinje to fire during the inactivation response.

A possible mechanism for such weakening might be that there exists a critical interval near the end of the inactivation response after the effect of the climbing fiber burst has worn off sufficiently so that the cell can be fired by parallel fiber input but before the dendritic membrane has returned completely to normal. If the Purkinje cell fires in this interval, this firing is an error signal that signals every active parallel fiber synapse to be weakened.

The amount of weakening of each synapse is proportional to how strongly that synapse is exciting the Purkinje cell at the time of error signal. The effect of this mechanism would be to train the Purkinje cell to pause at the proper times, that is, at climbing fiber burst times. After learning is complete, the Purkinje knows when to pause because it recognizes the mossy-parallel fiber pattern that occurred previously at the same time as the climbing fiber burst. Later, since each parallel fiber active synapse was weakened by the error signal, if the same mossy parallel fiber pattern occurs again, the Purkinje will pause even without the climbing fiber burst. Thus, the Purkinje is forced to perform in a certain way by the climbing fiber teacher. After learning is complete, however, it behaves in that same way, under the same mossy fiber conditions, even in the teacher’s absence.
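The hypothesised mechanism can be caricatured in a few lines. In the sketch below (an added illustration; the threshold, weakening rate, and pattern statistics are all assumptions), the climbing fiber "teacher" repeatedly weakens active synapses, each in proportion to its strength, until the Purkinje cell pauses on the conditioned mossy-parallel pattern:

```python
import numpy as np

rng = np.random.default_rng(1)
N_PARALLEL = 200
THRESHOLD = 5.0

def fires(pattern, w):
    """Purkinje cell fires if summed parallel fiber drive exceeds
    threshold."""
    return float(w @ pattern) > THRESHOLD

def climbing_fiber_teach(pattern, w, rate=0.5):
    """Hypothesised rule: during the critical interval after a climbing
    fiber burst, any Purkinje firing acts as an error signal that
    weakens every active parallel fiber synapse in proportion to its
    current strength."""
    while fires(pattern, w):
        w = w - rate * w * pattern        # weaken active synapses only
    return w

w0 = np.ones(N_PARALLEL)                  # initial synaptic weights
cs = np.zeros(N_PARALLEL)                 # sparse conditioned stimulus
cs[rng.choice(N_PARALLEL, 10, replace=False)] = 1

assert fires(cs, w0)                      # before training: cell fires
w1 = climbing_fiber_teach(cs, w0)
assert not fires(cs, w1)                  # after: cell pauses on the CS
```

Synapses on inactive parallel fibers are untouched, so responses to other, non-overlapping patterns are largely preserved, which is what lets many pattern-pause associations be stored in the same cell.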

Note that this mechanism corresponds closely with the Perceptron training algorithm in that (1) if the response cell fires (or tends to fire) when it should not fire, then all synapses coming from active parallel fibers will be decreased or weakened; (2) if the response cell does not fire improperly, no adjustments are made.

It is now possible to consider many climbing fibers, each firing at different rates in some spatial pattern C1 at time t1. This climbing fiber firing pattern will elicit a Purkinje firing pattern C’1. Assume at time t1 the mossy fibers have some firing pattern M1. Each climbing fiber will train its respective Purkinje cell (or cells) to recognize the mossy fiber input pattern M1 that was present when C1 occurred. If during training M1 on the mossy fibers occurs in coincidence with C1 on the climbing fibers, after training the occurrence of M1 on the mossy fibers will elicit C’1 from the Purkinje cells whether or not C1 appears on the climbing fibers. It can then be said that climbing fiber pattern C1 has been imprinted, or stored, on mossy fiber pattern M1. In the same way a second climbing fiber firing pattern C2 can be stored on another mossy fiber pattern M2, and so on.

An important feature of this hypothesis is that the C’1 patterns coming out of the Purkinje cells are not necessarily binary patterns; C’1 represents the relative rates of firing of all the Purkinje cells. Thus relative patterns are stored and relative patterns are recalled.


D. Variable Inhibitory Synapses

Since variation of parallel fiber Purkinje cell synapses is sufficient to cause patterns to be stored in the cerebellum, we might well suggest [11] that no further mechanism of variable inhibitory synapses is necessary. However, there are good reasons to further hypothesize variable inhibitory synapses.

First, if only the excitatory inputs to a cell are caused to decrease, while the inhibitory inputs are held fixed, eventually the cell fails to fire in response to any input pattern. Second, a pattern-recognition device based on only excitatory weight adjustment has inherently low capacity. Marr [11] estimates that a Purkinje cell capable of only excitatory synaptic adjustment has the capacity to make about 200 mossy fiber pattern dichotomies. However, a Perceptron with both positive and negative weight adjustments has the capacity to make about twice as many dichotomies as there are adjustable weights [4]. Thus, if both excitatory and inhibitory synapse adjustment is possible in the cerebellum, each Purkinje cell would have the capacity to make on the order of 200,000 pattern dichotomies. The adjustment of inhibitory weights thus results in a thousand-fold increase in recognition capacity. Third, any pattern-recognition system capable of varying weights in only one direction is necessarily very slow to learn. An example of the learning difficulties encountered by such a system can be seen by referring to Fig. 4. Assume a pattern M causes only association cell A1 to fire. This will affect the response cell R1 through weight W1,2.

Four possible situations can exist when pattern M is first presented:

case 1: M desired in class 1, R1 = 1;

case 2: M desired in class 1, R1 = 0;

case 3: M desired in class 0, R1 = 1;

case 4: M desired in class 0, R1 = 0.

In case 1 and case 4, M is already in the proper class and no adjustment of weights is necessary. In case 3, the weight W1,2 needs to be decreased so as to force the R1 cell below threshold. In case 2, the weight W1,2 needs to be made more positive so as to raise the R1 cell above threshold. If such a positive adjustment is not allowed, another means is available. All the weights to R1 except W1,2 can be decreased, and the threshold of the R1 cell somehow decreased accordingly. This would have the same result as an increase in W1,2. As a mechanism likely to occur in the cerebellum, however, this scheme has several serious difficulties:

  1. Decreasing all weights except one is cumbersome. It is inconceivable to decrease 199,999 weights in order to increase 1.
  2. It is very difficult to suggest a mechanism with such abilities. The mechanism must, in case 3, decrease the synaptic strength of all active parallel fibers, but in case 2, decrease the synaptic strength of all except the active parallel fibers.
  3. If the threshold of the R1 cell is to be lowered along with all the weights except W1,2, this in itself implies that variable inhibitory synapses are necessary in the cerebellum.
  4. If basket and stellate cells have no variable synapses, it is hard to imagine why they are so numerous, or what is the purpose of their peculiar axon distributions. If these inhibitory interneurons merely serve the purpose of general threshold regulators, it would seem that a few cells should do as well. For example, only a few Golgi cells are necessary to set general threshold levels for an enormous number of granule cells. Yet there are about twice as many basket and stellate b cells as Purkinje cells. Surely these cells have a more sophisticated function than general threshold regulation. Variable inhibitory synapses could explain why basket and stellate cells are so numerous.
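The case 2 deadlock can be made concrete in a few lines. In this hypothetical toy, a response cell with all-excitatory starting weights sits below threshold for a pattern it should fire for; a learner allowed signed adjustments fixes this quickly, while a weakening-only learner can never raise the response. The patterns, threshold, and rates are invented for illustration.

```python
import numpy as np

# Hypothetical response cell R1 with four input weights; units are arbitrary.
X = np.array([[1, 1, 0, 0],    # pattern A, desired class 1
              [0, 0, 1, 1]])   # pattern B, desired class 0
y = np.array([1, 0])
theta = 2.5                     # firing threshold (assumed)

def train(signed, lr=0.1, epochs=200):
    w = np.ones(4)
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            fired = int(w @ x > theta)
            if fired and not t:            # case 3: weaken active synapses
                w -= lr * x
                errors += 1
            elif not fired and t:          # case 2: strengthening needed
                if signed:
                    w += lr * x
                errors += 1
        if errors == 0:
            return True                     # all patterns classified correctly
    return False

print(train(signed=True), train(signed=False))   # True False
```

Pattern A starts at w @ x = 2 < 2.5, so the weakening-only learner has no move available in case 2 and never converges.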


E. Site of Inhibitory Synaptic Change

Inhibitory synaptic strength variation could occur at two sites. One site is where basket and stellate b cells synapse on the Purkinje cells. This is perhaps an obvious first candidate. However, the amount of convergence is small. Certainly less than 1000 different basket and stellate b cells synapse on each Purkinje. The actual figure is probably less than 100. This is a far cry from the parallel fiber convergence of about 200,000 variable excitatory synapses. The addition of 100 variable inhibitory synapses would seem to add little to the recognition capacity of the Purkinje cell.

The second site where inhibitory inputs to Purkinje cells might be varied is at the parallel fiber synapses on basket and stellate b dendrites. A decrease in strength of the excitatory parallel fiber synapses on basket and stellate b cells results in a decrease in inhibitory input to the related Purkinje cells. The basket and stellate b dendritic trees are sparser than those of Purkinje cells, but they do contact perhaps 5% of the parallel fibers coursing through them. When account is taken of the fact that about 100 of these cells then synapse on a single Purkinje, the result is a convergence of variable inhibitory inputs to the Purkinje cell of the same order of magnitude as that of variable excitatory inputs. Thus the Purkinje recognition capacity is on the order of 200,000 patterns rather than 200 patterns as suggested by Marr [11].

It is interesting that lower forms, such as frogs, have no basket cells. A cerebellar Perceptron with no variable inhibitory weights is certainly possible. Its only shortcoming would be a very limited capacity for discrimination.

Several other facts support the hypothesis that the parallel fiber synapses on basket and stellate b cells are the sites of variable inhibitory weights. First, the basket and stellate cells contact the parallel fibers with dendritic spines similar to those of the Purkinje cells. Second, each climbing fiber, in addition to synapsing strongly on a single Purkinje cell, also sends collaterals, which synapse on the soma of adjacent basket and stellate cells. Since the climbing fiber input is assumed to be intimately related with varying parallel fiber synapses on Purkinje cells, it is perhaps reasonable to suggest that the same climbing fiber may also vary parallel fiber synapses on basket and stellate cells. The mechanism of variation could be identical or at least very similar. In other words it is argued that on every cell contacted by an active climbing fiber, each active parallel fiber synapse is weakened by the same mechanism regardless of whether the cell is Purkinje, basket, or stellate b. This hypothesis has the elegant feature that a single event causes a change in both excitatory and inhibitory influences. The fact that climbing fibers do not contact dendrites of basket and stellate cells may be accounted for by the fact that their dendritic arborisation is less extensive than that of Purkinje cells.

In order to satisfy the Perceptron training conditions that excitatory and inhibitory changes be equal on the average, it is merely necessary to assume that the size of the decrement in each synapse is such that the expected value of the excitatory change be equal to the expected value of the inhibitory change.


F. Pattern Storage on Excitatory and Inhibitory Synapses

The effect in terms of pattern storage of this scheme can be seen by referring to Fig. 9. Assume the climbing fiber firing pattern cf1 = 1, cf2 = 0 occurs. In this case P1 pauses and P2 is released from inhibition by B1 pausing. Further, assume a mossy fiber pattern occurs such that Pf1 = 1, Pf2 = 1. The coincidence of these two patterns will tend to decrease weights WP1 and WB1 but leave unchanged WP2 and WB2. At a later time when the climbing fibers are silent, cf1 = cf2 = 0; if the same mossy fiber pattern recurs such that Pf1 = Pf2 = 1, P1 will pause because of decreased WP1 and P2 will be disinhibited because of decreased WB1. Thus, the original climbing fiber response, P1 pause, P2 disinhibited, can be recalled by the mossy fiber pattern, which causes Pf1 = Pf2 = 1. It can thus be said that the climbing fiber pattern is imprinted on the mossy fiber pattern.

Note that all the adjustment of the variable synapses takes place in the immediate vicinity of the Purkinje cell excited by an active climbing fiber, even though the disinhibitory effects are felt by Purkinje cells far removed in the transverse direction.
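The readout in Fig. 9 can be sketched with two weights. The weight values below are arbitrary illustrative numbers; the point is only that one decrement event simultaneously reduces P1's excitation and P2's inhibition.

```python
# Illustrative units; the weight values are arbitrary.
def drive(W_P1, W_B1, pf1=1.0):
    """Net effects when parallel fiber Pf1 is active: direct excitation of P1,
    and inhibition of P2 via basket cell B1 (which Pf1 excites)."""
    P1_excitation = W_P1 * pf1
    P2_inhibition = W_B1 * pf1
    return P1_excitation, P2_inhibition

before = drive(W_P1=1.0, W_B1=1.0)

# Coincidence of cf1 with the active parallel fibers weakens both W_P1 and W_B1:
after = drive(W_P1=0.4, W_B1=0.4)   # decremented values chosen for illustration

# Recurrence of the mossy pattern now makes P1 pause (less excitation)
# and disinhibits P2 (less inhibition), reproducing the climbing fiber response.
```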

In order to satisfy the requirement that the expected value of the change in excitation equals the expected value of the change in inhibition, it is necessary to assume some things concerning the relative amounts by which WP1 and WB1 are changed. The synapse of Pf1 on P1 occurs with a probability of nearly 1. The synapse of Pf1 on B1 occurs with a probability of around 0.05 or less. However, the effects of WB1 are distributed to 30-50 Purkinje cells, whereas the effects of WP1 are confined to one Purkinje cell. In addition, the strength of WB1 is multiplied by a gain factor governed by the strength of the basket cell synapses on Purkinje cells. Since this is a rather strong synapse, the gain factor is probably greater than 1. Thus in order for the total average decrease in excitation to equal the total average decrease in inhibition, the following equation must be satisfied.

ΔWB1 × PB1(Pf1) × DB1 × GB1 = ΔWP1 × PP1(Pf1)    (5)

where

ΔWB1 is the change in WB1,

ΔWP1 the change in WP1,

PB1(Pf1) the probability B1 contacts Pf1,

PP1(Pf1) the probability P1 contacts Pf1,

DB1 the number of Purkinje cells B1 contacts, and

GB1 the strength of the B1 synapse on Purkinje cells.
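Plugging representative values into Eq. (5) shows the direction of the balance. The probabilities and divergence come from the text; the gain factor and the excitatory decrement unit are assumptions chosen only to make the arithmetic concrete.

```python
# Representative values; G_B1 and the decrement unit are assumptions.
P_P1 = 1.0    # probability P1 contacts Pf1 (text: nearly 1)
P_B1 = 0.05   # probability B1 contacts Pf1 (text: about 0.05 or less)
D_B1 = 40     # Purkinje cells contacted by B1 (text: 30-50)
G_B1 = 2.0    # basket -> Purkinje gain factor (text: probably greater than 1)
dW_P1 = 1.0   # excitatory decrement, arbitrary unit

# Solving Eq. (5) for the inhibitory decrement:
dW_B1 = dW_P1 * P_P1 / (P_B1 * D_B1 * G_B1)
print(dW_B1)   # 0.25
```

With these numbers ΔWB1 comes out smaller than ΔWP1, consistent with the judgment below that the climbing fiber effect on basket cells is weaker than on Purkinje cells.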



James Albus: A Theory of Cerebellar Function

Fig. 9

FIG. 9. Climbing fiber input.  Each climbing fiber contacts a single Purkinje cell and several nearby basket cells or stellate cells, or both. If Pf1 is active when P1 or B1, or both, fire in the critical interval during a cf1 inactivation response, then WP1 or WB1, or both, are altered. This change in synaptic strength can later be read out in the form of Purkinje postsynaptic potentials by firing Pf1 again.


Everything considered, it is likely that ΔWB1 is less than ΔWP1. This judgment seems to be supported by the experimental fact that the effect of a climbing fiber on a basket cell is less strong than on a Purkinje cell [5]. Presumably a smaller climbing fiber effect produces less synaptic weakening.

This cerebellar system now has most of the characteristics of a Perceptron; that is, it corrects errors by adjusting weights positively and negatively; the average total increase equals the average total decrease; the pattern being stored, in coincidence with the pattern on which it is stored, governs which weights are increased and which are decreased; and the adjustment procedure terminates as learning asymptotically approaches completion. In addition, the hypothesized cerebellar system exhibits the capacity to store information concerning the relative firing rates of climbing fiber patterns.


G. Defense of the Synaptic Weakening Argument

The argument that synaptic weights are weakened by learning, rather than strengthened, is counterintuitive and contrary to most, if not all, theories of synaptic learning that have appeared in the literature. Thus it perhaps should be examined in more detail. There are three main reasons why synaptic weakening rather than strengthening is hypothesized to take place in the cerebellum.

First, the experimental data that are available seem to suggest it. Climbing fiber inputs cause Purkinje cells to pause. If the Purkinje is to learn to pause, parallel fiber excitation must be decreased.

Second, Perceptron theory proves that the most effective training algorithms are error correcting in nature. Thus, firing at erroneous times should reduce the tendency to fire again. Firing at the proper times requires no adjustment. This algorithm implies weakening of synapses that contribute to erroneous firings. It is possible to conceive of an error-correcting scheme that would operate by strengthening synapses, but the mechanism seems quite unlikely. There are only two possible error conditions:

  • Cell fires when it should not. This condition can be corrected by weakening erroneous excitatory synapses (as suggested) or by strengthening erroneous inhibitory synapses. On the Purkinje cell the excitatory spine synapses seem much more likely candidates for variability than the inhibitory synapses. There are relatively few inhibitory synapses. Learning capacity would be quite low if on the Purkinje the inhibitory synapses rather than the excitatory were the site for variability.
  • Cell does not fire when it should. This condition can be corrected by strengthening erroneous excitatory synapses or by weakening erroneous inhibitory synapses. In this case it is difficult to suggest how the individual synapses know when an error has occurred. The absence of postsynaptic cell firing may be the correct response as far as each synapse knows. An additional piece of information is needed: the information that an error has occurred. It is difficult to imagine how this information is conveyed to synaptic sites in the absence of postsynaptic activity. Thus, if the Purkinje cell learns by error correction, the most probable mechanism is synaptic weakening in the presence of erroneous firing.

The third reason synaptic weakening is hypothesized to occur in the cerebellum is that there are serious stability problems of learned responses under conditions of overlearning if synaptic activity causes synaptic facilitation. Consider Fig. 10: C1 and C2 are climbing fibers synapsing with synapses of fixed strength on Purkinje cells P1 and P2. A parallel fiber pf synapses on P1 and P2 with variable-strength synapses of weights W1 and W2. If it is now assumed that the synaptic weights are strengthened by coincidence of pre- and postsynaptic activity, it is possible to write

ΔiW1 = fP1 · fpf  at t = i                                (6)

where

ΔiW1 is the increase in W1 at time t = i,

fP1 the frequency of spikes on P1, and

fpf the frequency of spikes on pf.

Let W1 originally equal 0W1. As learning takes place, the following situation obtains. At

t = 0, fP1 = kfC1 + 0W1 fpf;

t = 1, fP1 = kfC1 + (0W1 + Δ0W1)fpf;

t = 2, fP1 = kfC1 + (0W1 + Δ0W1 + Δ1W1)fpf;

t = 3, fP1 = kfC1 + (0W1 + Δ0W1 + Δ1W1 + Δ2W1)fpf;

⋮


Fig. 10

FIG. 10. Two Purkinje cells contacting the same parallel fiber.

We can readily see that the weight W1 continuously increases at each learning interval. In fact, since ΔiW1 is the product fP1 · fpf, and since fP1 increases during each learning interval, Δ0W1 < Δ1W1 < Δ2W1 < ··· . Therefore W1 grows at an exponential rate, and of course so does fP1. Certainly W1 must eventually saturate. Now suppose that during the same learning sequence a spike train also appears on C2 at half the frequency of that on C1:

fC2 = ½ fC1.

Until W1 saturates,

W1 ≈ 2W2.

Eventually, however,

W1 = W2 = saturation value.

Thus, after a sufficiently long period, all parallel fiber synapses will eventually become saturated. The very active ones will saturate first, but over a long time virtually every synapse will saturate. Synaptic facilitation suggests learning is exponential; synaptic weakening suggests learning is asymptotic.
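The contrast between runaway facilitation and asymptotic weakening can be shown by iterating the two rules directly. The constants below are illustrative only; the facilitation loop is Eq. (6) applied to the t = 0, 1, 2, ... expansion above, and the weakening loop decrements in proportion to the remaining (erroneous) excitation.

```python
# Toy constants; values are illustrative only.
k, f_pf, lr = 1.0, 1.0, 0.1

# Facilitation, iterating Eq. (6): each increment raises f_P1, which raises
# the next increment, so W grows without bound until forced to saturate.
W, fac = 0.1, []
for t in range(50):
    f_P1 = k + W * f_pf            # Purkinje rate, as in the expansion above
    W += lr * f_P1 * f_pf          # Eq. (6)
    fac.append(W)

# Weakening: the decrement shrinks as the erroneous excitation shrinks,
# so the weight decays asymptotically instead of running away.
V, weak = 1.0, []
for t in range(50):
    V -= lr * V
    weak.append(V)
```

The facilitation trace accelerates (each increment exceeds the last), while the weakening trace approaches zero smoothly, which is the stability property the text argues for.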

This problem could possibly be averted by proposing some sort of decay rate for all synaptic strengths. Thus synaptic strengths would not remain saturated. However, such a mechanism would need to be very exotic to prevent continued learning from degrading performance and, at the same time, to preserve learned patterns over long time periods. It is common experience that memories of motor skills are preserved rather well over periods of many years. It is also common experience that repeated practice of motor skills leads to improved motor performance, even when the practice sessions are intensive and of short duration (on the order of minutes or hours). It is difficult to conceive of a decay system that could preserve memory over periods of years and at the same time prevent saturation over periods of minutes.

It is an obvious fact that continued training in motor skills improves performance. Extended practice improves dexterity and the ability to make fine discriminations and subtle movements. This fact strongly indicates that learning has no appreciable tendency to saturate with overlearning. Rather, learning appears to asymptotically approach some ideal value. This asymptotic property of learning implies that the amount of change that takes place in the nervous system is proportional to the difference between actual performance and desired performance. A difference function in turn implies error correction, which requires a decrease in excitation upon conditions of incorrect firings.

This argument is not meant to suggest that synaptic facilitation does not occur anywhere in the nervous system. In fact the stellate a cells will shortly be conjectured to undergo synaptic facilitation. Synaptic facilitation very probably plays an important role in many places in the nervous system. However, in situations where saturation would degrade performance, and particularly in the cerebellar cortex, where other evidence points to weakening, synaptic weakening seems very likely to be the principal learning mechanism.

It might be argued that the saturation argument holds equally well in the opposite sense, that is, that all synapses would eventually be reduced to zero. One answer to this is that the synaptic strengths tend toward zero asymptotically. Therefore the weaker a synapse becomes, the less is its contribution to any erroneous firings and the less it is weakened by any correction. Another answer is that new variable spiny synapses may be hypothesized to spontaneously and randomly grow and mature into active, effective synapses. The result of this would not be to destroy learning but to mask it over a period of time by background noise. To clarify this point, no synapse that has undergone any decrementing is hypothesized to grow back in strength. However, new synapses are hypothesized to grow to full size and then mature into an effective state. From this point they are then decremented, perhaps all the way to zero. There may be some evidence for such a phenomenon in the visual cortex of the mouse. Ruiz-Marcos and Valverde [16] note that the density of spines on pyramidal cells in mouse visual cortex rises to a maximum shortly after the mouse opens its eyes. From that time the density of spines decreases asymptotically to a smaller value. Light deprivation considerably reduces the spine density. This might suggest that spines develop randomly under the trophic influence of presynaptic nerves and are specifically decremented in the process of learning.


H. Response Speedup via Stellate a Cells

The notion that occurrence of a particular mossy fiber pattern causes a decrease in excitation of Purkinje, basket, and stellate b cells, and that this decrease in excitation causes the proper response of the Purkinje cell, raises a question of response speed. The decrease in excitation resulting from a decay of synaptic transmitter substance is not generally considered to occur as quickly as a build-up of excitation resulting from release of transmitter substance. Thus a system that operates solely on decay of excitation may lack the speed necessary for quick movements. It will now be suggested that stellate a cells are ideally situated for providing a speedup mechanism.

The main structural difference between stellate a and stellate b cells is in their axon arborisation. The stellate a cells send synaptic contacts to Purkinje cells in their immediate vicinity and to adjacent Purkinje cells in the longitudinal direction. Thus it is quite likely for a parallel fiber to excite a particular Purkinje cell and to inhibit the same Purkinje via a stellate a cell. Climbing fiber collaterals also contact stellate a cells. Thus, following the same reasoning used for Purkinje, basket, and stellate b cells, it is not unreasonable to assume that coincidence between climbing fiber and parallel fiber activity effects a change in synaptic strength of stellate a cells also. It would seem, however, that in order to perform a useful function, the synaptic change in this case should be a strengthening rather than a weakening. It will be conjectured that coincidence of a climbing fiber spike with parallel fiber activity on a stellate a cell will cause an increase in the synaptic strength of the parallel fiber-stellate a cell synapse. Thus the stellate a synapses are conjectured to change in the opposite direction from all the other variable synapses under the same coincidence conditions.

Consider parallel fiber pattern M1 to be imprinted positively on stellate a cells, but negatively on an immediately adjacent Purkinje cell. Occurrence of pattern M1 causes the Purkinje cell to receive less excitation. Pattern M1 causes the stellate a cell to receive more excitation, and hence actively inhibit the Purkinje. The result would be an increase in speed of the Purkinje cell response.

The stellate a cell variable synapses would of course be subject to the saturation problem discussed previously. However, if the stellate a contribution to the Purkinje input were small compared to the other inputs from basket and stellate b cells and parallel fibers, the saturation effect would be small in the steady state. The stellate a input would be significant only in the first few milliseconds following a transient. In this interval the stellate a cell would get the Purkinje response going in the proper direction. Later the other inputs to the Purkinje would predominate to set the proper final value. The same effect would obtain if the stellate a response were not necessarily small but merely of short duration.

Note that in the arguments concerning stellate a cells the word conjecture was used rather than hypothesis. Very little is known concerning the behavior of stellate a cells, and any confident prediction concerning their function is certainly premature. Stellate a cells may have nothing at all to do with memory or variable synapses. In the next section it is suggested that stellate a cells may instead be related to attention mechanisms.


I. The Function of Recurrent Purkinje Collaterals

The fact that the cerebellum is spontaneously active allows it to achieve a high degree of sensitivity and precision. A spontaneously active system is essentially linear, at least for small inputs. Thus any small input will produce an output whose size will depend on both the size of the input and the gain of the system. If the system is not spontaneously active, small signals have no effect on the output until they exceed a certain threshold. This is usually not a desirable trait for a feedback control system.

As was discussed earlier, the mossy fiber → granule cell → Golgi cell interconnection network appears to work so as to maintain granule cell activity at some relatively constant level. In addition, the Purkinje cell axons put out recurrent collaterals that are known to contact Golgi cells, basket cells, and other Purkinje cells. These Purkinje recurrent collaterals send inhibitory impulses over a wide-ranging area, even into adjacent folia. The Purkinje recurrent collateral synapses on other Purkinje cells have the effect of maintaining the average Purkinje cell activity fixed at a relatively constant level over the entire cortex. If the average Purkinje activity rises too high, the inhibitory effect of the recurrent collaterals drives it back down. If Purkinje cell activity drops too low, the decrease in inhibition will let it rise again. Thus a relatively constant spontaneous discharge rate will be maintained despite rather large variations in cell conditions, such as nutrition or fatigue.
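This rate-holding effect is ordinary negative feedback, and a one-line recurrence shows the compression it produces. The gain, tonic drive, and perturbation sizes below are assumed values, not measurements.

```python
# Illustrative negative-feedback sketch; gain and drive values are assumed.
g = 0.5            # recurrent collateral inhibitory gain
base = 50.0        # tonic excitatory drive to the Purkinje population

def settled_rate(perturbation, steps=200):
    """Average Purkinje rate after the recurrent inhibition settles."""
    rate = 0.0
    for _ in range(steps):
        rate = max(0.0, base + perturbation - g * rate)
    return rate

low  = settled_rate(-10.0)   # e.g. fatigue lowers the drive
high = settled_rate(+10.0)   # e.g. heightened input raises it

# Without feedback the two rates would be 40 and 60 (a spread of 20);
# with feedback the spread is compressed to 20 / (1 + g), so the collaterals
# hold the average rate closer to a constant level.
```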

Another effect of the recurrent collateral inhibition on Purkinje cells is the contrast enhancement effect of lateral inhibition. Thus any local increase in activity will be accompanied by a surrounding field of depressed activity. There also appears to be some specific contralateral inhibition produced by Purkinje recurrent collaterals.

The existence of Purkinje recurrent collateral synapses on Golgi cells is very interesting. The effect is that of both positive and negative feedback since the affected parallel fibers both excite the Purkinje cells directly and inhibit them via basket and stellate cells. The total effect may be that when a general area of the cerebellar cortex is actively engaged in processing information, the Golgi cells limiting the input to that area are suppressed, thus allowing input to that area more free access. This would then constitute a crude form of attention mechanism. Any area actively engaged in processing information would be given priority over other areas that are inactive at the time. This of course is quite speculative, but a rather pregnant possibility.

The function of Purkinje recurrent collateral synapses with basket cells is not clear. The effect is certainly that of positive feedback. Positive feedback is commonly used in electronic circuitry to produce one or the other of two effects: either oscillatory behavior or bistable switching behavior. There is no evidence of any oscillatory effects in the cerebellum that are likely to be mediated by Purkinje recurrent collaterals. There is, however, a curious bistable effect in the firing rate of Purkinje cells that may be caused by the Purkinje recurrent collateral interaction with the various interneurons. Although a Purkinje cell sometimes is spontaneously active, at other times the same cell is completely quiet except for climbing fiber responses. This rather implies that Purkinje cells have at least two stable states, one spontaneously active, the other completely silent. The transition between states seems to be somewhat correlated with climbing fiber activity [3]. We might speculate that certain parts of the cerebellum are switched on by an attention mechanism when they are needed, and switched off again when they are not in use. The Purkinje collateral – basket cell or Golgi cell circuit may provide the positive feedback necessary to switch between states. Specific climbing fiber patterns could provide the trigger signal to initiate the switching. Climbing fiber inputs to Golgi cells may be the means by which climbing fibers trigger Purkinje cells into an active state. Climbing fiber inputs to stellate a (or basket and stellate b) cells might trigger Purkinje cells into a quiet state. Although these notions are admittedly tenuous, such activity certainly is characteristic of control systems far less complex than the brain. It should not be surprising if similar behavior is found in the brain.


J. Effects of the Intracerebellar Nuclei

It must be emphasized that details of the microstructure in the intracerebellar nuclei are much less well defined than in the cerebellar cortex. Even less is known about detailed interactions and pathways outside the cerebellum altogether. However, it is felt that the following type of argument must eventually be made before the function of the cerebellum can be said to be understood.


Fig. 11

FIG. 11. Interaction between the cerebellar cortex and nuclear cells. Mossy fibers act on Purkinje cells, which act as modified Perceptron response cells. Mossy fibers, climbing fibers, and Purkinje axons all interact in nuclear cells.

Nuclear cells in the cerebellar and Deiters nuclei are contacted by collaterals from mossy fibers, collaterals from climbing fibers, and Purkinje axons. Thus circuits of the type shown in Fig. 11 probably exist.

The frequency of firing of the Purkinje cell is of the form

fP = fckcP – Xi(fm1, fm2, fm3, …, fmN) + f0P                        (7)

where

fP is the firing rate of the Purkinje cell,

fc the firing rate of the climbing fiber,

kcP the climbing fiber input-Purkinje cell output transfer function,

Xi(fm1, …, fmN) the input to the Purkinje of a learned pattern Mi of mossy fiber inputs (the sign is negative since the Purkinje learns to pause), and

f0P the steady-state rate of the Purkinje cell.

The firing rate of the nuclear cell, which is also spontaneously active, is given by

fN = fckcN – fPkPN + fmkmN + f0N                      (8)

where f0N is the spontaneous firing rate of the nuclear cell, kcN is the climbing fiber input-nuclear cell output transfer function, kPN the Purkinje-to-nuclear transfer, and kmN the mossy fiber-to-nuclear transfer. Substitution of (7) in (8) gives

fN = fc(kcN – kP) + fmkmN + Xi(fm1, …,fmN) + f0            (9)

where kP is the combined effect of kPN and kcP, and f0 is the combined effect of f0P and f0N.

Several interesting observations can be made from Eq. (9). First, the output of the nuclear cell is directly affected by mossy fiber input. Thus the nuclear cell may be part of a reflex arc. Second, the strength of this reflex arc is modulated by patterns arriving on the mossy fibers corresponding to patterns previously stored by climbing fibers. Third, the effect of climbing fiber activity fc on the nuclear cell depends on the factor (kcN – kP); kP is a negative quantity since kPN, the effect of the Purkinje on the nuclear cell, is inhibitory, and kcP, the effect of the climbing fiber on the Purkinje, is the inactivation response. Thus the factor (kcN – kP) is always positive.

Since the climbing fiber pattern is stored in the Xi pattern, the effect of the mossy fiber Xi pattern associated with the climbing fiber pattern reinforces the climbing fiber’s effect on the nuclear cell. Thus, as learning takes place, less and less input from the climbing fiber is necessary to produce the same amount of nuclear cell response. Fourth, the effect of an input on mossy fibers through the function Xi(fm1, …,fmN) is a positive response. The Xi function in (7) decreases the output of the Purkinje cell and hence in (9) increases the output of the nuclear cell.
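The claim that a stored Xi pattern can substitute for climbing fiber drive follows directly from Eqs. (7) and (8). In the sketch below every constant is an assumed illustrative value, with signs as in the text (kcP negative for the inactivation response, the Purkinje-nuclear connection inhibitory); the stored pattern reproduces the nuclear output with no climbing fiber input at all.

```python
# All constants are assumed illustrative values, with signs as in the text.
k_cP, k_cN = -1.0, 1.0      # climbing fiber transfer to Purkinje / nuclear cell
k_PN, k_mN = 1.0, 0.5       # Purkinje->nuclear (inhibitory) and mossy->nuclear gains
f0_P, f0_N = 30.0, 20.0     # spontaneous rates
f_m = 10.0                  # mossy fiber rate

def f_P(f_c, Xi):
    return f_c * k_cP - Xi + f0_P                                  # Eq. (7)

def f_N(f_c, Xi):
    return f_c * k_cN - f_P(f_c, Xi) * k_PN + f_m * k_mN + f0_N    # Eq. (8)

untrained = f_N(f_c=5.0, Xi=0.0)    # command carried entirely by the climbing fiber
trained   = f_N(f_c=0.0, Xi=10.0)   # stored mossy pattern alone drives the cell

print(untrained, trained)   # 5.0 5.0 -- same nuclear output, no climbing input
```

Here Xi = 10 exactly replaces fc = 5 because the climbing fiber acts on the nuclear cell with combined gain (kcN – kP) = 2, illustrating why less and less climbing fiber input is needed as learning proceeds.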



It is reasonably certain that patterns of activity on mossy fibers represent to the cerebellum the position, velocity, tension, and so on of the muscles, tendons, and joints. This is feedback information that is required to control precise or sequential movements, or both. This information must modulate signals to the muscles to achieve precise movement under varying load conditions. This feedback information must also be able to generate the next command in a sequence of muscle commands in order to produce sequential motor activity at a subconscious level. The functioning of the cerebellum, as hypothesized in this article, seems rather well suited for either or both of these behaviors.

Assume, for example, that the red nucleus sends a command C1 through the inferior olive and thence via climbing fibers through Purkinje cells and nuclear cells to the muscles. At this time the muscles and joints in their resting state are sending pattern M1 to the cerebellum via mossy fibers. Thus C1 is imprinted on M1. Now when C1 reaches the muscles, they respond by moving to a new position. This generates a new mossy fiber pattern M2. By this time a second command C2 is sent from the red nucleus. Command C2 will be imprinted on M2. In a similar manner C3 is imprinted on M3, C4 on M4, and so on. This process may be continued for a lengthy sequence of motor commands C1C2C3… and resulting body positions M1M2M3… . Upon repetition of the sequence of motor commands C1C2C3…, the signals from the red nucleus will be reinforced at the nuclear cells by output from Purkinje cells responding to feedback mossy fiber patterns M1M2M3… . Upon each repetition more and more of the muscle control can be assumed by the output of the Purkinje cells, and less attention is required by higher motor centers.

Once learning is complete, the sequence of motor commands C1C2C3C4 can be elicited entirely from the Purkinje cells via the mossy fiber input patterns M1M2M3M4… . Little input is required from higher centers except perhaps to initiate or terminate the sequence.
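The chaining argument can be sketched as a lookup structure. The `plant` function below is a hypothetical stand-in for the body (command Ci simply moves it to state Mi+1), and the dictionary stands in for the cerebellar imprinting of each command on the feedback pattern present when it arrives.

```python
# Hypothetical plant: executing command C_i moves the body to state M_{i+1}.
def plant(command):
    return "M" + str(int(command[1:]) + 1)

commands = ["C1", "C2", "C3", "C4"]   # sequence from the red nucleus
memory = {}

# Training: each command is imprinted on the mossy pattern present on arrival.
state = "M1"
for cmd in commands:
    memory[state] = cmd
    state = plant(cmd)

# Replay: the mossy feedback alone recalls the whole command sequence,
# with no further input from higher centers.
state, recalled = "M1", []
while state in memory:
    cmd = memory[state]
    recalled.append(cmd)
    state = plant(cmd)

print(recalled)   # ['C1', 'C2', 'C3', 'C4']
```

The replay loop also halts on its own when it reaches a state with nothing imprinted on it, though in the text the initiation and termination of sequences is left to mechanisms outside this circuit.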

The theory so far has no means of initiating or terminating such a sequence. It is possible that this operation takes place in the intracerebellar nuclei or outside the cerebellum altogether. Lack of detailed anatomical and physiological data makes it difficult to conjecture how this function is accomplished. However, it is perhaps not unreasonable to speculate that the Schiebel collaterals of climbing fibers to Golgi cells or stellate a cells, or to both, may be related to initiation or termination of sequence generation in the cerebellar cortex. The Golgi cells control the mossy fiber input pathway, which is a vital link in sequence generation. Excitation of Golgi cells via Schiebel collaterals could cut off mossy fiber input to the cerebellum and terminate a sequence. Inhibition of Golgi cells by Purkinje recurrent collaterals, on the other hand, would lower Golgi inhibition, possibly in response to specific patterns. This might initiate sequences upon certain key commands. Golgi cells may also have variable synapses, since they possess both spine synaptic contacts with parallel fibers and input from climbing fibers. However, more data are necessary before confident predictions are possible on these points.

The circuit described can also function as a modulator of conscious motor activity on climbing fibers. Assume that a sequence of motor commands from higher centers C1C2C3… had been imprinted on a series of mossy fiber patterns M1M2M3… as before. If the muscles upon receipt of conscious command C1 were to encounter greater than usual resistance, this would delay or prevent the appearance of M2 at the cerebellum, and instead a pattern M’2 would appear, signalling the existence of extraordinary resistance to motion. The pattern M’2 would modify pattern C1 in a manner different from M2, perhaps calling for additional force or some other modification. What M’2 produces is governed by what previously had been imprinted on M’2. If previously C’2, an additional force command, had been imprinted on M’2, then C’2 would be substituted for C2 automatically when the M’2 feedback signal was received instead of the usual M2. By this means a sequence of conscious commands can be modified at the reflex level by cerebellar activity. This perhaps is the means by which motor activity such as running or skating can be under conscious control in a general sense but under reflex feedback control at the individual muscle level.

The implication, then, is that climbing fibers carry from higher centers control patterns that are to be stored. In this form the cerebellar memory becomes a form of conditioned reflex. If the climbing fibers are cut, we would expect deficiencies primarily in conscious motor control and further conditioning. This may in some measure account for the data of Mettler [12], who noted a lack of obvious severe effects when climbing fibers were cut.

Marr [11] suggests an interesting analogy of the cerebellum as a language translator between data in the cerebrum and command sequences needed by the muscles. The cerebellum thus becomes analogous to a computer compiler that translates source language instructions into machine language instructions for execution by the machine hardware. Following the same analogy, the cerebellum becomes a subroutine library in which subroutines can be stored from above and cycled from below.




The theory of cerebellar function set forth in this article makes possible a number of predictions that are subject to experimental verification:

  1. Parallel fibers do not fire in coordinated beams in a conscious, active animal, but rather in a widely scattered, apparently random fashion.
  2. One percent or fewer of parallel fibers are active simultaneously, and this activity level is quite constant.
  3. Parallel fiber synapses with dendritic spines on Purkinje cells, basket cells, and stellate cells are modifiable synapses.
  4. The Purkinje cell response can be conditioned by climbing fiber inputs. Climbing fiber spikes are the unconditioned stimulus (US). Mossy fiber activity patterns are the conditioned stimulus (CS). The climbing fiber inactivation response is the unconditioned response (UR).
  5. The conditioning mechanism is a three-way coincidence between the inactivation response, a cell spike due to parallel fiber excitation, and parallel fiber synaptic activity.
  6. Parallel fiber synapses on Purkinje cells, basket cells, and stellate b cells are weakened when the cell fires incorrectly during climbing fiber activity.
  7. Climbing fibers are essential for acquisition of certain types of motor skills, and for cerebellar feedback control of conscious motor activity. They are less necessary for conditioned reflex behaviour.
  8. Some of the mechanisms hypothesized in the cerebellum will almost certainly also occur in other parts of the brain. The expansion recoding system; the imprinting of patterns from specific fiber inputs onto synapses of nonspecific fibers; the use of laterally coursing inhibitory interneurons to achieve both positive and negative synaptic weight adjustment; the weakening of synaptic weights during training to achieve convergence; these are all basic principles of data processing likely to occur elsewhere in the nervous system.
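The three-way coincidence rule of predictions 5 and 6 can be illustrated with a toy calculation. All the numbers (weights, threshold, learning step) are illustrative assumptions, not physiological values; the sketch only shows the logic of "weakening during training".

```python
# Toy sketch of predictions 5 and 6: a parallel fiber synapse is weakened
# when three events coincide -- climbing fiber activity, a cell spike
# driven by parallel fibers, and activity at that particular synapse.
weights = [0.5, 0.5, 0.5]      # parallel fiber synaptic weights (illustrative)
parallel_active = [1, 0, 1]    # which parallel fibers are currently firing
threshold = 0.6

# Did parallel fiber excitation drive the cell to spike?
cell_spiked = sum(w * a for w, a in zip(weights, parallel_active)) > threshold
climbing_fiber_active = True   # the unconditioned "this firing was wrong" signal

if climbing_fiber_active and cell_spiked:
    # Three-way coincidence: weaken only the synapses that were active
    # during the incorrect firing (training by weakening, prediction 6).
    weights = [w - 0.1 if a else w for w, a in zip(weights, parallel_active)]

print(weights)  # the two active synapses are weakened; the inactive one is not
```

Note that, unlike the classical perceptron convergence procedure, training here proceeds only by weakening: correct (silent) behaviour during climbing fiber activity leaves the weights untouched.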



The author thanks Mr. Anthony J. Barberra for his valuable criticism and suggestions.



  1. J. S. Albus, A model of memory in the brain, Cybernetica (1970) (in press).
  2. S. Ramón y Cajal, Histologie du système nerveux de l’homme et des vertébrés, Tome II, Maloine, Paris, 1911.
  3. C. C. Bell and R. J. Grimm, Discharge properties of Purkinje cells recorded on single and double microelectrodes, J. Neurophysiol. 32 (1969), 1044-1055.
  4. T. M. Cover, Classification and generalization capabilities of linear threshold units, Rome Air Development Center Tech. Documentary Rept. RADC-TDR-64-32 (1964).
  5. J. C. Eccles, M. Ito, and J. Szentágothai, The Cerebellum as a Neuronal Machine, Springer, Berlin, 1967.
  6. A. Escobar, E. D. Sampedro, and R. S. Dow, Quantitative data on the inferior olivary nucleus in man, cat and vampire bat, J. Comp. Neurol. 132 (1968), 397-433.
  7. C. A. Fox, D. E. Hillman, K. A. Siegesmund, and C. R. Dutta, The primate cerebellar cortex: A Golgi and electron microscope study, Progr. Brain Res. 25 (1967), 174-225.
  8. R. Granit and C. G. Phillips, Excitatory and inhibitory processes acting upon individual Purkinje cells of the cerebellum in cats, J. Physiol. (London) 133 (1956), 520-547.
  9. D. H. Hubel and T. N. Wiesel, Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex, J. Physiol. (London) 160 (1962), 106-154.
  10. A. Jakob, Das Kleinhirn, in Handbuch der mikroskopischen Anatomie des Menschen IV/1 (W. v. Möllendorff, ed.), Springer, Berlin, 1928.
  11. D. Marr, A theory of cerebellar cortex, J. Physiol. (London) 202 (1969), 437-470.
  12. F. A. Mettler, in a discussion following a paper by J. C. Eccles, in Neurophysiological Basis of Normal and Abnormal Motor Activities (M. D. Yahr and D. P. Purpura, eds.), pp. 411-414, Raven Press, New York, 1967.
  13. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, Cambridge, Massachusetts, 1969.
  14. N. J. Nilsson, Learning Machines: Foundations of Trainable Pattern-Classifying Systems, McGraw-Hill, New York, 1965.
  15. F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington, D.C., 1961.
  16. A. Ruiz-Marcos and F. Valverde, Temporal evolution of the distribution of dendritic spines in the visual cortex of normal and dark-raised mice, Exptl. Brain Res. 8 (1969), 284-294.
  17. W. T. Thach, Jr., Somatosensory receptive fields of single units in cat cerebellar cortex, J. Neurophysiol. 30 (1967), 675-696.
  18. W. T. Thach, Discharge of Purkinje and cerebellar nuclear neurons during rapidly alternating arm movements in the monkey, J. Neurophysiol. 31 (1968), 785-797.
  19. W. T. Thach, Discharge of cerebellar neurons related to two maintained postures and two prompt movements, II: Purkinje cell output and input, J. Neurophysiol. 33 (1970), 537-547.








About 2.0

The topics of posts on this blogsite have become rather narrower than the ‘neuroscience, technology, philosophy’ tagline would suggest. Here, I set out my stall with something a bit more informative than the references to Greek mythology and German literature you will find on the ‘About 1.0’ page.

The site considers various age-old philosophical problems (of consciousness, of free will, of morality, of knowledge, of science) but from a neuroscientifically-oriented standpoint.


A Physicalist Worldview

It takes a ‘physicalist’ stance:

  • There is only physical ‘stuff’ (such as matter).
  • Consequently, there is a gradual transition rather than a sharp distinction between self and non-self (the ‘environment’).

This is in contrast with the ‘traditional’ dualist view:

  • The realms of ‘mind’ and ‘matter’ are separate.
  • Consequently, there is a sharp distinction between the two.

Dualist ideas are as good as obsolete among neuroscientists but are still dominant among the general population and they underpin religious beliefs.

I have made an analogy in some posts, between:

  • Dualism: an old house with subsidence, and
  • Scientific physicalism: a brand new house, but still under construction.

It is becoming increasingly difficult for religiously-inclined people to ignore the cracks in the walls of the old house. Scientifically-inclined people feel superior with their new building but they generally do not recognise that their home is incomplete. The aim here is to look at how the new house might look when completed, so we can then judge if it is better than what came before – morally as well as scientifically.


A Simple Theory of the Brain

The ‘latest’ science is used to inform a ‘latest’ philosophy – a suggestion of what future generations might accept as normal. It is often the case that ‘better’ explanations are unsatisfactory to those who have grown up with a different worldview but are accepted almost unquestioningly by their grandchildren who have grown up with that new explanation established.

That ‘latest’ science is neuroscience (currently fashionable). At the heart of what I present is a model of the brain (as formulated by others) variously called the ‘Bayesian Brain’, ‘Predictive Brain’ or Karl Friston’s ‘Variational Free Energy’ (the term that I generally mention) and ‘active inference’. I frequently refer to it by the phrase ‘hierarchy of predictors’ as I think this is a more descriptive, more accessible term.

There is no presumption that this model of the brain is ‘correct’ (as it might be viewed by our grandchildren). It is a grossly simple explanation for the most complex one-and-a-bit-kilograms you will find anywhere in the universe. But it is hoped that it provides a better model of how the brain works than any established model and is the most appropriate non-academic one for the purposes here.


Biology, Physics and Philosophy

For me growing up, physics was full of ‘crunchy’ big ideas whereas biology (wherein neuroscience lies) was the soggy accumulation of little facts:

  • of meticulous drawings of bats,
  • of naming the parts of a bat,
  • of cataloguing the 1,240 species of bats and
  • of estimating the number of bats per square mile.

Even biology’s big idea – evolution – was soggy, providing qualitative post-hoc explanations in contrast to physics’s quantifiable predictions.

Ultimately, there are two philosophical questions:

  1. ‘Why is there something rather than nothing?’, and
  2. ‘Why are we conscious, so as to be able to perceive that ‘something’ and to be able to ask the above question?’

Or, alternatively:

Physics promised answers to the former; biology ignored the latter.



But there have been significant developments in neuroscience since my formal education ended, not least in the ability to visualize what is going on. Coming late to the biology party, I was astounded by exquisitely crunchy mechanical behaviour in microbiology, such as in the machinery of the synaptic vesicle (see below).

This ‘crunchy’ side of biology has not been, and maybe still isn’t, conveyed to the general public. Biology has become more precise, and hence more appealing to physics-y type people, after all.

And neuroscience now promises to provide some sort of answer to big questions, like ‘what is it like to be a bat?’.

Sorry Sheldon, physics is fuddy-duddy. Neuroscience is where it’s happening.


Amy Farrah-Fowler (neuroscientist) and Sheldon Cooper (physicist)



Simple and Un-rigorous

Philosophers pride themselves on their rigour. Philosophy has been described as ‘rigorous but not technical’ in contrast to science being ‘both rigorous and technical’.

But this blogsite might be described ‘technical but not rigorous’.

It is speculative. It relies on immature ‘pre-science’. It aims at simplicity. It is reductionist. It aims to be simple enough for an intelligent layman to gain a basic understanding, for it then to be a springboard to detail elsewhere. It is unconstrained by the shackles of rigour in academia. True, I occasionally cite academic papers and books, but I generally don’t want to clutter things up with justification. If you seek justification, just google.

And I try to avoid ‘neural correlates’. I try to avoid citing scientific associations between some phenomenological experience (such as empathy) and something physically observable within the brain (such as an increase in activity in the Anterior Cingulate Cortex, as observed from changes in blood oxygenation in functional MRI scans). Such justifications make people more likely to believe neuroscientific propositions – but they are clutter getting in the way of the bigger picture.

The approach is systemizing, trying to assemble ideas (typically others’ ideas) together to form that bigger picture. Again, the assemblings are not rigorous. But hopefully some may prove interesting or ring true.


Shades of Grey

Dualism obviously creates a sharp distinction between body and soul. But this also creates other sharp distinctions as a result. Between human and animal for example. And between responsibility and not. These are crisp black and white distinctions.

But with physicalism, dichotomies are presented just for simplicity of explanation. Continua (shades of grey) are there if we want to see them, particularly if it helps understanding. Barriers can be removed.


Between human and animal



And finally…

There might be some reason why I would want to distance myself from this blog. I might not want to have it associated with my professional life. I might think it is embarrassingly amateurish but that there is some merit in sharing it. I might think that my writing style is terrible (I certainly don’t pay that much attention to it).

But the blog is about ideas and it shouldn’t matter.

The site is anonymous. I just prefer it that way.



Photo credit: Patrick Bouquet via Also available in colour.


Guilt and Shame

What constrains people’s behaviour when no one else is looking?

Both guilt and shame are feelings resulting from oneself having committed a bad act. But:

  • Shame arises from others knowing that one has committed that bad act, whereas
  • Guilt arises from internally knowing that one has committed that bad act.

Their opposites are:

  • to have high esteem – a good reputation, and
  • to have high self-esteem.


Although shame and guilt exist in all societies to some degree there is a stereotypical idea, originating from E. R. Dodds, that:

  • Oriental societies are ‘shame societies’ in which social order is maintained primarily through shame, and
  • Western societies are ‘guilt societies’ in which social order is maintained primarily through guilt.

But shame-versus-guilt discussions are muddied by there being different understandings of the difference between the two. I think this can largely be resolved by thinking of four categories rather than two. In my terminology, this gets described as one type of guilt and three types of shame:

  1. ‘Public shame’: the painful feeling arising from others having observed the improper behaviour. (Embarrassment is the much weaker cousin of public shame.)
  2. ‘Ultimate shame’: the painful feeling arising from an all-seeing God having observed the improper behaviour.
  3. ‘Self-shame’: the painful feeling of a negative evaluation of oneself. The focus is on the defectiveness of the actor (the self).
  4. ‘True Guilt’: the painful feeling resulting from a belief that one has done something wrong. The focus is on a defectiveness of the act.


  • For some, the demarcation between ‘guilt’ and ‘shame’ separates 1 from 2, 3 and 4: it is that shame derives from other people being aware of the misdemeanour.
  • For some, the demarcation between ‘guilt’ and ‘shame’ separates 1 and 2 from 3 and 4: it is that guilt derives from oneself recognising that one has done wrong.
  • For some, the demarcation between ‘guilt’ and ‘shame’ separates 1, 2 and 3 from 4: it is the distinction between the actor and the act; the distinction between the guilty “I did something bad” and the shameful “I am bad”.

Regarding the extra shades of shame, the most significant demarcation between ‘guilt’ and ‘shame’ is that separating 1 from 2, 3 and 4. With this demarcation, ‘ultimate shame’ and ‘self-shame’ are referred to as ‘guilt’. This is how confusion arises.

Shame versus Guilt around the world

Morality and Self-Regulation

As has been proposed previously, the purpose of morality is to balance the wants of oneself against the sometimes conflicting wants of others for the general mutual benefit of the many individuals in a society. It is a benign means of social control.

Shame is the feeling that arises from other people knowing about one’s misdemeanours resulting in damage to one’s reputation. Improving reputations benefits individuals and the wider society, leading eventually to a culture of the presumption of trust.

But, in a pure shame culture, it is still OK to do something wrong:

as long as no one knows you have done it!

credit: Scott Adams

Ensuring proper moral conduct by having something always watching

…because this will not damage one’s reputation. The basic rule is:

Don’t do bad, or don’t get caught!

In contrast, guilt should promote better moral cooperative behaviour in that it makes people behave well

even when there is no one else watching.

The basic rule is:

Don’t do bad!

which is more obviously aligned to what we understand about morality. A guilt culture should accelerate the presumption of trust and attain a higher level of trust than a shame culture. It is like empathy in that it is not essential for a moral society, but it helps.

Guilt aligns the values of the self with those of society:

  • shame arises from a violation of cultural or social values, while
  • guilty feelings arise from violations of one’s internal values.


  • shame involves the feeling of disgust of others towards oneself, whereas
  • guilt involves the feeling of disgust of oneself towards oneself.

In short, as a way of maintaining social order,

  • self-regulation of individuals is preferable to external regulation;
  • that is: guilt is preferable to shame.

Neuro guilt and shame

Catholic Guilt and Protestant Shame

Our reputation is not significant to us with all other beings. We are not likely to be concerned about our reputation with our neighbour’s dog, for example. We generally only care about how we are seen by other people – and not all people – because there are repercussions for us for our transgressions.

And, for those that believe, there is also a very significant other – an all-seeing God – with very serious repercussions for us for our transgressions. For them, guilt is not known only to that individual. God knows too and He can punish the sinner in the afterlife. This is what I have termed ‘Ultimate Shame’.

If an individual publicly confesses their sins, their anxiety will be reduced even though they then suffer shame. Individuals would obviously prefer to be shamed before as few people as possible and for this knowledge not to spread beyond them. Confession to a single discreet priest manages this. ‘Catholic guilt’ becomes ‘Catholic shame’. Actually, Catholics trade ‘Ultimate Shame’ (supposed ‘guilt’) for something halfway between ‘Ultimate Shame’ and ‘Public Shame’ – a ‘Limited Shame’. The individual has acknowledged their wrong-doing, been forced to reflect upon it and compare it against the values of wider society. In recognising their wrong-doing, they have demonstrated that it was the act that was bad and not the actor.

Southern Europe is said to have a shame culture whereas their Northern cousins have more of a guilt culture. Southern Europeans are predominantly Catholic whereas Northern Europeans are predominantly Protestant. The Catholics trade their guilt for shame but the Protestants are stuck with guilt. In either case, even when there are no human witnesses, acts are not entirely private; it is still generally shame rather than guilt that is involved.

And shame does not require punishment. For example, I suspect that this is the case for the majority of those in Northern Europe who state they are Christian on census forms. This majority have no direct outward practice of their religion from one census to the next. For them, they have an un-theologized, un-analysed, un-formalized ‘personal God’ with whom they have a relationship that helps them. There is almost certainly a hope of an afterlife but no pretence of knowing – indeed, no effort applied to knowing more. But there is no punishment codified. Wrong-doings result in shame. Their personal God is an entity that holds them to account – God is an other.

Self Shame

Atheists do not have ‘Ultimate Shame’. A society of only atheists would seem to be a shame culture in which you really can do anything as long as no one finds out about it, as there would be no damage to reputation.

The secularization of the West raises concerns from many that there will be a decline in moral standards as a result.

This may be true or it may be false, depending on evidence (something to be looked at in the future). But it is not a given. If individuals did feel bad about wrong-doing, there could still be what I am calling ‘True Guilt’ – an intrinsic bad feeling about oneself which would motivate people away from ‘bad’ behaviour.

And even this secular guilt can still be a form of shame. We can be shamed before the ‘other within’ with whom we have our internal conversation. I call this ‘Self Shame’. We can be brought up (conditioned) to have that ‘other within’ questioning us. At times, it can be our conscience. Shame, and the presumed moral standards that arise from ‘others knowing our wrong-doings’, is still possible without an omniscient being.

Act and Actor

But it is also possible that we can be brought up (conditioned) without an ‘other within’ questioner. I started off saying that:

  • Shame and high esteem arise from others knowing, whereas
  • Guilt and high self-esteem arise from the self knowing,

but I have basically categorized everything as a form of shame, except for the absence of shame. Guilt does not feature.

One more distinction between shame and guilt was:

  1. For shame, the focus is on the defectiveness of the actor.
  2. For guilt, the focus is on a defectiveness of the act.

Dualist Deontology and Physicalist Virtue Ethics

As I have frequently contrasted previously:

  • Dualists (of the ‘substance’ type) believe that mind and matter are separate, whereas
  • Physicalists reconcile the two, believing that ‘mind’ (such as it exists) supervenes on the physical matter.

For dualists, mind is pure, untainted and unconstrained by the material and therefore could exist after the destruction of the material body.

  • The religious are almost always dualists,
  • and physicalists are almost always not religious.


  • To a dualist, our bodies might be very different but our ‘minds’ are essentially the same, capable of making right and wrong choices – and being judged (now, or later) equally. And it is the acts that are judged, not the actor. Having recognized a sin, a mind can change and act differently next time. The rightness or wrongness is in the act. This is in line with the ethical positions of Deontology and Consequentialism.
  • But to a physicalist, a bad act is causally a result of a bad actor. It could not have been otherwise. If I sinned – and recognized that I did – then there is something wrong with me – the biological me. Rightness or wrongness is embedded in the actor and the actor cannot easily change. This is in line with the ethical position of Virtue Ethics.

In my terminology, the distinction between guilt and self-shame is that the focus is on the act in the former and on the actor in the latter. Thus:

  • Guilt is associated with dualism and act-based ethical positions.
  • Self-shame is associated with physicalism and virtue ethics.


This was the 18th part of the ‘From Neural Is to Moral Ought’ series. It built upon a predecessor part, ‘Trust’, in which social institutions evolve so that agents (rationally) self-regulate their behaviour. Here I have considered the emotions (bad feelings) of guilt and shame. (It is similarly parenthetical to the series in that it is not ‘neuro’ at all, but the well-worn dualism-versus-physicalism dichotomy is considered again.)

Where I am going with this:

  • In a physicalist worldview with virtue ethics, it is the actor that is bad. Bad acts cannot just be confessed away. This is shame, and shame can bring a very destructive inability to change.
  • Guilt comes with heightened anxiety of some form, such as repression or self-punishment. But shame can also be very destructive.
  • A physicalist worldview can also reduce personal responsibility – ‘my brain made me do it’.

Mirroring and Mimicry

This is the seventeenth part of the ‘From Neural Is to Moral Ought’ series and follows on:

Here, I look at how mimicry and mirroring the behaviour of others can arise in the ‘hierarchy of predictors’ model of the brain, which leads to us empathizing with them.

68: Mimicry and Contagion

From Observing Others to Acting Ourselves

From previously, we have seen that:

  1. Observing others precedes the observation of self. For example, the recognition of our own hands was built upon the observation of the hands of others. Hence there is a significant association between the two. The lowest levels in the hierarchy of our brains react to the observation of our own hands and those of others in the same way.
  2. We have learnt to integrate sense (the sight of own hand) with movement (of our own hand).
  3. There is therefore an association between the sight of another’s hand and the movement of one’s own hand.
  4. It is only later that we learn to distinguish the observation of oneself from the observation of others (at a higher level of the brain hierarchy).
  5. There is therefore a ‘leak’ as it were from observing others to our own movement. This is not something that can be entirely unlearnt at lower levels and must be corrected at a higher level.

The Rubber Hand Illusion Again

So, the observation of another can cause movement because of ‘mistakes’ at the low levels.

As described previously, low levels react quicker than higher levels. So those ‘mistakes’ made by the low levels are quickly corrected by higher levels. It is better:

  • to have low levels acting fast ‘in case what I see is actually happening to me’ and for the higher levels to then veto with ‘it is not me, after all’


  • to spend ages deciding ‘what I should do’, by which time it is too late and it becomes ‘what I should have done’.

For example, in the case of the ‘rubber hand illusion’, fear is generated by low levels predicting that our hand is about to be smashed. This becomes pain when the rubber hand has been hit, arising from the belief that our hand has been hit. But higher levels quickly break the association between the rubber hand and oneself. The pain is only fleeting. (Considerable effort is needed to fool a mature brain into adopting a rubber hand as its own in the first place – for example, through the stroking of the left hand in addition to the stroking of the rubber hand on the right.)
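The fast-low/slow-high arrangement can be sketched as a toy event simulation. This is my own illustration of the argument, not an established model; the millisecond figures and event names are invented for the example.

```python
# Minimal sketch of the fast-low / slow-high arrangement: the low level
# reacts immediately to a threat seen on the (rubber) hand, and the
# slower high level vetoes once it has worked out that the hand is not
# actually one's own. Timings are illustrative only.
import heapq

def simulate(threat_is_to_own_hand):
    """Process (time_ms, event) pairs in time order and log the responses."""
    events = [(20, "low_level_flinch"),       # fast, automatic response
              (150, "high_level_appraisal")]  # slower ownership check
    heapq.heapify(events)
    log = []
    while events:
        t, event = heapq.heappop(events)
        if event == "low_level_flinch":
            log.append((t, "flinch"))  # act first, 'in case it is me'
        elif event == "high_level_appraisal" and not threat_is_to_own_hand:
            log.append((t, "veto: not my hand"))  # correct the 'mistake'
    return log

print(simulate(threat_is_to_own_hand=False))
# [(20, 'flinch'), (150, 'veto: not my hand')]
```

When the threat really is to one's own hand, the appraisal produces no veto and the early flinch stands, which is exactly why acting fast and correcting later is the better policy.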

Sensorimotor Contagion: unconscious mimicry

When low levels are screaming out for attention, sending huge error signals upwards, higher levels will respond. But in some cases, there is not enough of an upward signal to warrant higher-level attention. The higher levels are otherwise engaged and no higher level vetoing gets done. (This lack of response is reminiscent of the ‘boiling frog’ anecdote.)

Hence we get behaviour such as this:

  1. We are talking.
  2. You have your arms folded.
  3. I am not paying attention to them but I sub-consciously notice that your arms are folded.
  4. Because of the ‘leak’ from observing others to own movement, I start moving my arms and continue to do so until there is no dissonance between the sight of your arms folded and the proprioceptive sense of where my arms are.

This is then an example of (what I shall call) ‘sensorimotor contagion’ – a sub-conscious mimicry.

But it can work the other way around. It could equally be:

  1. We are talking.
  2. I have my arms folded.
  3. I am not paying attention to them but I sub-consciously notice that your arms are unfolded.
  4. Because of the ‘leak’ from observing others to own movement, I start moving my arms and continue to do so until there is no dissonance between the sight of your arms unfolded and the proprioceptive sense of where my arms are.

Regardless of who mimics whom, there is a tendency towards behaving in a similar way.

Another well-known example of mimicry is ‘yawn contagion’ where one finds it difficult not to yawn if one sees someone else yawning.

69: Emotions

Moral Development

Previously, it has been explained how the ‘hierarchy of predictors’ framework learns – with short-term, higher-level knowledge eventually getting relearnt at a lower level to become:

  • embedded in longer-term memory,
  • quicker, and
  • more instinctive.

The same applies to our moral learning.

Over time, we learn to balance the wants of others against those of ourselves. Conscious deliberation gives way to an automatic intuitive response.

  • When we are young we are dependent on our immediate, caring family and are selfish.
  • Gradually, we learn that it is sometimes better to put the wants of another first in the short term in the expectation that this will pay back in the longer term. This is a one-to-one relationship.
  • Balancing the wants of ourselves with others gradually comes more naturally. We no longer have to consciously deliberate about when and how another will pay the favour back. Give-and-take becomes a (sub-conscious) habit and this means we sometimes prioritize others when there is no pre-determined payback.
  • This then leads to the establishment of a reputation within the immediate social group with which we identify as being most like ourselves. This group can be the extended family, but, because this stage is reached among adolescents, it is commonly a group of teenage friends.
  • In adulthood, our experience of ‘those like us’ expands. We act morally towards strangers in our society. Moral deeds become a currency rather than a good. A favour from A to B does not have to be repaid (traded) by B. B can act well towards others and others will act well towards A. A’s payback is indirect and unquantifiable (as is everyone else’s).

Conscious moral deliberation (the rational) becomes habituated in a lower level and, as it does so it gets extended more generally to a wider and wider group of individuals.

This account of moral development is consistent with that of Lawrence Kohlberg.

Hierarchical Levels

In the ‘hierarchy of predictors’ framework, we can categorize the many levels into 5 zones. From top to bottom:

  • 5: Conscious deliberation,
  • 4: Sub-conscious,
  • 3: Emotional: prioritizing actions,
  • 2: Integrating the various senses,
  • 1: Perceiving the various senses.

Note that zones 1, 2, 3 and 4 all operate unconsciously.


Emotions are lumped somewhere in the middle of this hierarchy. They should clearly be below consciousness and are above the low-level sensorimotor levels. (We typically think of this hierarchy being the cerebral cortex, but emotion is also strongly associated with sub-cortical parts of the brain such as the limbic system.) The ‘hierarchy of predictors’ framework is, well, just a framework – a simple skeleton around which to build an understanding.

Emotions motivate. They can produce strong, sophisticated motor action from integrated sense input. Strong emotions will shut off the error signals upwards, making it difficult for the rational to override the actions resulting from those emotions.
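The claim that strong emotion shuts off the upward error signal can be put as a one-line gating function. The gain figures are illustrative assumptions of mine; the sketch only captures the direction of the effect.

```python
# Toy sketch of emotional gating: strong emotion attenuates the error
# signal passed upward, so the rational level gets little to override.
def upward_error(raw_error, emotion_strength):
    """Error signal reaching the conscious level.

    emotion_strength in [0, 1]: 0 = calm, 1 = overwhelming emotion.
    """
    gate = 1.0 - emotion_strength  # strong emotion closes the gate
    return raw_error * gate

print(upward_error(raw_error=10.0, emotion_strength=0.1))  # calm: most error gets through
print(upward_error(raw_error=10.0, emotion_strength=0.9))  # angry: little gets through
```

With the gate nearly closed, the higher, rational levels receive too weak a signal to veto the actions the emotion is driving.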

And we feel emotions – they have a subjective quality. There is ‘something it is like’ to feel just as there is ‘something it is like’ to see. Anger has a subjective experience just as seeing the colour blue does.

Emotional Contagion

Consider the simplistic progression:

  • There is an emotional association between happiness and smiling.
  • If you are happy, you may smile.
  • If I see you smiling, I may mimic you and smile too.
  • Smiling has an emotional association with happiness.
  • So I am then happy.
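The progression above is just a chain of learned associations, and can be sketched as one. The association pairs are illustrative labels taken from the bullets, held in a hypothetical lookup table.

```python
# Sketch of the indirect route to emotional contagion described above,
# as a chain of learned associations (all pairs are illustrative).
associations = {
    "happy": "smile",      # emotion -> expression
    "see smile": "smile",  # observation -> mimicry (the 'leak' from others)
    "smile": "happy",      # expression -> emotion (feeling what we enact)
}

def propagate(state, steps):
    """Follow the association chain for a given number of steps."""
    chain = [state]
    for _ in range(steps):
        state = associations[state]
        chain.append(state)
    return chain

# Indirect route: seeing your smile makes me smile, and smiling is
# associated with happiness -- so I end up happy.
print(propagate("see smile", 2))  # ['see smile', 'smile', 'happy']
```

The direct route short-circuits this chain: the habituated understanding "you are smiling, so you are happy" pulls on my memory of happiness without my having to smile first.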

But more directly:

  • If I see you smiling, I understand you are happy. (This may initially have been conscious but has become habituated and automatic.)
  • My understanding of happiness is associated with my memory of the emotion of happiness.

This is ‘emotional contagion’.

Both the ‘zone 4’ (subconscious memory of emotion) and the ‘zone 2’ (sensorimotor integration) together pull on the ‘zone 3’ emotions.

Connecting to Our Emotions

We have seen how zones 1 (perceiving) and 2 (sensorimotor integration) are associated:

  • Low-level sensation does not differentiate between self and others.
  • This can lead to unconscious mimicry (see above).

And we have seen how zones 5 (the conscious) and 4 (the sub-conscious) are associated:

What happens at one level also happens at a neighbouring level (but generally at a different time).

And we have now seen that zones 2 (sensorimotor integration) and 4 (the sub-conscious) are associated with zone 3 (emotions).

This covers the complete vertical integration of the zones.

Contagious Well-Being

From the ‘sensorimotor contagion’ (above) we end up with mimicry – some combination of you mimicking me and me mimicking you. Regardless, from my perspective, you become like me. This is good because people similar to me generally behave like me and I am then more confident that I can predict their behaviour. There are no surprises and this contributes towards higher personal well-being. Being around people we are familiar with makes us feel good. And well-being is an emotion.

Contagion, Autism and Psychopathy

It has been found that ‘yawn contagion’ is highest in those more empathetic and lowest in cases of autism and psychopathy. This is what would be expected:

  • The autistic are less susceptible to contagion because of their difficulty in making the cognitive connection to emotion. This is from zone 3 (emotions) to zone 4 (the sub-conscious) – they are less able to understand your emotion from their observation of you.
  • The psychopathic are less susceptible to contagion because of their difficulty in making the physical connection to emotion. This is from zone 2 (sensorimotor integration) to zone 3 (emotions).

Agency and Contagion

Agency concerns the ownership of senses and action – me or others.

  • If agency takes place in zone 4 (the sub-conscious), then our emotions are triggered both by things happening to ourselves and us seeing them happen to others.
  • If agency takes place in zone 2 (sensorimotor integration), we have no emotional attachment to what happens to others.

Perhaps not surprisingly:

  • The more empathetic have less of a distinction between self and other.
  • The more psychopathic have a stronger ‘sense of self’.

This speculation provides an alternative account to that of the ‘mirror neurons’ considered next.

70: Mirror Neurons

No account of how neuroscience affects morality would be complete without referring to ‘mirror neurons’. These have been identified by some as the source of our empathy towards others.

An Overview

A simplistic overview of mirror neurons is as follows:

  • They fire when either I do something or when I see other people do something.
  • These mirror neurons are found in the premotor cortex, the somatosensory cortex and the inferior parietal cortex.
  • They do not fire when the object is missing (i.e. the action is only pretended) or when the object is present but without the actor, or when the actor is artificial.
  • They are concerned with the goals and intentions of actions.
  • They mirror the actions and intentions (the ‘what’ and the ‘why’) of other people onto ourselves and therefore help us understand them.
  • Hence they are the physical basis of empathy, through which we can understand other people’s intentions.

Criticism no. 1: No Neuron Type

The term ‘mirror neurons’ creates the impression that there is a particular type of neuron which has the behaviour of firing when either I do something or when I see other people do something, and that these neurons are different in form from other neurons. This is not true.

Instead, it is better to speak at a higher level of a ‘mirror neuron system’ as part of a larger system.

Criticism no. 2: No Localization

It should not be surprising that there are neurons that fire both when we do something and when we see other people do something. We should expect to find such neurons beyond the premotor, somatosensory and the inferior parietal cortical regions. For example, in parts of the brain concerning low-level senses, there will be neurons that fire both when I see my hand and when I see yours. Such neurons (‘ordinary’ neurons) will be distributed over a wide area of the brain, even if they are not all categorized as ‘mirror neurons’.

And where they are found, mirror neurons are not exclusive: within the three cortical regions previously identified (the premotor cortex, the somatosensory cortex and the inferior parietal cortex), they constitute only about 10% to 20% of the neurons.

Criticism no. 3: No Neural Correlates

This localization of mirror neurons to those three regions comes from functional MRI (fMRI) scans. Tasks performed by someone are correlated against oxygen levels within particular parts of their brain, from which higher brain activity in these areas is inferred.

But these ‘neural correlates’ make every area of the brain special. Each area is doing the task it has been correlated with.

This is antithetical to all the ideas here which are based around theorizing about how the brain is working rather than just classifying it. And the particular theory here is the ‘hierarchy of predictors’.

Saying that there is a ‘mirror neuron system’ which provides us with an ability to empathize (for example) suggests that this ‘system’ is doing something special – something different from what other circuits of the brain are doing. ‘Special’ areas have a hint of magic about them; their explanations do not explain. A theory provides an understanding of why a particular area performs the function that it is correlated with (we are rather a long way off that when it comes to the brain). Theories are more parsimonious. fMRI evidence can support or falsify a theory but it cannot replace it.

Counter-Criticism no. 1: Neural Correlates

Note however that research has shown that neural correlates can make a statement more ‘true’! Any statement that is supported by a claim (just a claim) about a relationship between it and some fMRI scanning result is more likely to be believed by the general populace than if no neuroscientific claim was provided.

For example…

The areas in which these mirror neurons are ‘found’ do in fact relate well to what would be expected in generating motor actions from sensory input:

  • the premotor cortex possibly handles the planning of movement,
  • the somatosensory cortex handles our sense of touch around our body (it is the location of ‘Penfield’s Homunculus’), and
  • the inferior parietal cortex is the location of sensory integration.

Counter-Criticism no. 2: No Simulation

Patricia Churchland is sceptical of the claims of mirror neurons, in two ways. The first is that the

‘whole claim that empathy depends on simulation’

has not been established.

Now, simulation is another name for prediction, but one that could only be applied to high-level prediction – conscious deliberation. High-level empathy is ‘cognitive empathy’ and this is not the same as the ‘emotional empathy’ that Churchland means. So in that way,

empathy does not depend on simulation

But the underlying neuroscientific framework to everything here – the ‘hierarchy of predictors’ (or the ‘predictive brain’) – is that all levels of the brain are predicting, including emotional levels. In that way:

empathy must depend on prediction

Counter-Criticism no. 3: No Feeling

The second of Patricia Churchland’s criticisms mentioned here is that she scoffs at the idea that we actually feel what others feel when we see them in pain. Recounting seeing another person get stung by a wasp, she reports that she did not feel any pain in herself corresponding to that which the other will have felt. Instead:

‘what I did feel was a visceral generalized sense of awfulness’.

But each and every one of us is different. Over 1% of us are Mirror-Touch Synaesthetes, who do claim that they actually feel what others feel when they see them in pain. Personally, I concur with the ‘generalized sense of awfulness’ but there is also a fleeting localized feeling (yes, I will call it a feeling) in the corresponding part of me at the start, although it quickly diffuses away. This is consistent with the ‘low-level fast’ and ‘high-level slow’ processes at work in the ‘rubber hand illusion’ described previously. We feel the pain of others more quickly than we can attribute ownership to it.

But Does it Make Any Difference?

Previously, I have argued that societies of trust can evolve just from the seed of maternal care. But the existence of empathy, arising from mimicry and mirroring, greatly accelerates that development. It motivates the vast majority of us to act in a more cooperative way.

But that doesn’t mean that our moral decisions should be driven by empathy – or any of our emotions, for that matter. They might actually get in the way! They may make us more likely to make irrational (sub-optimal) moral choices! And not all individuals have the same degree of empathy. We need a morality that works with all types – all different types. But we also need one that works in practice for how we are physically constituted.

Mirroring and mimicry do not determine morality

but

morality must account for our mirroring and mimicry.
