Intelligence and the Brain

Talk originally presented on 8 March 2013.

Notes of an extended version of the talk are given below.



  1. Introduction [extended]
  2. Context: Dualism and Physicalism [extra]
  3. Context: Free Will / Demarcation between Agent and Environment [extra]
  4. What is Intelligence?
  5. A Theory of Multiple Intelligences
  6. The Triune Brain
  7. The Subsumption Architecture
  8. The Extended Mind Thesis
  9. Intelligence Amplification
  10. Perception as Hypothesis
  11. The Tower of Generate and Test
  12. Evolutionary Epistemologies
  13. A Bigger Tower
  14. A Unified Theory of Intelligence
  15. Variational Free Energy
  16. Entropy
  17. Entropy and Life
  18. The Bayesian Brain
  19. Entropy and the Bayesian Brain
  20. Life and Intelligence
  21. Aside: An Overview of the Cerebrum
  22. Hierarchical Message Passing
  23. Minimising Surprise Across All Levels
  24. Free Energy vs Alief [extra]
  25. Behavioural Examples
  26. But Is It True?
  27. No Paradox: Clarifying Entropy [extra]
  28. The Dark Room Problem [extra]
  29. The Maximum Entropy Principle [extra]
  30. A Tall Story [extra]
  31. Confirmation Bias: A Trait of Hedgehogs [extra]
  32. What Happens at the Top? [extra]
  33. Agent versus Environment: An Analogy [extra]
  34. Free Energy and Free Will [extra]
  35. Free Energy in Context: A Comparison With Evolution
  36. Intelligence and the Brain: A Quick Summary

Sections 5 to 14 inclusive cover ground previously explored in ‘Scientific Creatures’ but with additional sections 5, 6, 10 and 12. Duplicated sections have only brief notes here.

Sections 15 to 20 inclusive are covered in the blog posting ‘Entropy, Intelligence and Life’.

Sections 21 to 25 inclusive are covered in the blog posting ‘Hierarchical Message Passing’.

Sections 26 to 32 inclusive are covered in a blog posting ‘Free Energy: Criticisms and Conjectures’.

Section 33 is covered in a blog posting ‘Free Energy: Criticisms and Conjectures’.

Section 34 is covered in a blog posting ‘Free Energy and Free Will’.

Section 35 is covered in a blog posting ‘Free Energy in Context: A Comparison with Evolution’.

Section 36 is covered in a blog posting ‘Intelligence and the Brain: A Quick Summary’.

1. Introduction

  • What is ‘intelligence’?
  • What is the brain actually doing?
  • How is intelligence physically realised in the brain?

Quite how the brain manages to achieve what it does remains elusive. It’s true that almost every day there are news stories that some link has been found between some behaviour and activity in some part of the brain or other. But it seems there is no ‘big idea’ of how the brain works akin to Darwinian evolution, for example – where all that’s needed is a fairly straightforward concept and a few billion years.

This talk goes in search of such a ‘big idea’, looking, in particular, at one very promising theory: Karl Friston’s theory of ‘Variational Free Energy’ – dating from as recently as 2006.

Before that, I look at some theories of intelligence and related issues from philosophy, psychology and robotics, before combining a number of these into a unified theory of intelligence.

2. Context: Dualism and Physicalism

Previous talks have looked at consciousness, the extension of mind and free will, in that order. A common theme is that they are looking at the problems from a physicalist perspective.

To make a grand generalisation, the world we grew up in was a dualist one.

Essentially, Dualism is the belief that there are two types of ‘stuff’, ‘mind’ and ‘matter’:

  • ‘Matter’ is the cold, mechanical stuff that exists in space. Following Newton, it is a clockwork universe. It provides us with a body but not the soul.
  • ‘Mind’ provides us with that which is beyond explanation: life (soul), intelligence (reason) and consciousness. We can only examine these as a ‘black box’. We cannot peer inside them for they are not ‘in space’. And so we can only provide psychological accounts for them. But the separation gives us free will and hence we can be held morally accountable.

Physicalism, on the other hand, maintains there is only one kind of ‘stuff’. We may call it ‘matter’ but we include energy as well. Basically, whilst we may not have all the answers, everything can in principle be explained in a physicksy way. It is sometimes (but increasingly rarely) called ‘realism’ but there are many different types of ‘realism’ and, besides, being a ‘realist’ or being ‘realistic’ has everyday connotations which aren’t helpful. Similarly, it is sometimes called ‘materialism’ but this too suffers from the connotations of being a ‘materialist’ or being ‘materialistic’. So I’ll stick with ‘physicalism’.

Ask around and you’ll find that dualism still has a strong grip on people. But go into any philosophy, psychology or science department of a university and you will be hard-pushed to find anyone who admits to being a dualist. Most such academics will claim to be physicalists but there will be many dualist confusions encroaching on their thinking.

Of those things that have traditionally been placed off-limits to scientific enquiry, it is life that has been most successfully accounted for. Darwin provided the basic theory in ‘The Origin of Species’ (1859), Schrödinger provided the fundamental physical account in ‘What Is Life?’ (1944), and Crick and Watson provided the physical details of DNA (1953). And yet Lamarck provided a theory of evolution as far back as 1800. It may have been a wrong theory, but at least it gave an indication of what a physicalist theory of life might look like.

With those other ‘off-limits’ subjects, intelligence and consciousness, it would seem we don’t even have a wrong theory to satisfy us. For intelligence, Turing gave us what is now called the Turing Test, to at least get us used to the idea that intelligence might be physically realisable. But it seems we lack an overarching theory. And as for consciousness?! (It may well be that an adequate theory of intelligence might be a useful step in the right direction regarding consciousness.)

Physicalism brings these 3 subjects within the realm of science. However, in doing so,  it seemingly creates a problem regarding our moral accountability. But that is for another time.

3. Context: Free Will / Demarcation between Agent and Environment

This talk particularly builds upon the previous two.

The last talk was regarding free will. Towards its end, it introduced the concept of freedom and related it to complexity. This talk on intelligence builds on this relationship between freedom and complexity.

The talk before that included a look at Andy Clark’s ‘Extended Mind Thesis’. This talk on intelligence looks at this thesis again and, I think, sees it in a better perspective. The idea of intelligence outside ourselves is a common theme throughout this talk – where is the demarcation between ourselves and the environment we inhabit? The talk is trying to look at intelligence in the most general way, encompassing both the natural and the ‘artificial’.

Part 1: Philosophy / Psychology / Robotics

4. What is Intelligence?
I am not a great proponent of the ‘first define your terms’ line of argument. If we knew what the thing was, we wouldn’t be asking the question in the first place. But it sets the scene, so here goes…
The dictionary definition: the Oxford Dictionaries online defines intelligence as:

‘the ability to acquire and apply knowledge and skills’

(Not particularly useful.)  But the etymology of the English word ‘intelligence’ is instructive: from Latin ‘inter-’ (between) and ‘legere’ (to choose):

‘to choose between’

…suggesting an ability to discriminate.
A recent introduction to neuroscience for a popular audience describes intelligence in a way that is more illuminating: Frank Amthor says in ‘Neuroscience for Dummies’ (2012):

‘Intelligence is exhibited in behaviour that is adaptive, that is, appropriately responsive to circumstances, particularly when those circumstances are complex and changing. An important aspect of intelligence is the ability to make predictions.’

All the bold emphasis above is mine. It is indicative of what is to follow.

5. The Theory of Multiple Intelligences

In ‘Frames of Mind: The Theory of Multiple Intelligences’, Howard Gardner (1983) proposed a model of intelligence that was not a single scale, but  8 (and subsequently extended to more) ‘uncorrelated modalities’:

  • Logical-mathematical
  • Spatial
  • Linguistic
  • Bodily-kinaesthetic
  • Musical
  • Interpersonal
  • Intrapersonal
  • Naturalistic

This is an example of what I am not looking for in a theory of intelligence! A major criticism of his theory was that it is ad hoc. It provides a description of intelligence without providing any explanatory value. There was also the embarrassing problem that the theory seemed to be untrue too – unfortunately for some, people who do well on one modality tend to do well on other modalities. The modalities are correlated, so there seems to be something that those brilliant persons are doing that we can’t explain.

6. The Triune brain

A better theory is that of the ‘Triune brain’ originated by Paul MacLean in the 1960s.

This set out the evolution of the brain over three stages:

  1. The ‘Reptilian complex’, physically corresponding to the basal ganglia at the bottom of the brain.
  2. Paleomammalian (early mammalian), physically corresponding to the limbic system within the brain.
  3. Neomammalian (newer mammalian) complex:  the neocortex within the brain.

These three crudely correspond to different types of behaviour:

  1. Instinct: reflex/mechanical (unemotional) responses.
  2. Emotional responses.
  3. Rational responses, including language and planning.


His theory has been popularized through Arthur Koestler’s ‘The Ghost in the Machine’ (1967), associating the darker side of human behaviour with our baser (reptilian) evolutionary history, and Carl Sagan’s ‘The Dragons of Eden’ (1977), speculating on the evolution of human intelligence.

However, more recent comparative neuroanatomy has discredited the theory. The evolutionary story is not quite so clear-cut. But even so, the theory remains in popular scientific circulation, particularly with reference to the memorable concept of ‘reptilian behaviour’.

But as a theory of intelligence goes, this is much more attractive than Gardner’s:

  • It maps particular behaviours onto particular regions of the brain.
  • It provides an evolutionary account of the brain.

It may not be a theory that can take us any further but at least it shows the direction we should be heading.

7. The Subsumption Architecture


Notes: Rodney Brooks’s robotics ‘subsumption architecture’ (1986) is biologically-inspired. Evolutionary argument. At first glance, very similar to the triune brain. Basic organism with sensor inputs and motor outputs with behaviour determined by genes. Those genetic ‘early decisions’ are a big constraint on further development of intelligence. Easier to build a new layer on top of the existing structure. Lower levels have fast response from sensor inputs to motor outputs. Higher layers deal with more exceptional behaviour but overall responses are slower. Contrast with the triune brain thesis: not mapped to physical parts of the brain, or 3 layers.

8. The Extension of Mind


Notes: Andy Clark’s ‘Extended Mind’ thesis (2008) implies something of greater intelligence. But this idea can be better seen as putting an extra layer of functionality between the agent’s (Brooksian) layers of intelligence and its environment. But do not consider just putting tools (e.g. fingers, to count on, or pen and paper, to write on) out into the environment as ‘intelligence’. Only consider something as contributing to intelligence when it makes decisions, i.e. takes motor action in response to stimulus.

9. Intelligence Amplification

University of Birmingham’s ‘Justin’ robot

Notes: Best to consider this extra layer in terms of W. Ross Ashby’s concept of ‘intelligence amplifiers’ (1956). One example (at the highly-complex end) of the intelligence amplification spectrum: a humanoid household assistant robot to perform tasks such as making a hot drink. Another example – at the absurdly simple, other end of the spectrum: a kettle that automatically switches itself off when the water is boiling. Its ‘taking responsibility’ for this action frees up the higher levels of intelligence to pay attention to other tasks.
10. Perception as Hypothesis

Notes: Richard Gregory viewed perception as being active rather than passive. He was very interested in optical illusions. Even at low levels. Illusions do not show where the brain has made mistakes but show how perception is working. Optical illusions are generally highly unnatural – nothing we would have needed to evolve to deal with. Active perception: it has some hypothesis (model of expectation) of what is being seen. That model has memory. Slogan: ‘Perception as Hypothesis’

11. The Tower of Generate and Test

Notes: Daniel Dennett’s ‘Tower of Generate and Test’ concept from ‘Darwin’s Dangerous Idea’ (1995) provides a scale of intelligence based on formation and testing of hypotheses:

  1. ‘Darwinian’ (obviously, named after Charles Darwin): Creatures are created by random mutation and thenceforth have fixed behaviour. The rule ‘survival of the fittest’ applies.
  2. ‘Skinnerian’ (B. F. Skinner): Creatures that are adaptive: they learn by testing actions in the environment. Their behaviour is henceforth changed. Example: bird tapping beak on a button that releases food.
  3. ‘Popperian’ (Karl Popper): Creatures have a model of the environment inside them that enables them to imagine consequences. Popper’s slogan: ‘permit our hypotheses to die in our stead’.
  4. ‘Gregorian’ (Richard Gregory): Creatures using their immediate outside environment (refer back to Andy Clark/Ross Ashby, mentioned previously).
  5. ‘Scientific’: Creatures operate in a society, language and science arise. An environment is created in which hypotheses can be shared and refined before testing – ‘making mistakes in public’.

12. Evolutionary Epistemology

Notes: Dennett is not the only one to provide a framework of knowledge based on evolution. Others provide much more scientific detail (but Dennett’s is the most accessible):

  • Konrad Lorenz, ‘Behind the Mirror’ (1973)
  • Donald Campbell, ‘Evolutionary Epistemology’ (1974)
  • Karl Popper, ‘A World of Propensities’ (1989)

13. A Bigger Tower

Notes: Modify Dennett’s Tower by replacing his social ‘scientific’ creatures and extending the scale with the original theme of generating and testing hypotheses. First 4 steps unmodified.

Antikythera mechanism reproduction

  1. Darwinian: Created by random mutation and thenceforth are fixed (‘survival of the fittest’).
  2. Skinnerian: Learn by testing actions in the external environment and are henceforth changed.
  3. Popperian: Create models of the external environment inside them. (‘permit our hypotheses to die in our stead’).
  4. Gregorian: Use the immediate outside environment.
  5. Scientific-I: Use external models of the outside environment e.g. the Antikythera mechanism to predict eclipses.
  6. Scientific-II: Create external objects that interact with the environment e.g. remote-controlled bomb-disposal robot. (‘permit our subordinate creations to die in our stead’).
  7. Scientific-III: Create adaptive models in the external environment e.g. computer algorithms such as genetic algorithms or Bayesian spam filters (description of the latter given in a later section).
  8. Scientific-IV: create objects in the environment that the agent can learn directly from i.e. the object teaches rather than just providing results of experiments. Not there yet.

Northrop Grumman Cutlass bomb-disposal robot

Picture emerges of a blurring between the agent and its environment, with elements of the environment within the agent, and elements of the agent in the environment, with models of the environment everywhere.

This scale has removed the social dimension. But the social aspect is not just another step up the tower and does deserve, literally, a whole new dimension. Can have ‘social intelligence’ using creatures at any level e.g.:

  • Many Darwinian creatures creating a ‘swarm intelligence’ like flocking.
  • Many Scientific-III creatures creating a ‘crowd intelligence’ (as in James Surowiecki’s concept of the ‘Wisdom of Crowds’, as distinct from intelligent creatures behaving with a herd mentality).

Can envisage a scale of social progression along the social axis just as the Tower is a progression along the agent’s axis. Is there such a thing as social intelligence? Maybe instead: just call it cooperation – it is just semantic. Here: only looking at the intelligence of individual agents.

14. A Unified Theory of Intelligence

Notes: Combine the various ideas so far to build up a picture:

  • Hierarchy of adaptive ‘perceivers’, each with models of what is below them in the hierarchy.
  • Gradual transition between agent and its environment. Traditional split at the body can be extended to include elements in the environment. But, equally, it can be contracted. Reality: gradual.
  • There is a separate social dimension.

Part 2: Neuroscience

15. Variational Free Energy

So far, we have just employed armchair philosophy, cogitating and manipulating various ideas of others. This is all very interesting but does it relate to how the brain really works? It’s time to move over to neuroscience. I’m going to introduce just one particular theory within neuroscience, namely ‘Variational Free Energy’, posited by Prof. Karl Friston (University College London), as recently as 2006. It has been referred to as a ‘unified brain theory’ and can be seen as such in 2 ways:

  1. It offers an overarching theory of what the brain is actually doing.
  2. A number of existing partial-theories can be seen as a subset of this theory.

To summarize the theory in just 7 words, the theory’s slogan is ‘minimization of surprise through action and perception’.
To introduce the theory, I need to cover 3 strands:

  1. Entropy and ‘Free Energy’: concerning  information theory,
  2. The Bayesian Brain: concerning probability, and
  3. Hierarchical message passing: which brings us back to a familiar diagram (the diagram presented previously is a good model for the theory and all that has been said so far is applicable).

16. Entropy

Let’s look at the first of the 3 strands: entropy. Entropy comes in 2 flavours:

  1. ‘Shannon entropy’, named after Claude Shannon who invented the concept in 1948 along with Information Theory (which is the foundation for the signal processing to allow you to get such a good mobile or broadband throughput with such a small battery / such a poor bit of ‘wet string’ that is the phone line into your house).
  2. Classical entropy, part of thermodynamics, developed by various physicists in the 19th Century (not least Ludwig Boltzmann, around 1877) and ultimately motivated by the question of how to build better steam engines.

In Information Theory, entropy is a measure of ‘unpredictability’ or ‘information content’. If I have a 4 Megabyte file that is just the 4 letters ‘blah’ repeated a million times, there is much less than 4 Megabytes’ worth of information there. The message “‘blah’ repeated a million times” just takes 31 characters (bytes). If you zipped up the file, it would shrink to a tiny fraction of its original size (though not all the way down to 31 bytes, because the compressed format has overheads of its own). Zipping up (compressing) files is one reasonable way to estimate the amount of information contained in the file. Or if the file is audio, use MP3 compression instead. ‘Unpredictability’ is related to ‘surprise’ (as in the slogan ‘minimization of surprise through action and perception’) and surprise can be defined mathematically:

‘surprise’ = -log(P)

where P=probability. So,

  • If I toss a coin and it comes up heads, there’s not much surprise: -log2(0.5) = 1 bit.
  • Whereas if I provide you with some numbers that turn out to be next week’s winning 1-in-a-million lottery numbers, you will be very surprised: -log2(1/1000000) = 19.9 bits.
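
As a rough check on the figures above, here is a minimal Python sketch. It computes the surprise, in bits, for the coin and lottery examples, and uses the standard zlib module as a stand-in for ‘zipping up’ the repetitive ‘blah’ file. The function name surprisal_bits and the choice of zlib are assumptions made for this illustration; a real zip archive would add some further header overhead.

```python
import math
import zlib


def surprisal_bits(probability):
    """Surprise, in bits, of an outcome that has the given probability."""
    return -math.log2(probability)


coin_surprise = surprisal_bits(0.5)                # a fair coin coming up heads
lottery_surprise = surprisal_bits(1 / 1_000_000)   # a 1-in-a-million lottery win

# 'blah' repeated a million times: highly predictable, so highly compressible.
repetitive_data = b"blah" * 1_000_000
compressed_data = zlib.compress(repetitive_data, 9)  # 9 = maximum compression

print(f"coin: {coin_surprise:.1f} bits, lottery: {lottery_surprise:.1f} bits")
print(f"{len(repetitive_data)} bytes compress to {len(compressed_data)} bytes")
```

The compressed size is only an upper bound on the information content: a general-purpose compressor cannot squeeze the file down to the 31-character description, but it does get within a small fraction of the original size.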

Thermodynamic entropy is commonly described as being a measure of disorder but is perhaps better understood in terms of ‘energy dispersal’.  In abstract terms:

entropy, S = k.log(W)

…where W is the number of microstates (possibilities).

To provide extreme (cosmic) examples of ‘energy dispersal’:

  • At the big bang, everything in the universe was together, hence S is low.
  • At the eventual ‘heat death of the universe’, where the universe is expanding but all the stars have died out, S will be very high.

To provide a more down-to-earth example, if we start with a box with different compartments in which a particular gas is in just one compartment (a state of low entropy) and then open the doors between the compartments, when we come back an hour later, we will find that the gas has dispersed to all compartments (a state of high entropy).

This ‘Brownian Motion’ is an example of that fundamental law of physics – the second law of thermodynamics. There is a tendency towards disorder (increased entropy). We would be very surprised if we started with the high-entropy state and came back an hour later to find the box in a low-entropy state. We would suspect interference by some intelligent creature. In theoretical physics, there is a thought experiment involving one such intelligent creature – ‘Maxwell’s Demon’ – that opens and closes the compartment doors at will in order to trap all the gas in a single compartment.
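
The formula S = k.log(W) can be applied to the box of gas as a back-of-envelope sketch. It assumes, purely for illustration, that each of N distinguishable gas particles can sit in any one of the open compartments, so that W is the number of open compartments raised to the power N, and it counts positional arrangements only. The function name, the particle count and the number of compartments are all hypothetical.

```python
import math

BOLTZMANN_CONSTANT = 1.380649e-23  # k, in joules per kelvin


def compartment_entropy(particle_count, open_compartments):
    """S = k.log(W), with W = open_compartments ** particle_count.

    The logarithm is expanded as particle_count * log(open_compartments)
    so that W itself, which is astronomically large, is never computed.
    """
    return BOLTZMANN_CONSTANT * particle_count * math.log(open_compartments)


particles = 10**20                                       # hypothetical amount of gas
entropy_confined = compartment_entropy(particles, 1)     # gas held in one compartment
entropy_dispersed = compartment_entropy(particles, 4)    # doors open, 4 compartments

print(f"confined: {entropy_confined} J/K, dispersed: {entropy_dispersed} J/K")
```

With only one compartment available there is a single arrangement, so the entropy in this simplified count is zero; opening the doors multiplies the number of arrangements and the entropy rises.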

Information Theory (or Shannon) entropy was originally formulated by Shannon in an analogous way to thermodynamics but it has subsequently been shown that the 2 concepts are in fact related in a fundamental way. Because Information Theory entropy is even more abstract than thermodynamic entropy, I am going to relate things to thermodynamic entropy in what follows here. But Friston’s ‘Free Energy’ theory concerns the Information Theory variety. ‘Free Energy’ is a concept in thermodynamics that is very similar to entropy and was subsequently taken across to Information Theory. As far as this talk is concerned, we need not make any distinction.

17. Thermodynamic Entropy and Life

In a series of public lectures in Dublin in 1943, the physicist Erwin Schrödinger famously asked ‘What is Life?’ and the answer he gave is that living things maintain their own order at the expense of their surroundings. They do not depend on some magical law that counteracts the second law of thermodynamics. They just ‘export’ their own disorder. So, for example, Schrödinger is famous for thought experiments in which cats are put in boxes. Imagine if we put a cat in a box not with some radioactivity but with a mouse. When we look later, we would expect to find the cat, a mouse carcase and some cat faeces. The cat has maintained its own order by transforming the mouse into something less ordered. The order that living things draw from their surroundings in this way has become known as ‘negentropy’.
An example near the other end of the biological scale is the sodium-potassium ion pump. Thousands of these small biological machines sit in the walls of neurons and other cells. They collect 2 Potassium ions (atoms) from outside the cell and 3 Sodium ions from inside the cell and swap them around, bringing the Potassium in and sending the Sodium out. (This is a significant component in creating a voltage across the cell – the ‘membrane potential’ – that allows neurons to fire.) I want to make comparisons here between the Sodium-Potassium ion pump and Maxwell’s Demon – imagine starting off with 2 compartments in a box with a mixture of gases, with Maxwell’s Demon sorting them so that eventually they are separated out. These little machines are locally working against the natural tendency towards disorder (but they need energy to operate hence, on a larger scale, the second law of thermodynamics is not violated).

(For a stunning tour of the sodium-potassium ion pump, Click here.)


18. The Bayesian Brain
To understand Bayesian inference, we need to understand Bayesian probability as opposed to the classical interpretation of probability.

The classical interpretation is sometimes called ‘frequentist’. Probability represents a ‘propensity’, based on how things turn out ‘in the long run’. So, as we were taught in school, we can employ such probabilities when we find ourselves picking a red or black ball out of a bag at random when we conveniently know there are 100 red and 300 black balls in the bag.

In contrast, the Bayesian interpretation is a form of ‘subjective probability’ in which the probability represents a degree of belief. Thus a probability of ‘1-in-a-million’ means ‘not likely!’ rather than ‘I have conducted experiments on this and replayed this scenario billions of times and I find that the chance of x happening is 0.000001’.

Next, we need to understand Bayesian inference.  Inference is about deriving conclusions from assumptions. David Hume famously considered the philosophical problems of inferring that the sun will rise tomorrow because it has risen every day before that. And Bertrand Russell famously then gave the example of the chicken that infers that the approaching farmer is bringing food because that is what has happened every day beforehand – but today is the day the farmer instead just picks up the chicken and wrings its neck.
We do not need to concern ourselves with the maths here but Bayesian inference is based on Bayes’ theorem:

P(H|D).P(D) = P(D|H).P(H)

which can be expressed as

P(H|D) ∝ P(D|H).P(H)

which is interpreted as

posterior ← likelihood . prior

That is:

  • We start with a prior degree of belief.
  • New evidence comes along.
  • We then calculate the new (posterior) degree of belief, based on our previous degree of belief and the new evidence. This new degree of belief can be more or less than it was before, depending on the evidence.

We modify our predictions as a result of new information. And in using Bayes’ theorem to do it, this modification is optimal – which sometimes gets equated with being ‘rational’.
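
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than anything specific to the brain: the function name bayes_update and the coin example are hypothetical, chosen only because the numbers are easy to check by hand.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H|D) for each hypothesis H.

    prior:      dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(D|H) for the observed data D
    """
    unnormalised = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalised.values())  # this sum is P(D)
    return {h: value / evidence for h, value in unnormalised.items()}


# Hypothetical example: is a coin fair, or double-headed?
prior = {"fair": 0.5, "double-headed": 0.5}

# New evidence: the coin is tossed once and comes up heads.
likelihood_of_heads = {"fair": 0.5, "double-headed": 1.0}

posterior = bayes_update(prior, likelihood_of_heads)
print(posterior)
```

A single head shifts the degree of belief in ‘double-headed’ from one half to two thirds; feeding the posterior back in as the next prior repeats the process for each further toss.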

A real-world application of Bayesian inference is in spam filters for e-mail. When you receive an e-mail, the spam filter decides whether it will be put into your inbox or junk folder (recall that the etymology of the word ‘intelligence’ is ‘to choose between’). When you move an e-mail from the junk folder to the inbox, you are telling the filter that it got it wrong. At this point, it will try to learn how to choose between junk and non-junk better, given this new information. It would ideally look afresh at all the e-mails that it has ever received and try to work out how best to discriminate junk mails. But it cannot look back at all those e-mails – many of the inbox e-mails and virtually all of the junk e-mails have been deleted. The data has been thrown away. All it has to go on is the statistics about those e-mails that it has collected. And it is these statistics that are updated in a Bayes-optimal fashion such that, after the early days, it gets things right the vast majority of the time. It will adapt itself to the environment of your e-mails – even if you’re working in the banking sector of Nigeria.
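
To make the ‘statistics, not the e-mails’ point concrete, here is a toy sketch of such a filter. It assumes a naive Bayes model with add-one smoothing, which is one common textbook approach and not necessarily what any particular mail program does; the class name, method names and training messages are all hypothetical.

```python
import math
from collections import Counter


class CountingSpamFilter:
    """Keeps only word and message counts, never the e-mails themselves."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    def learn(self, text, is_spam):
        """Update the stored statistics with one labelled e-mail."""
        self.message_counts[is_spam] += 1
        self.word_counts[is_spam].update(text.lower().split())

    def spam_probability(self, text):
        """Posterior degree of belief that the text is spam."""
        vocabulary = set(self.word_counts[True]) | set(self.word_counts[False])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in (True, False):
            # Prior: smoothed fraction of messages seen with this label.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Likelihood: smoothed frequency of the word under this label.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Normalise the two log scores into a probability.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])


spam_filter = CountingSpamFilter()
spam_filter.learn("cheap pills buy now", is_spam=True)
spam_filter.learn("win cheap money now", is_spam=True)
spam_filter.learn("meeting agenda for monday", is_spam=False)
spam_filter.learn("lunch on monday", is_spam=False)

print(spam_filter.spam_probability("cheap pills"))
print(spam_filter.spam_probability("meeting agenda"))
```

Moving an e-mail from the junk folder to the inbox would correspond to calling learn with is_spam=False on its text: the counts shift, and so do all later decisions, without any old e-mail being re-read.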

And neuroscientists are increasingly understanding the way the brain works in Bayesian terms.

(There are many more-thorough explanations of Bayesian inference available on the net – generally about an hour long but stop by Daniel Wolpert’s 20-min ‘The Real Reason for Brains’ TED talk on the way.)

19. Entropy and the Bayesian Brain

In order to provide a very basic explanation of the notion:

‘Minimization of surprise through action and perception’

…I am going to look at a rather contrived example of answering a question of the game show ‘Who Wants to be a Millionaire?’.

Firstly, imagine if we were forced to give an answer before the question is even asked! We would be rather confounded. There is no information! All 4 possible answers are equally likely. It would just be a random guess:

  • A: 25%
  • B: 25%
  • C: 25%
  • D: 25%

Note: this is analogous to the gas dispersed across all compartments of the box.

But we are presented with the question:

‘What is the Capital of Australia?’

and, as a result, we have some expectations:

  • Sydney 35%
  • Melbourne 10%
  • Adelaide 5%
  • Brisbane 5%
  • Perth 2%
  • Banana 0%
  • 350ml 0%

When presented with the 4 possible answers, our options are narrowed down and our expectations change:

  • A: Melbourne 12%
  • B: Sydney 38%
  • C: Canberra 7%
  • D: Brisbane 5%

A change in stimulus causes a change in expectation. Perception is an active process (recall Richard Gregory’s ‘perception as hypothesis’).

We then decide to go ‘50:50’. This leads to a big surprise – both the most likely candidates have been removed:

  • A: Melbourne 0%
  • B: Sydney 0%
  • C: Canberra 30%
  • D: Brisbane 15%

There is a big difference between the previous (prior) and current (posterior) expectations. This large prediction error represents a significant information gain. (The difference between prior and posterior expectations is measured by the ‘relative entropy’ or, more impressively, the Kullback-Leibler divergence.)
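
The size of this change can be put into numbers with the Kullback-Leibler divergence. The sketch below is illustrative only: the percentages in the lists above do not add up to 100, so it assumes they can be rescaled to sum to 1 over the four answers, and the function and variable names are hypothetical.

```python
import math


def normalise(weights):
    """Rescale a list of weights so that it sums to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence_bits(posterior, prior):
    """Kullback-Leibler divergence of posterior from prior, in bits.

    Outcomes with zero posterior probability contribute nothing.
    """
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)


# Order of outcomes: Melbourne, Sydney, Canberra, Brisbane
before_question = normalise([25, 25, 25, 25])
answers_shown = normalise([12, 38, 7, 5])
after_fifty_fifty = normalise([0, 0, 30, 15])

gain_from_answers = kl_divergence_bits(answers_shown, before_question)
gain_from_fifty_fifty = kl_divergence_bits(after_fifty_fifty, answers_shown)

print(f"seeing the answers: {gain_from_answers:.2f} bits")
print(f"going 50:50:        {gain_from_fifty_fifty:.2f} bits")
```

Under these assumptions the 50:50 step produces a much larger divergence than simply seeing the four answers, which matches the sense of a ‘big surprise’.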

As an alternative to the Popperian idea of imagining actions in our head so that the bad ones may ‘die in our stead’, action out in the environment may be seen as ‘performing the experiment to optimize the model’.

After another action, ‘Ask the Audience’, our inclination that Canberra is the right answer is confirmed, leading us to choose answer C:

  • A: Melbourne 0%
  • B: Sydney 0%
  • C: Canberra 90%
  • D: Brisbane 5%

With so much expectation in one ‘compartment’, this is analogous to the low entropy state of the box, before the compartment doors were opened. Thus, a combination of perception and action has changed our expectations from a high-entropy distribution to one of low entropy. I’ve tried to provide as simple as possible an explanation of the slogan:

‘Minimization of surprise through action and perception’
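
The move from a high-entropy to a low-entropy distribution can be checked with a short calculation of the Shannon entropy at each stage of the game. As a simplifying assumption, the percentages given for each stage are rescaled so that the four answers sum to 1; the function name and stage labels are hypothetical.

```python
import math


def entropy_bits(weights):
    """Shannon entropy, in bits, of a list of weights rescaled to sum to 1."""
    total = sum(weights)
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probabilities)


# Order of outcomes: Melbourne, Sydney, Canberra, Brisbane
stages = {
    "before the question": [25, 25, 25, 25],
    "answers shown": [12, 38, 7, 5],
    "after 50:50": [0, 0, 30, 15],
    "after Ask the Audience": [0, 0, 90, 5],
}

entropies = {name: entropy_bits(weights) for name, weights in stages.items()}
for name, value in entropies.items():
    print(f"{name}: {value:.2f} bits")
```

The entropy starts at 2 bits (four equally likely answers) and falls at every stage as perception and action concentrate the expectation on Canberra.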

20. Life and Intelligence

There seems to be an interesting relationship here. Just as

life is about counteracting thermodynamic entropy,


intelligence is about counteracting information theory entropy.

The acceptance of intelligence in the form of ‘artificial intelligence’ divorces intelligence from life. But the understanding presented above shows why we should not be surprised that intelligence has arisen from life.

21. Aside: An overview of the Cerebrum

Some necessary background information required for the next section…

The picture, right, shows the two large hemispheres of the cerebrum at the top of the brain (picture credit: Picasa). (Recall the previously introduced crude ‘Triune brain’ notion of this ‘new mammalian’ structure sat above the older ‘old mammalian’ and ‘reptilian’ structures lower down.)

Imagine these hemispheres as two rugby-balls, deflated and squashed so as to fit within the confines of the skull. The 3mm-thick skin of the balls makes up the cortex – most of it being formed from 6 layers of neurons (‘neocortex’, as opposed to the 4-layer parts of the cortex). Across the whole ball, the neurons are divided into about a million hypercolumns, each composed of 50-120 minicolumns with about 80 neurons (across those 6 layers) per minicolumn (see the reconstruction, below, of five cortical columns in a rat’s brain, from NeuroInformatics 2012). In those different layers, neurons look different due to different ways they connect with other neurons. For example, ‘pyramidal neurons’ (large neurons with a pyramid-like body – ‘soma’ – which have input branches connecting at the apex and the base of the pyramid) are mainly to be found in layers 3 and 5.

22. Hierarchical Message Passing

The previous sections presented two of the 3 strands of the Free Energy theory: minimisation and Bayesian inference, resulting in a ‘predictor’ entity that adjusts its internal model of the external world as a result of interaction with it. This section presents the third: ‘hierarchical message passing’, in which many such predictors are connected together to form a hierarchy, as in the now-familiar figure, above, developed previously.

Friston’s figure, shown below (Picture credit: Frontiers in Human Neuroscience), is similar to mine, with the minor detail of being rotated by 90 degrees. But his provides rather more substance:

‘Forward’ connections (corresponding to ‘upward’ connections in my diagram and shown in red) from one level in the hierarchy to the next are ‘prediction errors’: mismatch signals which indicate the difference between what the internal model predicted and what the actual environment produced.

‘Backward’ connections (corresponding to ‘downward’ connections in my diagram and shown in black) from one level in the hierarchy to the next are predictions.


The circles and triangles shown represent collections of relatively few neurons and are superimposed on a picture of a cross-section of the grey matter of the cortex, such that the height of each circle or triangle corresponds to the layer within the cortex in which those neurons are to be found. For example, the red triangles represent superficial pyramidal neurons – pyramidal neurons that are close to the surface of the cortex (at the top) – whereas the black triangles represent deep pyramidal neurons – pyramidal neurons that are deep within the cortex, towards the white matter (at the bottom).

So a key point to be made here is that, as well as presenting a functional model of the cerebral matter (in most general terms), there is also the mapping of parts of the ‘predictor element’ to types of neurons in particular layers of the cortex.
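To make the upward flow of prediction errors and the downward flow of predictions more concrete, here is a minimal Python sketch. It is an illustrative toy and not Friston’s actual scheme: each level holds just a single number as its estimate, the names `run_hierarchy`, `rate` and `steps` are assumptions introduced for the example, and the update rule is plain gradient descent on the sum of squared prediction errors.

```python
def run_hierarchy(stimulus, n_levels=3, rate=0.3, steps=500):
    """Toy chain of predictors (an assumption, not Friston's model).

    Level 0 tries to explain the stimulus; level i tries to explain level i-1.
    Returns the final estimates and the last prediction errors computed.
    """
    mu = [0.0] * n_levels       # each level's current estimate
    errors = [0.0] * n_levels   # each level's current prediction error
    for _ in range(steps):
        # 'Forward' (upward) signals: the mismatch between what a level
        # predicted and what actually arrived from below.
        for i in range(n_levels):
            target = stimulus if i == 0 else mu[i - 1]
            errors[i] = target - mu[i]
        # Each level moves towards what it is trying to explain (its own
        # error) and towards the 'backward' (downward) prediction from the
        # level above (the error reported by that level).
        for i in range(n_levels):
            from_above = errors[i + 1] if i + 1 < n_levels else 0.0
            mu[i] += rate * (errors[i] - from_above)
    return mu, errors


estimates, errors = run_hierarchy(stimulus=1.0)
print("estimates:", estimates)
print("remaining prediction errors:", errors)
```

With a constant stimulus, the errors at every level shrink towards zero as the estimates settle: once the internal model accommodates the stimulus, there is nothing left to pass upwards.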

23. Minimizing Surprise Across All Levels

Each predictor will try to minimize surprise, with the restriction that all predictors are coupled with one another. So, for example, a change in stimulus may be initially surprising at a lower level, leading it to pass the surprise to upper levels to ‘consider’, but after a short while, its model may have been adjusted such that it now accommodates the stimulus. Activity at higher levels may therefore cease. In practice, activity can ‘bubble up’ at a particular level for a short while, only for it to disappear at any time. It is easy to see how this coupling of predictors could lead to (mathematically) chaotic behaviour, analogous to the ‘chaotic pendulum’. A single pendulum with a small perturbation has very predictable behaviour. Literally, it runs like clockwork. But hanging pendulums off other pendulums and giving them a big kick produces chaotic behaviour. In the video (below), notice how at some times (such as at 0:07) most of the activity is in the main central pendulum and at other times (such as at 0:13) it is mainly in the second pendulum. Now imagine that there are not 2 but thousands of pendulums.

24. Free Energy versus Alief


Tamar Gendler’s notion of  ‘Alief’ describes sub-conscious beliefs that are often at odds with our conscious ones. Hence it may be described in terms of conflict across hierarchy. In contrast, Friston’s ‘minimising surprise across all levels’ can be viewed as cooperation across hierarchy. At times, higher and lower levels may be predicting differently but the whole is determining the best solution for action.

25. Behavioural Examples

How does this theory correspond to observed behaviour of the organism? How does the minimisation of error across the many layers lead to observed behaviour of the whole organism?

In various papers, Friston provides a number of examples:

  1. Attention: If something ‘catches our eye’ in our peripheral vision, we minimize the prediction error by turning our head to direct full attention to it. This similarly applies to investigating a sudden noise. But if this noise gets repeated, we eventually get used to it and it is no longer surprising (a process called habituation).
  2. Cued movements: learning eye and motor coordination to follow a moving target with a finger.
  3. Similarly, simulation of handwriting: generation and recognition of motor trajectories.
  4. Reflex arcs and proprioception: prediction errors drive action, overcoming reflexive action.
  5. Birdsong: the generation and subsequent perceptual categorisation of chirping, reconstituting ‘hidden states’ in the generator, i.e. determining what was being said from the sounds made by the chirper’s syrinx (the vocal organ of birds).
  6. Saccadic eye movement. Saccades are fast movements of the eye. The eyes move around, locating interesting parts of the scene and building up a map corresponding to the scene. As with the birdsong example, the issue is how to get at the hidden states that underlie the physical realisation of the biological organ (in this case, the eye rather than the syrinx).
  7. With saccadic eye movements, the search of the visual field is not random. A simple example: when we look for our car keys, we don’t look around randomly. We have expectations (predictions) of where they may be and eliminate these possibilities one by one.
  8. Understanding schizophrenia: Normally, hearing is suppressed when talking. Schizophrenia can be seen as a failure to suppress this, leading to the wrong attribution of the agency of speech.

26. But Is It True?


Friston’s ‘Variational Free Energy’ theory has been presented as  the closest thing we currently have to an overarching grand theory of how the brain creates intelligence. An obvious question to ask about it is ‘but is it true?’ How would we verify or refute it?


In Gregory Huang’s ‘New Scientist’ article on Variational Free Energy, Tomaso Poggio (at MIT) says that Friston’s theory is not testable. But over the past few years, a number of huge initiatives have been started on large-scale simulations of the brain. If any of these successfully demonstrate that they map well to actual brain structures, these then become (ethically acceptable) playgrounds for invasively testing the theory. These projects include:

  1. The ‘Blue Brain Project’ (led by EPFL, Lausanne, and started in 2005) is aiming towards a whole-brain emulation but is presently at the level of modelling cortical columns.
  2. The ‘Human Brain Project’ is a €1bn EU ‘Future and Emerging Technologies’ initiative (2013-2023) that builds on the ‘Blue Brain’ work.
  3. The ‘Human Connectome Project’ (US NIH 2009-2014) aims to map the high-level connectivity between parts of the brain using diffusion MRI scanning information.
  4. The $100m ‘BRAIN (Brain Research through Advancing Innovative Neurotechnologies) Initiative’ (US NIH/DARPA/NSF 2013-2023) aims to map the whole brain down to neuron-level, starting with simple organisms and working up from that.
  5. You can’t get much simpler than the C. elegans roundworm. Its nervous system comprises just 302 neurons which have been mapped (a complete map of connectivity of neurons is called a ‘connectome’) but how we get from the connectome to actual worm behaviour is still poorly understood. The ‘OpenWorm’ project is an open-source project to remedy this.

27.   No Paradox: Clarifying Entropy [extra]


If entropy is a measure of information content, and intelligence is about minimizing entropy (‘… through action and perception’) then this implies that intelligence is about minimizing information.  “The less I know, the more intelligent I am!”  is absurd!


Entropy (of the information theory variety) is commonly thought of as a measure of information. For example, Wikipedia introduces (information theory) entropy as follows

Entropy is a measure of unpredictability or information content.

…implying that a higher entropy entails a higher information content. Many have fallen into this trap (e.g. James Gleick’s book ‘The Information’).

We need to be more precise about the terms ‘the entropy’ and ‘the information’. Shannon entropy originates from the context of transmitting information across a noisy medium, for which:

  • Entropy is measured in ‘bits per symbol’ and it represents uncertainty (how well the receiver is able to predict what’s coming next).
  • The information content is the decrease in uncertainty (entropy) at the receiver.

Let me try to explain in terms of 3 files stored in memory:

  • ‘data_10Mb.txt’: a 10MB plain text file
  • ‘’: a 1MB ‘zipped up’ (i.e. compressed) version of ‘data_10Mb.txt’
  • ‘data_1Mb.txt’: a 1MB plain text file with just the first 1MB of ‘data_10Mb.txt’


Of these:

  • the ‘data_1Mb.txt’ file contains less actual information than the other 2 files.
  • the zip file yields more actual information per memory byte.

The zip file has the highest entropy and this goes with having the highest concentration of information (information per memory byte). But is this contrary to our understanding of entropy as being a measure of dispersal? (Recall the compartments with Maxwell’s demon.)

Here’s the explanation: if you looked at the distribution of 1s and 0s in the data in a zipped file and a plain text file, the former would look random (it’s not, of course) and the latter would have regular patterns (like bit 7 of each byte always being zero, because of the way the data is encoded in ASCII). The former looks like the ‘high entropy’ figure and the latter more like the ‘low entropy’ figure. The high entropy zipped file yields more information per byte.
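The comparison between a plain text file and its zipped version can be tried out with a short Python sketch. This is only an illustration under stated assumptions: the text is a made-up stand-in built from randomly chosen words (not the files named above), `zlib` stands in for ‘zip’, and the helper `entropy_bits_per_byte` is a name introduced here. It estimates the entropy from byte frequencies alone, which is a crude measure but enough to show the effect.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Stand-in for a plain text file: 20,000 words drawn at random (fixed seed).
words = "the fox knows many things but the hedgehog knows one big thing".split()
rng = random.Random(0)
plain = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

# Stand-in for the zipped file.
packed = zlib.compress(plain, 9)

print("plain : %7d bytes, %.2f bits/byte" % (len(plain), entropy_bits_per_byte(plain)))
print("packed: %7d bytes, %.2f bits/byte" % (len(packed), entropy_bits_per_byte(packed)))
print("bit 7 always zero in plain text:", all(b < 128 for b in plain))
```

The compressed data is shorter but has a higher entropy per byte: the same information has been squeezed into fewer, less predictable bytes.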

(For more information, see ‘Information Is Not Entropy, Information Is Not Uncertainty!’ and ‘I’m Confused: How Could Information Equal Entropy?’)

28.   The ‘Dark Room Problem’


If animals ‘minimize surprise through action and perception’, wouldn’t this just lead them to find a dark cave where there is no stimulus and to stay there?


This is the so-called ‘Dark Room’ problem. To counter this argument:

  • You can avoid surprise in the short term by burying your head in the sand, but it is not conducive to long term survival.
  • Some creatures do inhabit ‘dark rooms’ but that is only because they are well-adapted to that environment – such as by having some sensory superiority over their predators/prey in that environment (in which case, it is not ‘dark’ for them).
  • A secure cave may be a safe place, but long term survival is likely to require us to go out to hunt prey to eat, eventually. In that case, actively exploring that hunting environment at relatively safe times is a better strategy for minimising surprise.
  • For creatures such as ourselves, the best strategy in a dark room to avoid surprise is to turn the light on!


Surprise is minimized across all hierarchical levels. Higher levels operate at slower timescales. Our lowest levels may incline us towards slothful safety but these are counterbalanced by the higher levels that take a longer term view.

29.   The Maximum Entropy Principle [extra]


The ‘minimizing surprise’ mechanism should lead to organisms just trying to confirm their beliefs rather than predicting the truth.

This argument is similar to the ‘Dark Room’ problem and its basic response is the same too: confirmation bias, strengthening pre-existing beliefs, is good for opinion formation in the short term but, if those opinions (predictions) are wrong, they are not conducive to long term survival.

Organisms could rely on Darwinian selection but this is not indicative of intelligent behaviour. Intelligent behaviour is one in which the organism does the selection itself. Another mechanism over and above those already described is required and Friston invokes Edwin Jaynes’s ‘Principle of Maximum Entropy’. Recall that in Bayesian inference, new information modifies the prior probability distribution to produce a new posterior probability distribution. The ‘Principle of Maximum Entropy’ states that, of all the prior probability distributions consistent with what is already known, the one with the highest entropy should be used. In other words, we should keep our minds as open as possible, considering as much as possible. The ‘Principle of Maximum Entropy’ (in the prior) is employed in the service of the ‘Free Energy’ principle in minimizing entropy (in the posterior).

In the previous example (‘What is the capital of Australia?’ in ‘Entropy, Intelligence and Life’), only a few actual Australian towns were considered. The Principle of Maximum Entropy acts to include as many answers into consideration as possible: any Australian town, any town, anything – including the bizarre such as ‘banana’ and ‘42’! As long as there are neural pathways that exist, they can be taken into consideration.

So, it could be claimed that intelligence is as much about maximizing entropy as minimizing it! In reality, it seems to simply come down to maximizing the information gain – the difference between the prior and posterior, which can be achieved by:

  • Maximizing the entropy in the prior, and
  • Minimizing the entropy in the posterior.

Intelligence is thus about gaining the maximum information that is possible out of a situation – which sounds more reasonable. And it is still intimately linked with entropy.
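The idea of information gain as the drop in entropy from prior to posterior can be sketched in a few lines of Python. This is a hedged illustration only: the list of candidate towns and the likelihood values are made up for the example, the helper names `entropy` and `bayes_update` are introduced here, and measuring the gain as a simple difference of entropies is one possible reading of ‘the difference between the prior and posterior’.

```python
import math


def entropy(dist):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def bayes_update(prior, likelihood):
    """Posterior is proportional to prior times likelihood, then normalised."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: v / total for h, v in unnormalised.items()}


# Assumed candidates for 'What is the capital of Australia?'
candidates = ["Sydney", "Melbourne", "Canberra", "Perth"]

# Maximum-entropy prior: with nothing else known, every candidate is equally likely.
prior = {c: 1.0 / len(candidates) for c in candidates}

# Made-up likelihoods of some piece of evidence under each hypothesis.
likelihood = {"Sydney": 0.05, "Melbourne": 0.05, "Canberra": 0.85, "Perth": 0.05}

posterior = bayes_update(prior, likelihood)
gain = entropy(prior) - entropy(posterior)

print("prior entropy    : %.3f bits" % entropy(prior))
print("posterior entropy: %.3f bits" % entropy(posterior))
print("information gain : %.3f bits" % gain)
```

Starting from the flattest possible prior leaves the most room for the evidence to reduce uncertainty, which is the sense in which a high-entropy prior and a low-entropy posterior work together.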


This ‘Principle of Maximum Entropy’ sounds like a convenient sticky plaster solution. I think this is just down to how the whole Free Energy story is presented – we have been presented with a ‘tall’ story…

30. A Tall Story [extra]


The Free Energy theory presents the brain as a single thread of layers whereas, in reality, it is a jumbled mess of interconnections.


That is entirely correct! The ‘hierarchical message passing’ part of the Free Energy story tells of small collections of neurons on top of other collections on top of other collections, and so on. As such, it is a ‘tall’ story, presenting a picture of the brain’s connectome that is very tall but very thin. It is, literally, a very one-dimensional story that entirely neglects the ‘width’ of connectivity. For example, Felleman and Van Essen’s well-known ‘circuit diagram’ of the visual cortex of the macaque monkey (see below; credit: York University) shows many areas working in parallel with one another, independently. In particular, it is well known that visual signals are not just passed up from the retina (‘RGC’ on the diagram), via the Lateral Geniculate Nucleus (‘LGN’), to the ‘V1’ cortical area, but that they go to other areas as well.


This ‘hierarchical message passing’ story is a necessary simplification of reality – its purpose is to show the vertical connection of behaviour from small collections of neurons right up to the high-level, externally-observable behaviour of the animal’s brain.

But the principle can be extended to 2 dimensions, with there being a ‘fanout’ to more than one higher or lower level. Some of Friston’s more recent work shows this. The figure below (from ‘Dopamine Affordance and Active Inference’) is for a simulation of cued reaching.


31. Confirmation Bias: A Trait of Hedgehogs [extra]


The ‘Principle of Maximum Entropy’ should ensure there is no confirmation bias – but in reality, people are prone to confirmation biases. For example, they perform poorly in the Wason Selection Task. Maybe we should not overestimate the role that the ‘Principle of Maximum Entropy’ plays, after all.

Philip Tetlock’s book ‘Expert Political Judgment: How Good Is It? How Can We Know?’ shows that so-called ‘experts’ were generally only slightly more accurate than chance at making accurate predictions within their expert field, and worse than relatively simple statistical models.  Against advanced statistical inferencing machines (fully employing the ‘Principle of Maximum Entropy’), experts should be expected to be significantly poorer performers.

Among the many ways he looked at trying to demarcate his group of experts (into ‘bad’ and ‘very bad’) in order to see what could be done to improve their efforts, Tetlock split them into ‘hedgehogs’ and ‘foxes’, following Isaiah Berlin’s metaphor of ‘The Hedgehog and the Fox’. This is based on a text fragment attributed to the ancient Greek poet Archilochus:

the fox knows many things, but the hedgehog knows one big thing.

Tetlock found that the ‘fox’ personality type was a better predictor than the ‘hedgehog’ type. (Chapter 4 in his book is called ‘Honoring Reputational Bets: Foxes Are Better Bayesians than Hedgehogs’ – note the reference to Bayesian inference.)

The connection I’m making here is between the maximum entropy principle and the ‘fox’ personality type. Foxes performed better because they considered more options. Their priors have a higher entropy than those of hedgehogs, who rejected many possible potential solutions because they were too tied to ideologies.

32. What Happens at the Top?


The Free Energy theory presents a hierarchy. What happens at the top? The highest level has nowhere to feed prediction errors to or receive predictions from. Is that where the Cartesian theatre is?!


It has already been noted that the 1-dimensional ‘hierarchical message passing’ part of Free Energy theory is a gross simplification and that 2-dimensional models can be built. These may have one or more predictors at the top. It is also possible (most likely, even) that the 2-dimensional network of connectivity simply cannot be transformed into a hierarchy in which any node is higher than any other, such that there is no top. And there is no ‘Cartesian Theatre’ in the sense of a centre which receives most information and acts upon it (also: ignoring any connotations of consciousness here). Decisions are made at all levels in all predictors.


This posting has made many conjectures in tying some high-level human behaviour to the Free Energy theory, some of which are almost certainly  ‘Greedy Reductionisms’ (Daniel Dennett: “in their eagerness for a bargain, in their zeal to explain too much too fast, scientists and philosophers … underestimate the complexities, trying to skip whole layers or levels of theory in their rush to fasten everything securely and neatly to the foundation.”) but are provided for their (hopefully, thought-provoking) entertainment value.

33. Agent versus Environment: An Analogy

Here, I look at the demarcation between an agent and its environment, with reference to Karl Friston’s ‘Variational Free Energy’ theory.

Agents are frequently considered as objects that can move around an environment but movement is not necessarily required. Philosophically, agency is concerned with action. Here, I am considering agents more as holders of intelligence (acting on the environment) than as mobile things. It is not essential for an intelligent agent to move around – particularly if it can extend itself to control subordinates out in its environment, such as robots/drones (as discussed previously).

The ‘Variational Free Energy’ theory provides a bridge between:

  • the relatively simple behaviour of neurons, and
  • the overwhelming complexity of organisms (controlled by their overwhelmingly complex brains).

I want to provide a visual analogy between the rather abstract concept of intelligence and the more accessible notion of height in a landscape:

  • In a landscape, there is a tendency for things to move downwards due to gravity.
  • In thermodynamics, there is a tendency for things to move towards increased disorder.
  • Intelligence has been presented as counteracting entropy.

In the analogy, higher intelligence is equated with a higher position on a hill (a higher altitude).

The bridging between the (conjectured) low-level Bayesian inferencing of neurons and the high-level intelligence of organisms is achieved by the hierarchy of inferers applying the basic method over and over again. A canal lock can’t raise a boat very high but put a lot of them together and the result can be impressive (‘a journey of a thousand miles begins with a single step’ –  Lao-Tzu).


Caen Hill locks, near Devizes: The main flight of 16 locks raises boats approx 40 metres in a distance of less than 1 kilometre.

The result is

  • a great accumulation of knowledge, and
  • a great accumulator of knowledge.

It is like the accumulator is hoarding knowledge up into a large pile, a bit like the earth piled up at the man-made mound of Silbury Hill (below).


Silbury Hill, near Avebury: A 40-metre high man-made hill built approx 4750 years ago (a bit before the first Egyptian pyramids).

But note that within the hierarchy the prediction is always acting downwards. At any point on the surface of the ‘information mound’:

  • There is a slope:  an ‘information gradient’.
  • What is up-hill is able to make predictions about what is below it.
  • What is down-hill is not able to make predictions about what is above it.


A silly ‘philosophical’ question: where does a hill begin? Where is the demarcation between ‘hill’ and ‘land around the hill’? With Silbury Hill, there is a fairly clear base of the hill, on flat land. In a natural landscape, it is not so easy to make this demarcation.

The point: We are not able to demarcate between an agent and its environment. There is a continuum:

  • ‘higher’ brain functions (such as in the Prefrontal Cortex),
  • ‘lower’ brain functions (such as in the Striate Cortex V1),
  • the body,
  • bodily extensions (a mobile phone),
  • the self-made habitat (house, car),
  • the locality, with objects constructed/used by the agent, through to
  • the wider environment.

At any point in the hierarchy, upwards is ‘more agent and less environment’ and downwards is ‘more environment and less agent’. That is all that can be said.

34. Free Energy and Free Will

Here, I tie together ideas from this talk about prediction with those concerning free will from the previous talk ‘Free Will/Free Won’t’.

In a dualist worldview, the mind exists apart from matter and can be the seat of free will; it has no physical constraint. In a physicalist worldview, our behaviour is ultimately determined by the physics of the material world, via biology (I am saying nothing about consciousness here). Karl Friston’s ‘variational free energy’ theory provides a plausible account of how masses of neurons can behave in order to create the higher-level behaviour of the brain and so close the gap between the lower sciences (the traditionally ‘physical’ sciences: physics/chemistry/biology) and the higher sciences (the traditionally ‘non-physical’ sciences: psychology/ethology/sociology). In doing so, it would appear to leave no room for free will.

In the ‘Free Will/Free Won’t’ talk, the dualist concept of free will was replaced by the psychological concept of ‘conscious will’ (after Dan Wegner) and the physicalist concept of ‘freedom’. This notion of freedom introduced the idea that freedom is related to unpredictability and Shannon’s information theory, using as an example the humble fly’s erratic flight behaviour and our general inability to swat it as a result. In contrast, ‘Intelligence and the Brain’ is all about prediction. But predictability (from the standpoint of the agent) and unpredictability (from the standpoint outside of the agent – that is, of the environment) are really just two sides of the same coin. Previously, an analogy was made between predictive power and a hill. The higher up the hill, the greater the predictive power. But if there’s a downhill slope in a particular direction, there must be an uphill slope in the opposite direction!

  • Downhill: An agent predicts its environment
  • Uphill: An agent is unpredictable to the environment (it is difficult to predict uphill).

This gradient is what gives us our freedom.


A Neolithic mound: a physicalist analogy of our freedom.

The environment is difficult to predict because:

  1. It is complex. This is particularly true for organisms within it. It (and they) display chaotic behaviour.
  2. We cannot see inside those objects (agents) to see what is going on inside them.

The behaviour of those agents is very difficult to predict because they themselves embed their own predictive models of their environment, and hierarchically (at many levels) too.

So there are two problems for prediction (and hence our freedom) here:

  1. chaos (in the mathematical sense), and
  2. hidden states.

1 – Chaos:

A chaotic system is one in which the behaviour is unpredictable even though the system is deterministic. Edward Lorenz defined it as:

“When the present determines the future, but the approximate present does not approximately determine the future.”


The classic example of a chaotic system is the double pendulum (see right; credit: Wikipedia). A single pendulum disturbed only slightly is so predictable it runs, literally, like clockwork. But a double pendulum that has been given a good swing is a different animal. At times, most of the energy is in the outer pendulum and at other times, the outer pendulum just flops around, following the inner one. We cannot predict when transitions between these two types of motion will occur.

This is just with two arms – two degrees of freedom. Imagine how many there may be in the brain! The now familiar diagram of a hierarchy of predictors is a bit like having many pendulums hanging off one another.
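Lorenz’s definition can be seen at work in a few lines of Python. Note that this sketch swaps in a different and much simpler chaotic system, the logistic map, in place of the double pendulum, purely because it needs no physics to simulate; the function name `logistic_trajectory` and the starting values are assumptions made for the illustration. Two runs start from ‘approximately the same present’ (differing by one part in ten billion) and the rule is fully deterministic.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), returning all values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-10, 60)   # an 'approximate present'

# Gap between the two runs at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print("step %2d: gap = %.3e" % (step, gaps[step]))
```

The gap stays tiny for the first few steps and then grows until the two runs bear no resemblance to each other: the present determines the future, but the approximate present does not approximately determine it.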


A hierarchical model of the brain.

2 – Hidden States:

We cannot see inside the minds of others:

  • We cannot read the minds of others (such as in the film ‘What Women Want’).
  • We cannot monitor others by using instruments. The best instruments we currently have are large machines such as fMRI scanners. These, along with current knowledge, provide us with effectively no predictive information about the others at all. And they are not exactly inconspicuous!

(The former point is a dualist one; the latter is a physicalist one.)

Seeing inside another’s mind, hearing their thoughts or reading measurements of their brain scans – it doesn’t matter what sense might be used or whether it is internal or external. Our predictive power and our freedom would be considerably enhanced if we could access the states of others’ minds. And this is without a reciprocal arrangement; imagine the loss of freedom if others could read our minds.

Without any magical mechanism to see into the minds of others, we need to try to recreate what is going on in the minds of other complex organisms by using our own brains.

Karl Friston provides a Free Energy example of recreating what is going on in the brains of others. That example is: how does one bird work out what another bird is communicating? This might seem trivial and very different from interpreting intentions of humans, but it is just a matter of scale.

A bird creates sounds in its voicebox (syrinx) by chirping (which is itself a chaotic mechanism). A listening bird must decode that sound, to get beyond the physical details of the chirping sound, back to the source of that sound – the ‘decision’ to make one particular type of sound, rather than another. The listening bird needs to be able to discriminate between the different chirps which signify different things – to get beyond the body and lower brain levels and to the hidden states within the singing bird.

Predicting inner (hidden) states of others is difficult but not impossible. In terms of the analogy of the hill (between freedom and the height of a hill), we can see the plateaus half-way up a neighbouring mountain from the vantage point of the top of our mountain. The taller our own mountain, the easier it is to see the mountains of others.


Friston: bird chirping and perceptual categorization


I have discussed the ability to predict a fly’s movement and to decode a bird’s song. But for us humans, it is generally far, far more difficult for an observer to predict our behaviour. In my analogy of prediction with height, these examples are like Silbury Hill to a human’s Mount Everest. We have privileged access to our own thoughts that others do not and we have some ability to get inside the minds of others. It is the combination of these two things that gives us our freedom.

That freedom does not need any random source. The chaotic complexity of our brains and small differences in our prior beliefs are enough to provide us with all the freedom we have.

Our freedom is large but finite. I have made the analogy with the hill and every hill is surmountable. In comparison, the dualist notion of mind separate from matter is analogous to a fantastically tall tower, impenetrable to anything the physical world might throw at it. But this is fantasy.


A fantastically-tall tower: A dualist analogy of our freedom.

35. Free Energy in Context: A Comparison With Evolution

Starting to wrap up, I will now try to put Karl Friston’s ‘variational Free Energy’ theory into context.

At this point, I am less interested in whether the Free Energy theory is right than whether it could be (approximately) right. Let me explain the difference. And to do that, I want to make a comparison with evolution. I am presenting the Free Energy theory as a theory (albeit an unproven theory) that explains intelligence in terms of physical mechanisms. This is in a similar way to how Darwinian evolution explains life in terms of physical mechanisms.

For an explanation of life in physical terms:

  • The general theory was provided by Darwin in 1859.
  • The underlying theoretical fundamentals were provided by Schrödinger in 1944.
  • The actual physical mechanism was provided by Watson & Crick in 1953.

It took time – the best part of 100 years – to get from an initial correct theory to a satisfactory physical explanation.

Friston’s theory provides all 3 corresponding aspects for a theory of intelligence:

  • the general theory is that of  ‘minimisation of surprise through action and perception’.
  • the fundamentals are of Bayesian minimisation of entropy.
  • the actual physical mechanism is suggested to be within neural cortical columns (with much more work needed on this).

But if I were to suggest a historical date for comparison with evolution, I would suggest that Free Energy is currently where evolution was in around the year 1800 – around the time of Lamarckian evolution. Recall: Lamarck provided a theory of evolution whereby offspring inherited characteristics acquired by their parents during their parents’ lifespan. For example, a giraffe stretching its neck to reach the highest branches of trees will have offspring with longer necks than if it had not stretched. The point here: Lamarckism may be a wrong theory but it provided an example of what a correct theory might look like, 150 years before all the science was put in place. On a scale from ‘creationism’ to ‘DNA’, Lamarckism is right next to ‘DNA’ – it is right on the big issues and wrong on the details. Now, it may be 150 years before we have a comparable scientific theory for intelligence. None of us alive today will be around then. Here, I want to provide a glimpse at what that (correct) theory of intelligence might look like.

At this point, the Free Energy theory provides some understanding of how we can relate the behaviour of a whole organism (a whole brain organ), comprising many billions of neurons, to the behaviour of relatively few neurons. In providing this as an example, we are suggesting what is not there:

  • There’s no ‘magic’ deviation from the laws of physics – even as currently understood.
  • There’s no reliance on complications such as indeterminacy, quantum phenomena and randomness that many want to include.

Whether the theory has any merit will be tested over the coming years – decades, even. How far up the scale will the theory take us? Maybe it will have some explanatory value all the way to the ‘top layers’ and, in doing so, allow us to explain the behaviour of whole organisms. But I guess it will need one or more extra ideas to get us there. There’s a big gap to bridge: there are many orders of magnitude from a single neuron in a hypercolumn to the 10-billion-ish neurons in a cerebral hemisphere.

36. Intelligence and the Brain: A Quick Summary

To summarize this discussion of intelligence in a small number of bullet points:

  • Intelligence has been presented here with reference to Karl Friston’s ‘Free Energy’ tentative theory of how the brain works.
  • It potentially bridges the explanatory gap between the physical behaviour of neurons and the psychological behaviour of the brain.
  • Friston’s Free Energy theory is about ‘the minimization of surprise through action and perception’ and I have tried to explain what this means.
  • Putting this more colloquially at a behavioural level, intelligence is about having predictions, adjusting them in the light of experience and making an effort to improve them.
  • An aspect of that last part is building better mechanisms to improve our intelligence beyond the limits of our own brains, building more and more advanced models in the outside world.
  • Not the least of our extending mechanisms is the social dimension, with the cooperation of individuals which leads ultimately to scientific knowledge.
  • The social dimension can extend intelligence, whether the individual members of the society are simple creatures, sophisticated social animals or advanced beings each using advanced models in the outside world.
  • A two-dimensional map of intelligence is proposed, extending Daniel Dennett’s concept of ‘the Tower of Generate and Test’.
  • The abstract, behavioural model of intelligence is mapped to the physical structure of what neurons are doing within the cortical columns of the brain.
  • The brain’s neocortex comprises many millions of these cortical columns.
  • At the simplest level of understanding, these columns create a hierarchy, with each column communicating with those above and below it in the hierarchy.
  • In this way, ‘surprise’ is minimized at all levels – from low-level (sub-conscious) to our highest levels of understanding.
  • Each cortical column provides just a small increment in the organism’s intelligence. But, as Lao-tzu said, a journey of a thousand miles begins with a single step. The overall effect is a sophisticated ‘prediction machine’, interacting with its environment.
  • Each layer in the hierarchy is trying to ‘pull’ towards a most-likely ‘opinion’ – acting against entropy.
  • Intelligence can be understood in terms of Shannon entropy (in information theory).
  • Intelligence and life are related through entropy: intelligence is about counteracting Shannon entropy in the same way that life is about counteracting thermodynamic entropy.
  • Life embodies intelligence but intelligence can exist apart from life.
  • As such, artificial intelligence is no less real an intelligence than natural intelligence (although it is currently vastly inferior).
  • Analogies are made between intelligence and the slope of a hill, and between intelligence and a chaotic pendulum. These help to visualize how intelligence builds up over the many layers in the hierarchy to provide the agent with the freedom that helps protect it from its environment, and from others.
  • Prediction and unpredictability are two sides of the same coin.
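To make the bullets on prediction, surprise and Shannon entropy a little more concrete, here is a minimal sketch in Python. It is only a toy, and it is my own illustration rather than Friston's formulation: it does not compute variational free energy at all. It assumes a made-up agent holding a belief over four hypotheses, and the likelihood values are invented for the example. The agent predicts, observes and adjusts its belief by a plain Bayesian update. As it does so, both the Shannon entropy of its belief and its surprise at the observation fall.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def surprise(p):
    """Surprise (surprisal), in bits, of an outcome that was given probability p."""
    return -math.log2(p)


def bayes_update(prior, likelihood):
    """Adjust a belief over hypotheses in the light of one observation."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Assumption: four hypotheses about the world, all equally likely at first.
belief = [0.25, 0.25, 0.25, 0.25]

# Assumption: invented probabilities of the (repeated) observation
# under each of the four hypotheses.
likelihood = [0.7, 0.1, 0.1, 0.1]

history = []
for step in range(4):
    # How strongly did the agent predict what it is about to observe?
    predicted = sum(b * l for b, l in zip(belief, likelihood))
    history.append((shannon_entropy(belief), surprise(predicted)))
    print(f"step {step}: entropy={history[-1][0]:.3f} bits, "
          f"surprise={history[-1][1]:.3f} bits")
    # Adjust the prediction in the light of experience.
    belief = bayes_update(belief, likelihood)
```

The surprise can never fall below that of the best hypothesis on its own (here, the one that gives the observation a probability of 0.7), so the agent's predictions get better but never perfect. This sketch covers perception only: the agent changes its beliefs but does not act on the world to make its predictions come true.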

As an even shorter conclusion:

  • Intelligence has been described as a physical process arising within the brain, which has an interesting relationship with life, information theory, entropy, prediction and freedom.




