This posting forms part of the talk ‘Intelligence and the Brain’.
Here, I look at some criticisms of Karl Friston’s ‘Variational Free Energy’ theory and add some observations.
26. But Is It True?
Friston’s ‘Variational Free Energy’ theory has been presented as the closest thing we currently have to an overarching grand theory of how the brain creates intelligence. An obvious question to ask about it is ‘but is it true?’ How would we verify or refute it?
In Gregory Huang’s ‘New Scientist’ article on Variational Free Energy, Tomaso Poggio (at MIT) says that Friston’s theory is not testable. But over the past few years, a number of huge initiatives have been started on large-scale simulations of the brain. If any of these successfully demonstrate that they map well to actual brain structures, they then become (ethically acceptable) playgrounds for invasively testing the theory. These projects include:
- The ‘Blue Brain Project’ (led by EPFL, Lausanne, and started in 2005) is aiming towards a whole-brain emulation but is presently at the level of modelling cortical columns.
- The ‘Human Brain Project’ is a €1bn EU ‘Future and Emerging Technologies’ initiative (2013-2023) that builds on the ‘Blue Brain’ work.
- The ‘Human Connectome Project’ (US NIH, 2009-2014) aims to map the high-level connectivity between parts of the brain using diffusion MRI scanning information.
- The $100m ‘BRAIN (Brain Research through Advancing Innovative Neurotechnologies) Initiative’ (US NIH/DARPA/NSF, started in 2013) aims to map the whole brain down to neuron level, starting with simple organisms and working up from there.
- You can’t get much simpler than the C. elegans roundworm. Its nervous system comprises just 302 neurons, which have been mapped (a complete map of the connectivity of neurons is called a ‘connectome’), but how we get from the connectome to actual worm behaviour is still poorly understood. The ‘OpenWorm’ project is an Open Source project to remedy this.
27. No Paradox: Clarifying Entropy
If entropy is a measure of information content, and intelligence is about minimizing entropy (‘… through action and perception’), then this implies that intelligence is about minimizing information. “The less I know, the more intelligent I am!” is absurd!
Entropy (of the information theory variety) is commonly thought of as a measure of information. For example, Wikipedia introduces (information theory) entropy as follows…
Entropy is a measure of unpredictability or information content.
…implying that a higher entropy entails a higher information content. Many have fallen into this trap (e.g. James Gleick’s book ‘The Information’).
We need to be more precise about the terms ‘the entropy’ and ‘the information’. Shannon entropy originates from the context of transmitting information across a noisy medium, for which:
- Entropy is measured in ‘bits per symbol’ and represents uncertainty (how well the receiver can predict what’s coming next).
- The information content is the decrease in uncertainty (entropy) at the receiver.
Let me try to explain in terms of 3 files stored in memory:
- ‘data_10Mb.txt’: a 10MB plain text file
- ‘data_10Mb.zip’: a 1MB ‘zipped up’ (i.e. compressed) version of ‘data_10Mb.txt’
- ‘data_1Mb.txt’: a 1MB plain text file containing just the first 1MB of ‘data_10Mb.txt’
Observe that:
- the ‘data_1Mb.txt’ file contains less actual information than the other two files, and
- the zip file yields the most actual information per memory byte.
The zip file has the highest entropy and this goes with having the highest concentration of information (information per memory byte). But is this contrary to our understanding of entropy as being a measure of dispersal? (Recall the gas compartments in the Maxwell’s demon thought experiment.)
Here’s the explanation: if you looked at the distribution of 1s and 0s in the data in a zipped file and a plain text file, the former would look random (it’s not, of course) and the latter would have regular patterns (like bit 7 of each byte always being zero, because of the way the data is encoded in ASCII). The former looks like the ‘high entropy’ figure and the latter more like the ‘low entropy’ figure. The high entropy zipped file yields more information per byte.
(For more information, see ‘Information Is Not Entropy, Information Is Not Uncertainty!’ and ‘I’m Confused: How Could Information Equal Entropy?’)
28. The ‘Dark Room Problem’
If animals ‘minimize surprise through action and perception’, wouldn’t this just lead them to find a dark cave where there is no stimulus, and to stay there?
This is the so-called ‘Dark Room’ problem. To counter this argument:
- You can avoid surprise in the short term by burying your head in the sand, but it is not conducive to long term survival.
- Some creatures do inhabit ‘dark rooms’ but that is only because they are well-adapted to that environment – such as by having some sensory superiority over their predators/prey in that environment (in which case, it is not ‘dark’ for them).
- A secure cave may be a safe place, but long term survival is likely to require us to go out to hunt prey eventually. In which case, actively exploring that hunting environment at relatively safe times is a better strategy for minimising surprise.
- For creatures such as ourselves, the best strategy in a dark room to avoid surprise is to turn the light on!
Surprise is minimized across all hierarchical levels. Higher levels operate at slower timescales. Our lowest levels may incline us towards slothful safety but these are counterbalanced by the higher levels that take a longer term view.
29. The Maximum Entropy Principle
The ‘minimizing surprise’ mechanism should lead to organisms just trying to confirm their beliefs rather than predicting the truth.
This argument is similar to the ‘Dark Room’ problem and its basic response is the same too: confirmation bias, strengthening pre-existing beliefs, is good for opinion formation in the short term but, if those opinions (predictions) are wrong, they are not conducive to long term survival.
Organisms could rely on Darwinian selection, but this is not indicative of intelligent behaviour. Intelligent behaviour is behaviour in which the organism does the selecting itself. Another mechanism, over and above those already described, is required, and Friston invokes Edwin Jaynes’s ‘Principle of Maximum Entropy’. Recall that in Bayesian inference, new information modifies the prior probability distribution to produce a new posterior probability distribution. The ‘Principle of Maximum Entropy’ states that the prior probability distribution with the highest entropy should be used. In other words, we should keep our minds as open as possible, considering as many possibilities as possible. The ‘Principle of Maximum Entropy’ (in the prior) is employed in the service of the ‘Free Energy’ principle in minimizing entropy (in the posterior).
In the previous example (‘What is the capital of Australia?’ in ‘Entropy, Intelligence and Life’), only a few actual Australian towns were considered. The Principle of Maximum Entropy acts to include as many answers into consideration as possible: any Australian town, any town, anything – including the bizarre such as ‘banana’ and ‘42’! As long as there are neural pathways that exist, they can be taken into consideration.
So, it could be claimed that intelligence is as much about maximizing entropy as minimizing it! In reality, it seems to come down simply to maximizing the information gain – the difference between the prior and posterior – which can be achieved by:
- Maximizing the entropy in the prior, and
- Minimizing the entropy in the posterior.
Intelligence is thus about gaining the maximum information that is possible out of a situation – which sounds more reasonable. And it is still intimately linked with entropy.
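Returning to the ‘capital of Australia’ example, here is a minimal Python sketch of this prior/posterior view (the candidate answers and probabilities are invented for illustration, and information gain is taken, as in the simple reading above, to be the entropy removed in going from prior to posterior; a fuller treatment would use the KL divergence):

```python
import math

def entropy(dist):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Maximum-entropy prior: every candidate answer, however bizarre,
# is kept in play with equal probability.
candidates = ["Canberra", "Sydney", "Melbourne", "Perth", "banana", "42"]
prior = {c: 1 / len(candidates) for c in candidates}

# After inference, the distribution collapses towards one answer.
posterior = {"Canberra": 0.97, "Sydney": 0.02, "Melbourne": 0.01}

# Information gain: uncertainty removed between prior and posterior.
gain = entropy(prior) - entropy(posterior)
print(round(gain, 2))
```

Broadening the prior (more candidates, spread evenly) raises its entropy; sharpening the posterior lowers its entropy; both moves increase the gain.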
This ‘Principle of Maximum Entropy’ sounds like a convenient sticky plaster solution. I think this is just down to how the whole Free Energy story is told – we have been presented with a ‘tall’ story…
30. A Tall Story
The Free Energy theory presents the brain as a single thread of layers whereas, in reality, it is a jumbled mess of interconnections.
That is entirely correct! The ‘hierarchical message passing’ part of the Free Energy story tells of small collections of neurons on top of other collections, on top of other collections, and so on. As such, it is a ‘tall’ story, presenting a picture of the brain’s connectome that is very tall but very thin. It is, literally, a very one-dimensional story that entirely neglects the ‘width’ of connectivity. For example, Felleman and Van Essen’s well-known ‘circuit diagram’ of the visual cortex of the macaque monkey (see below; credit: York University) shows many areas working in parallel with one another, independently. In particular, it is well known that visual signals are not just passed up from the retina (‘RGC’ on the diagram), via the Lateral Geniculate Nucleus (‘LGN’), to the ‘V1’ cortical area, but that they go to other areas as well.
This ‘hierarchical message passing’ story is a necessary simplification of reality – its purpose is to show the vertical connection of behaviour from small collections of neurons right up to the high-level, externally-observable behaviour of the animal’s brain.
But the principle can be extended to 2 dimensions, with there being a ‘fan-out’ to more than one higher or lower level. Some of Friston’s more recent work shows this. The figure below (from ‘Dopamine, Affordance and Active Inference’) is for a simulation of cued reaching.
31. Confirmation Bias: a Trait of Hedgehogs
The ‘Principle of Maximum Entropy’ should ensure there is no confirmation bias – but in reality, people are prone to confirmation biases. For example, they perform poorly in the Wason selection task. Maybe we should not overestimate the role that the ‘Principle of Maximum Entropy’ plays, after all.
Philip Tetlock’s book ‘Expert Political Judgment: How Good Is It? How Can We Know?’ shows that so-called ‘experts’ were generally only slightly better than chance at making accurate predictions within their field of expertise, and worse than relatively simple statistical models. Against advanced statistical inference machines (fully employing the ‘Principle of Maximum Entropy’), experts should be expected to perform significantly worse still.
Among the many ways he looked at trying to demarcate his group of experts (presumably into ‘poor’ and ‘very poor’!) in order to see what could be done to improve their efforts, Tetlock split them into ‘hedgehogs’ and ‘foxes’, following Isaiah Berlin’s metaphor of ‘The Hedgehog and the Fox’. This is based on a text fragment attributed to the ancient Greek poet Archilochus:
the fox knows many things, but the hedgehog knows one big thing.
Tetlock found that the ‘fox’ personality type was a better predictor than the ‘hedgehog’ type. (Chapter 4 in his book is called ‘Honoring Reputational Bets: Foxes Are Better Bayesians than Hedgehogs’ – note the reference to Bayesian inference.)
The connection I’m making here is between the maximum entropy principle and the ‘fox’ personality type. Foxes performed better because they considered more options. Their priors have a higher entropy than those of hedgehogs, who rejected many possible potential solutions because they were too tied to ideologies.
32. What Happens at the Top?
The Free Energy theory presents a hierarchy. What happens at the top? The highest level has nowhere to feed prediction errors to or receive predictions from. Is that where the Cartesian theatre is?!
It has already been noted that the 1-dimensional ‘hierarchical message passing’ part of Free Energy theory is a gross simplification and that 2-dimensional models can be built. These may have one or more predictors at the top. It is also possible (most likely, even) that the 2-dimensional connectivity network simply cannot be transformed into a hierarchy in which any node is higher than any other, such that there is no top. And there is no ‘Cartesian Theatre’ in the sense of a centre which receives most information and acts upon it (ignoring any connotations of consciousness here). Decisions are made at all levels, in all predictors.
In an article published in Nature Reviews Neuroscience, Professor Lisa Feldman Barrett (Northeastern) and W. Kyle Simmons (Laureate Institute for Brain Research, OK) contend that limbic tissue, which also helps to create emotions, is at the top of the brain’s prediction hierarchy.
“The unique contribution of our paper is to show that limbic tissue, because of its structure and the way the neurons are organized, is predicting. It is directing the predictions to everywhere else in the cortex, and that makes it very powerful.”
For example, when a person is instructed to imagine a red apple in his or her mind’s eye, Barrett explained that limbic parts of the brain send predictions to visual neurons and cause them to fire in different patterns so the person can “see” a red apple.
… limbic regions of the brain send but do not receive predictions.
The abstract is available online; the paper itself is behind a paywall, but the supplementary information is freely available.
This posting has made many conjectures in tying some high-level human behaviour to the Free Energy theory, some of which are almost certainly ‘Greedy Reductionisms’ (Daniel Dennett: “in their eagerness for a bargain, in their zeal to explain too much too fast, scientists and philosophers … underestimate the complexities, trying to skip whole layers or levels of theory in their rush to fasten everything securely and neatly to the foundation.”) but are provided for their (hopefully, thought-provoking) entertainment value.