This posting forms part of the talk ‘Intelligence and the Brain’. It follows on from:
- A Unified Theory of Intelligence in which the brain is viewed as a hierarchy of predictors (see diagram below).
- Entropy, Intelligence and Life which explains part of Karl Friston’s ‘Variational Free Energy’ theory using a crude example of how a single ‘predictor’ behaves in order to ‘minimize surprise through action and perception’ – using Bayesian inference to minimise entropy.
Here, I piece together many of these predictor elements into a hierarchical chain, linking individual neurons up to the overall behaviour of the brain.
21. Aside: An overview of the Cerebrum
The picture, right, shows the two large hemispheres of the cerebrum at the top of the brain (picture credit: Picasa). (Recall the previously introduced crude ‘Triune brain’ notion of this ‘new mammalian’ structure sat above the older ‘old mammalian’ and ‘reptilian’ structures lower down.) Imagine these hemispheres as two rugby-balls, deflated and squashed so as to fit within the confines of the skull. The 3mm-thick skin of the balls makes up the cortex – most of it being formed from 6 layers of neurons (‘neocortex’, as opposed to the 4-layer parts of the cortex). Across the whole ball, the neurons are divided into about a million hypercolumns, each composed of 50-120 minicolumns with about 80 neurons (across those 6 layers) per minicolumn (see the reconstruction, below, of five cortical columns in a rat’s brain, from NeuroInformatics 2012). Neurons in the different layers look different because of the different ways they connect with other neurons. For example, ‘pyramidal neurons’ (large neurons with a pyramid-like body – ‘soma’ – which have input branches connecting at the apex and the base of the pyramid) are mainly to be found in layers 3 and 5.
22. Hierarchical message passing
The previous post presented two of the three strands of the Free Energy theory: minimisation and Bayesian inference, resulting in a ‘predictor’ entity that adjusts its internal model of the external world as a result of interaction with it. This post presents the third: ‘hierarchical message passing’, in which many such predictors are connected together to form a hierarchy, as in the now-familiar figure, above, developed previously. Friston’s figure, shown below (Picture credit: Frontiers in Human Neuroscience), is similar to mine, with the minor detail of being rotated by 90 degrees. But his diagram provides rather more substance. ‘Forward’ connections (corresponding to ‘upward’ connections in my diagram and shown in red) from one level in the hierarchy to the next carry ‘prediction errors’: mismatch signals which indicate the difference between what the internal model predicted and what the actual environment produced. ‘Backward’ connections (corresponding to ‘downward’ connections in my diagram and shown in black) from one level in the hierarchy to the one below carry predictions. The circles and triangles represent collections of relatively few neurons and are superimposed on a picture of a cross-section of the grey matter of the cortex, such that the height of each circle or triangle corresponds to the layer of the cortex in which those neurons are found. For example, the red triangles represent superficial pyramidal neurons – pyramidal neurons that are close to the surface of the cortex (at the top) – whereas the black triangles represent deep pyramidal neurons – pyramidal neurons that are deep within the cortex, towards the white matter (at the bottom). So a key point to be made here is that, as well as presenting a functional model of the cerebral matter (in most general terms), there is also a mapping of parts of the ‘predictor element’ to types of neurons in particular layers of the cortex.
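To make the two directions of message passing concrete, here is a minimal sketch of a two-level chain of predictors. It is my own toy illustration, not Friston’s model: the function name `settle`, the update rate, the equal weighting of the two errors and the use of single numbers in place of populations of neurons are all assumptions made for brevity.

```python
def settle(stimulus, steps=500, rate=0.1):
    """Relax a toy two-level chain of predictors towards a constant stimulus.

    Returns the two internal estimates and the history of both error signals.
    """
    lower, upper = 0.0, 0.0  # internal estimates held at each level (assumed to start at 0)
    history = []
    for _ in range(steps):
        # Backward (downward) messages are predictions: the lower level predicts
        # the stimulus, and the upper level predicts the lower level's estimate.
        # Forward (upward) messages are the resulting prediction errors.
        err_sensory = stimulus - lower  # passed up from the senses to the lower level
        err_lower = lower - upper       # passed up from the lower level to the upper level
        history.append((err_sensory, err_lower))
        # Each estimate moves so as to shrink the squared errors it takes part in.
        lower += rate * (err_sensory - err_lower)
        upper += rate * err_lower
    return lower, upper, history


if __name__ == "__main__":
    lower, upper, history = settle(1.0)
    print("estimates:", lower, upper)
    print("largest upper-level error:", max(abs(e[1]) for e in history))
    print("final upper-level error:", abs(history[-1][1]))
```

In this sketch the error passed to the upper level rises briefly after the stimulus appears and then dies away as both estimates settle on the stimulus.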
23. Minimizing Surprise Across All Levels
Each predictor will try to minimize surprise, with the restriction that all predictors are coupled with one another. So, for example, a change in stimulus may be initially surprising at a lower level, leading it to forward the prediction error to upper levels to ‘consider’, but after a short while, its model may have been adjusted such that it now accommodates the stimulus. Activity at higher levels may therefore cease. In practice, activity can ‘bubble up’ at a particular level for a short while, only to disappear again at any time. It is easy to see how this coupling of predictors could lead to (mathematically) chaotic behaviour, analogous to the ‘chaotic pendulum’. A single pendulum with a small perturbation has very predictable behaviour. Literally, it runs like clockwork. But hanging pendulums off other pendulums and giving them a big kick produces chaotic behaviour. In the video (below), notice how at some times (such as at 0:07) most of the activity is in the main central pendulum and at other times (such as at 0:13) it is mainly in the second pendulum. Now imagine that there are not 2 but thousands of pendulums.
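The pendulum analogy can be tried out numerically. The sketch below illustrates only the analogy, not neural dynamics: it simulates a double pendulum using the standard textbook equations of motion, with unit masses and lengths, a fourth-order Runge-Kutta integrator, and starting angles of my own choosing (all assumptions). Two copies are released from almost identical positions and the gap between them is measured at the end.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2; both masses and both lengths are taken as 1


def deriv(state):
    """Time derivative of (angle1, speed1, angle2, speed2) for a double pendulum."""
    t1, w1, t2, w2 = state
    d = t1 - t2
    den = 3.0 - math.cos(2.0 * d)
    a1 = (-3.0 * G * math.sin(t1) - G * math.sin(t1 - 2.0 * t2)
          - 2.0 * math.sin(d) * (w2 * w2 + w1 * w1 * math.cos(d))) / den
    a2 = (2.0 * math.sin(d)
          * (2.0 * w1 * w1 + 2.0 * G * math.cos(t1) + w2 * w2 * math.cos(d))) / den
    return (w1, a1, w2, a2)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = deriv(state)
    k2 = deriv(shifted(k1, dt / 2.0))
    k3 = deriv(shifted(k2, dt / 2.0))
    k4 = deriv(shifted(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
                 for s, a, b, c, e in zip(state, k1, k2, k3, k4))


def simulate(state, steps, dt):
    """Return the state after `steps` steps of size `dt`."""
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


def energy(state):
    """Total energy, which should stay (nearly) constant: a check on the integration."""
    t1, w1, t2, w2 = state
    kinetic = w1 * w1 + 0.5 * w2 * w2 + w1 * w2 * math.cos(t1 - t2)
    potential = -2.0 * G * math.cos(t1) - G * math.cos(t2)
    return kinetic + potential


def gap_after(angle, nudge=1e-6, steps=5000, dt=0.002):
    """Release two copies from rest, `nudge` radians apart; return their final separation."""
    a = simulate((angle, 0.0, angle, 0.0), steps, dt)
    b = simulate((angle + nudge, 0.0, angle, 0.0), steps, dt)
    return math.hypot(a[0] - b[0], a[2] - b[2])


if __name__ == "__main__":
    print("small swing (0.1 rad), final gap:", gap_after(0.1))
    print("big kick (2.0 rad), final gap:", gap_after(2.0))
```

With a small swing, the two copies should stay almost exactly in step, like clockwork. With a big kick, a one-in-a-million difference in the starting angle is expected to grow by orders of magnitude within the ten simulated seconds, which is the hallmark of chaotic behaviour.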
24. Free Energy versus Alief
Tamar Gendler’s notion of ‘Alief’ describes sub-conscious beliefs that are often at odds with our conscious ones. Hence it may be described in terms of conflict or competition between levels of the hierarchy. In contrast, Friston’s ‘minimising surprise across all levels’ can be viewed as cooperation across the hierarchy. At times, higher and lower levels may be predicting differently, but the whole is determining the best solution for action.
25. Behavioural Examples
How does this theory correspond to observed behaviour of the organism? How does the minimisation of error across the many layers lead to observed behaviour of the whole organism? In various papers, Friston provides a number of examples:
- Attention: If something ‘catches our eye’ in our peripheral vision, we minimize the prediction error by turning our head to direct full attention to it. This similarly applies to investigating a sudden noise. But if this noise gets repeated, we eventually get used to it and it is no longer surprising (a process called habituation).
- Cued movements: learning eye and motor coordination to follow a moving target with a finger.
- Similarly, simulation of handwriting: generation and recognition of motor trajectories.
- Reflex arcs and proprioception: prediction errors drive action, overcoming reflexive action.
- Birdsong: the generation and subsequent perceptual categorisation of chirping, reconstituting ‘hidden states’ in the generator, i.e. determining what was being said from the sounds made by the chirper’s syrinx (the vocal organ of birds).
- Saccadic eye movement. Saccades are fast movements of the eye. The eyes move around, locating interesting parts of the scene and building up a map corresponding to the scene. As with the birdsong example, the issue is how to get at the hidden states that underlie the physical realisation of the biological organ (in this case, the eye rather than the syrinx).
- With saccadic eye movements, the search of the visual field is not random. A simple example: when we look for our car keys, we don’t look around randomly. We have expectations (predictions) of where they may be and eliminate these possibilities one by one.
- Understanding schizophrenia: Normally, hearing is suppressed when talking. Schizophrenia can be seen as a failure to suppress this, leading to the wrong attribution of the agency of speech.
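The car keys example above can be sketched as a simple Bayesian search: look in the most probable place first and, if the keys are not there, rule that place out and renormalise the remaining beliefs. This is my own toy illustration, not taken from Friston’s papers. The places, the prior probabilities and the function names are hypothetical, and it assumes that a look never misses keys that are actually there.

```python
def eliminate(beliefs, place):
    """Rule out `place` and renormalise the remaining probabilities."""
    remaining = {p: v for p, v in beliefs.items() if p != place}
    total = sum(remaining.values())
    return {p: v / total for p, v in remaining.items()}


def search(beliefs, true_place):
    """Look in the currently most probable place until the keys are found.

    Returns the list of places looked in, in order.
    """
    looked = []
    while beliefs:
        guess = max(beliefs, key=beliefs.get)  # the current best prediction
        looked.append(guess)
        if guess == true_place:
            break
        beliefs = eliminate(beliefs, guess)    # prediction failed: update the beliefs
    return looked


if __name__ == "__main__":
    # Hypothetical prior expectations of where the keys might be.
    prior = {
        "hallway table": 0.5,
        "coat pocket": 0.3,
        "kitchen worktop": 0.15,
        "still in the car": 0.05,
    }
    print(search(prior, "kitchen worktop"))
    print(eliminate(prior, "hallway table"))
```

Each failed look concentrates belief on the places that remain, so the search order follows the expectations rather than being random.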