Relearning Hierarchies of Predictors
The ‘hierarchy of predictors’ hypothesis, which operates by the ‘minimization of surprise through action and perception’, has been introduced and described elsewhere, but that account has been static: it does not explain how the brain organizes itself dynamically by learning through experience. The existing hypothesis is:
That there are many levels arranged in a hierarchy, with the bottom level receiving sense input from, and sending motor output to, the wider environment. Each level tries to minimize surprise by maintaining within itself a model of its wider environment, so that it can predict its sense input and act in response. At each level, the action associated with the prediction is enacted in proportion to how successful the prediction is, and the sense error (the mismatch between prediction and input) is passed up as the sense input of the next higher level. Actions from that higher level propagate downwards in proportion to how wrong the prediction was. Each level learns from the experience (adapts its predictions):
1. In proportion to how wrong it was, and
2. In inverse proportion to how much learning has gone before.
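The rules above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch rather than a definitive model: it assumes each level holds a single number as its prediction, treats that prediction as the level's own action, measures success as 1 / (1 + |error|), and counts ‘how much learning has gone before’ as the accumulated size of past updates. The names `Level`, `step` and `experience` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One predictor in the hierarchy (a hypothetical, scalar sketch)."""
    prediction: float = 0.0
    experience: float = 1.0  # how much learning has gone before; starts at 1

    def step(self, sense_input: float, action_from_above: float) -> tuple[float, float]:
        """Return (action enacted, sense error passed up to the next level)."""
        error = sense_input - self.prediction
        # Assumed success measure: 1 for a perfect prediction, falling towards 0.
        success = 1.0 / (1.0 + abs(error))
        # Own action weighted by success; the higher level's action by failure.
        action = success * self.prediction + (1.0 - success) * action_from_above
        # Learning: proportional to the error, inversely proportional to
        # the learning that has gone before.
        update = error / self.experience
        self.prediction += update
        self.experience += abs(update)
        return action, error


level = Level()
action, error = level.step(sense_input=3.0, action_from_above=7.0)
```

Because each update is divided by `experience`, a level that has already learnt a lot becomes hard to shift, while a fresh level follows its input closely.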
The elaborated hypothesis adds:
Ordinarily, new tasks (which don’t fit in with any previous predictions) propagate up to high (conscious) levels, where they will be acted upon clumsily (slowly). Repeated occurrences of a new task will initially allow the high levels to improve their predictions. But after a while, lower levels will also learn, and will end up acting upon the task and shutting off the higher levels (the error propagated upwards becomes small). Over time, the task gets ‘relearnt’ at lower and lower levels, which allows the response to become faster. It becomes habitual.
Although higher levels react more slowly than low-level ones (because they are further away from the environment), they adapt to changes more quickly.
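The shutting-off effect can be illustrated with a minimal sketch, assuming the same hypothetical scalar learning rule (update = error / experience) and a lower level that already has some prior experience, so that it learns a new task only gradually. The function name `relearn` and the starting values are illustrative assumptions.

```python
def relearn(task: float, trials: int, experience: float = 5.0) -> list[float]:
    """Error a lower level passes upward on each repetition of a new task.

    Hypothetical sketch: the lower level starts with a prediction of 0.0
    (the task fits none of its old patterns) and with prior experience,
    so it learns the task gradually rather than at once.
    """
    prediction = 0.0
    upward_errors = []
    for _ in range(trials):
        error = task - prediction
        upward_errors.append(error)   # what the higher level has to deal with
        update = error / experience   # assumed learning rule
        prediction += update
        experience += abs(update)
    return upward_errors


errors = relearn(task=1.0, trials=50)
```

Each repetition shrinks the error handed upward, so the higher level has progressively less to respond to: the task is being relearnt below it.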
In the early stages of life, there is no prior experience. No level is any good at predicting. Action will be determined by many levels and there will be learning at many levels, but the higher levels will learn fastest. However, whatever learning there is at lower levels will shut off learning at higher levels and over time this lower-level learning will come to determine behaviour. So, early gross pattern learning eventually beds down at the lowest levels and more refined pattern learning beds down on top of that.
In early learning, the lower levels will be making poor predictions and propagating significant errors up to higher levels for them to adapt to. But some action will be generated from those lower levels – and poor action at that. The higher levels must provide strong action downwards to try to suppress this poor behaviour. Over time, the higher levels will learn to react less strongly as their error inputs become weaker.
The hierarchy thus self-organizes its learning.
Higher levels are refined models that can easily change; it is here, it can be said, that short-term memory resides. Lower levels have entrenched behaviour and thus represent long-term memory.
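The contrast between short-term and long-term memory can be sketched under the same hypothetical learning rule: a level with little prior learning follows a change almost at once, whereas an entrenched level barely moves. The function name `adapt` and the experience values 1.0 and 50.0 are illustrative assumptions, not figures from the text.

```python
def adapt(prediction: float, experience: float, new_value: float, trials: int) -> float:
    """Return a level's prediction after `trials` exposures to a changed environment."""
    for _ in range(trials):
        update = (new_value - prediction) / experience  # assumed learning rule
        prediction += update
        experience += abs(update)
    return prediction


# A fresh ("short-term") level and an entrenched ("long-term") level, both
# having predicted 0.0 until the environment changes to 1.0.
fresh = adapt(prediction=0.0, experience=1.0, new_value=1.0, trials=3)
entrenched = adapt(prediction=0.0, experience=50.0, new_value=1.0, trials=3)
```

After three exposures the fresh level has fully adopted the new value, while the entrenched level has moved only a few per cent of the way.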
Integrating Pyramids of Predictors
A linear hierarchy as a model of the brain is of course a gross simplification. But simplification (‘abstraction’) enables us to see the wood for the trees – to get beyond the mass of detail in order to get to some understanding of the stupendously complex object that is the brain.
A ‘pyramid of predictors’ model is better than the ‘linear hierarchy of predictors’ model in that it provides more explanatory power for only a minimal increase in complexity.
This new model introduces the following refinements to the linear hierarchy thesis:
• Generally, a process communicates down to more than one lower-level process and is just one of many processes communicating up to the process above it.
• At low levels, there is local action in response to a local stimulus. But higher up the hierarchy, more information is brought together, finding patterns across a wider range of sense input.
• Action is directed towards the child process that is feeding the largest error upwards. (This can sometimes lead to a misunderstanding of what is going on.)
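The routing rule can be sketched as follows, assuming (hypothetically) that each child feeds a single scalar error up to its parent and that the parent's action strength is proportional to the size of the largest error. The function name `direct_action` and the child names are illustrative only.

```python
from typing import Mapping


def direct_action(child_errors: Mapping[str, float]) -> tuple[str, float]:
    """Return the child that receives the parent's action, and its strength.

    Hypothetical sketch: the parent's sense input is the set of errors fed
    up by its children; action goes to the child with the largest error.
    """
    if not child_errors:
        raise ValueError("a parent needs at least one child")
    target = max(child_errors, key=lambda name: abs(child_errors[name]))
    return target, abs(child_errors[target])


# Hypothetical children of one higher-level process.
feeds = {"left_hand": 0.2, "right_hand": -0.9, "eyes": 0.4}
target, strength = direct_action(feeds)
```

Since the choice depends only on the size of each error, a child that is merely loud can capture the parent's action, which is one way of reading the remark that this can lead to a misunderstanding of what is going on.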
So predictions self-organize so that they are performed at the appropriate level in the pyramidal hierarchy, as a result of:
1. The appropriate range of speed and sense input (lower levels operating quickly and higher levels being able to draw upon more information), and
2. The ‘relearning’ at lower levels.
We can speculate that this self-organization develops into the following levels or processes, from low (fast) to high (slow):
- Sensorimotor reflex: At the lowest level, there is a very close coupling between sense input and motor output. Example: reflex action.
- Sensory integration: Identification of patterns among multiple sense inputs, leading to a local expectation/prediction which may be confirmed or refuted. An example of this is the ‘information processing’ of detecting horizontal/vertical lines in the visual cortex of cats.
- Sensorimotor integration: The interaction between a particular sense function and the motor functions associated with that sense, such as vision influencing saccadic eye movement.
- Sensory integration: Higher-level identification of patterns of a particular sense, such as recognizing hands and how hands move.
- Sensorimotor integration: The interaction between a particular sense function and motor functions not associated with that sense. For example, during development it is found that two particular hands are exceptional (they are the exceptions to the learnt patterns of how hands move): their jerky movements are surprising. Through integrating motor action with sense input, it becomes possible to learn how to control these hands so that they move like other hands. (These hands are not yet one’s ‘own’; there is no ‘self’ yet.)
- Hypotheses and Deliberation: All ‘perception is hypothesis’, but this becomes more apparent at higher levels in the hierarchy. Some hypotheses result in action in the environment; some result solely in imaginative ‘deliberations’ within the brain. All of these must work within the limits of prior experience.
- Multi-modal sensory integration: Different sensations are integrated. Where there is conflicting information between senses, one ‘guess’ (hypothesis) must win. An example of this is the McGurk effect, in which seeing a speaker’s lip movements changes which syllable is heard.
- Full sensorimotor integration: All senses and motor functions are united to create a coherent proprioceptive ‘body’, complete with feelings and emotions.
- Agency: A sense of ‘me’ arises. Those ‘exceptional’ hands identified earlier are indeed exceptional because they are mine. But things can still go wrong with identifying ‘me’ such as with the momentary confusion in the ‘rubber hand illusion’.
- Conscious deliberation: At the top are long-term deliberations, which sometimes get distracted by lower-level emotional feelings screaming for attention and are sometimes able to suppress those emotions.
Some key points of that hierarchical self-organization are:
• It is not necessarily true that we detect that others are like us; it can easily be the other way around, in that we learn that we are like others. ‘Others’ precede ‘us’. In our pattern-matching learning, we have copied others.
• Sensory detection of, for example, hands happens at a much lower level than the level that identifies whose hands they are.
• Agency happens at a higher level than feelings.
(Whilst it stands alone, this is the sixteenth part of the ‘From Neural Is to Moral Ought’ series, providing the background needed to get from Part XV to Part XVII.)