Common Sense Consciousness
There are common-sense notions of what consciousness is about which tell us:
- We are conscious when we are awake,
- We are not conscious if we are asleep except when we are dreaming,
- People under anaesthetic are not conscious.
- People in a coma are not conscious but those suffering from ‘locked in’ syndrome are.
- People have a single consciousness. It is not that there are multiple consciousnesses within them.
- There is no higher consciousness – groups of people are not conscious.
- Machines are not conscious.
But these can be wrong. For example, to take the last point, there is the danger of us being ‘biochauvinist’: failing to recognize that non-biological stuff could be conscious at all.
We Need a Theory
Much has been said on the nature of consciousness by philosophers but, as with much of philosophy, it is pre-scientific. We are still grappling with how to make the problem scientific, so that we can progress beyond speculation by forming hypotheses that can be quantified, used to make predictions and tested. It is as if we are at the same stage as the ancient Ionian philosophers were when speculating about the physical nature of the universe. For example:
- Thales speculated that ‘everything is water’ and provided reasons for his argument,
- Anaximenes speculated that ‘everything is air’ and provided reasons for his argument, and
- Heraclitus speculated that ‘everything is change’ and provided reasons for his argument.
No amount of speculation on its own could have ever led anyone to our current understanding of the physical world, involving quantum theory and relativity. Our understanding has developed through a long series of theories that have all been refuted as being ‘wrong’ but were necessary steps to make progress.
We have been lacking theories which would provide the first step towards a scientific understanding of the fundamentals of consciousness. This is ‘proto-science’ – at the start of the scientific process. We need to have a theory that is scientific in that it describes consciousness in wholly physical terms and that, given a specific physical state, can predict whether there is consciousness. As there is progress, theories and methods get established into what we normally understand as ‘science’. It can then provide useful applications. For example, a good theory would give us a 100% success rate in avoiding ‘anaesthesia awareness’. It must agree with our common-sense understanding of consciousness to some degree but it may surprise us. For example, it might tell us:
- We are conscious throughout the time we are asleep – the difference is that our experiences are not laid down in memory.
- In some specific circumstances, machines and/or groups of people can be conscious.
Integrated Information Theory
“the only really promising fundamental theory of consciousness”
In it, Tononi proposes a measure named after the Greek letter φ (‘phi’) which is the amount of ‘integrated information’ of a system. Consciousness is a fundamental property of the universe which arises wherever φ > 0. It is therefore a form of ‘panpsychism’ – consciousness can arise anywhere. The higher the value of φ, the larger the amount of consciousness. Consciousness is a matter of degree. Humans have large brains and very large φ and are highly conscious. Small rodents have smaller φ and are therefore less conscious. But sleeping humans must have a lower φ than wakeful rodents.
I have previously posted about Tononi’s theory, by providing an overview of his book ‘Phi: A voyage from the Brain to the Soul’. The book is a curious fusion of popular science and fiction and so, disappointingly, avoids all the technicalities involved with the theory and the calculation (quantification) of φ.
In one form of the ‘Integrated Information Theory’, φ is calculated as:
In short, φ is a measure of the information flow within a system. It is essentially formulated back from wanting (!) the following:
- The information flow between humans is much much less than the information flow within a human brain.
- The distinguishing indicator between wakefulness and REM sleep versus non-REM sleep is that there is a large drop in ‘long range’ communication in the latter – information flow is much more localised.
And this (necessarily) leads to the conclusions we ‘want’:
- We are not conscious in non-REM sleep or in a coma but are at other times, including if suffering from locked-in syndrome.
- There is not a consciousness associated with a group of people.
A positive φ requires the mutual flow of information within the system – between parts of the system, there is flow in both directions. In short, there are loops and ‘internal states’ i.e. memory. Tononi provides a metaphor of a digital camera. A 10-megapixel camera sensor provides 10 megabits of information but there is no integration of that information and no memory. In contrast:
- The human visual system combines information from neighbouring rod and cone photo-receptors in the retina before the information gets to the cortex of the brain, and
- There are more connections in the brain going from the ‘higher’ levels down towards the retina than there are going in the opposite direction.
A camera sensor has zero φ, so there is no consciousness. But a thermostat has memory (precisely 1 bit capacity) and a loop because of its hysteresis. It has some small positive value of φ. Hence it has some (absolutely minimal) degree of consciousness!
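To make the thermostat's ‘1 bit of memory plus a loop’ concrete, here is a minimal sketch in Python. The function names and the 19/21 degree switching points are assumptions chosen purely for illustration, and the sketch says nothing about how φ itself would be calculated. It only shows that, inside the hysteresis band, the output depends on the previous output rather than on the current reading alone.

```python
def thermostat_step(heating_on, temperature, low=19.0, high=21.0):
    """Return the new heater state given the previous state and a reading.

    The thresholds `low` and `high` are hypothetical values. Between them
    the previous state is simply fed back: that feedback is the loop, and
    the state itself is the single bit of memory.
    """
    if temperature < low:
        return True       # too cold: switch heating on
    if temperature > high:
        return False      # too warm: switch heating off
    return heating_on     # inside the band: remember the previous state


def run_thermostat(readings, heating_on=False):
    """Apply thermostat_step to a sequence of temperature readings."""
    states = []
    for temperature in readings:
        heating_on = thermostat_step(heating_on, temperature)
        states.append(heating_on)
    return states


# The reading 20 appears twice but produces different outputs,
# because the output depends on the remembered state.
states = run_thermostat([18.0, 20.0, 22.0, 20.0])
print(states)
```

A camera pixel, by contrast, would be a pure function of its current input, with no state carried from one reading to the next.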
This all sounds like a crack-pot theory but it is being taken seriously by many. Tononi’s academic specialization is sleep, but he worked with Gerald Edelman at Edelman’s Neurosciences Institute in La Jolla on metrics for brain complexity, and this work has evolved into his metric for consciousness. (Incidentally, he has also worked with Karl Friston, who was at the Neurosciences Institute at the same time.) Christof Koch is now collaborating with Tononi on the theory. My point: he is not someone on the fringes of this academic field.
Cynically, we might say that the theory has credibility because there is so very little else of substance to go on. We need to recognize that this is all still just ‘proto-science’.
‘IIT 1.0’ and ‘IIT 2.0’ based measures of ‘effective information’ (ei) on entropy – the effective information was an average ‘Kullback–Leibler divergence’ (alternatively termed ‘relative entropy’). This may sound familiar: entropy and the Kullback–Leibler divergence also feature in Karl Friston’s ‘Variational Free Energy’ theory of generalized brain function.
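For readers who have not met the Kullback–Leibler divergence, the following is a minimal sketch in Python of the divergence itself for two discrete distributions. It is not the IIT ‘effective information’ calculation, which averages such divergences in a way not reproduced here; the function name and the example distributions are assumptions for illustration only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in bits, for discrete p and q.

    Both arguments are sequences of probabilities over the same outcomes.
    Terms with p_i == 0 contribute nothing; if q_i == 0 where p_i > 0 the
    divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log2(p_i / q_i)
    return total


uniform = [0.25, 0.25, 0.25, 0.25]   # illustrative 'know nothing' distribution
peaked = [0.7, 0.1, 0.1, 0.1]        # illustrative 'know something' distribution

print(kl_divergence(uniform, uniform))   # identical distributions
print(kl_divergence(peaked, uniform))
print(kl_divergence(uniform, peaked))    # note: not symmetric
```

The divergence is zero only when the two distributions are the same, and it is not symmetric in its arguments, which is why it is called a ‘divergence’ rather than a ‘distance’.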
But ‘IIT 3.0’ uses a different metric for ‘effective information’. The basis of this is known:
- in mathematical circles by the formal term of the ‘Wasserstein distance’, and
- in computer science circles by the (literally) more down-to-earth term of the ‘Earth Mover’s Distance’ (EMD)
Imagine the amount of earth that a digger would have to move to make a pile of earth of a particular shape (‘distribution’) into the shape of another (these piles of earth represent probability distributions). When applied to simple binary distributions, this just reduces to the ‘Hamming distance’ used in Information Theory for communication systems.
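The pile-of-earth picture can be sketched in a few lines of Python for the simplest case: two distributions over equally spaced positions along a line, where the cost of moving earth is amount times distance. This is an illustration under those assumptions only. How ‘IIT 3.0’ applies the distance to the states of a system is not reproduced here, and the function names are my own labels.

```python
from itertools import accumulate


def earth_movers_distance_1d(p, q):
    """Earth Mover's Distance between two piles on equally spaced positions.

    In one dimension the minimum work equals the sum of the absolute
    differences between the two running totals (cumulative sums).
    Assumes p and q have the same length and the same total mass.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def hamming_distance(state_a, state_b):
    """Number of positions at which two equal-length binary states differ."""
    if len(state_a) != len(state_b):
        raise ValueError("states must have the same length")
    return sum(1 for a, b in zip(state_a, state_b) if a != b)


# All the earth at position 0 versus all of it at position 2:
# one unit of earth moved a distance of two.
print(earth_movers_distance_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))

# Shifting a two-bin pile along by one position costs one unit of work.
print(earth_movers_distance_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))

# Two binary states differing in a single position.
print(hamming_distance((1, 0, 1), (0, 0, 1)))
```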
Unlike previous editions, ‘IIT 3.0’ explicitly provided an example that I find rather incredible.
Figure 21 of ‘IIT 3.0’ shows 2 circuits, A and B (see below). The circuits consist of circles connected together with red and black arrows. The circles are ‘nodes’. The arrows are signals which are inputs to and outputs from the nodes. My interpretation of these diagrams is as follows:
- Black arrows mark ‘excitatory’ connections.
- Red lines with a dot at one end mark ‘inhibitory’ connections (going to the end with the dot).
- At each node, the input values are added (for excitatory connections, effectively scaled by 1) or subtracted (for inhibitory connections, effectively scaled by -1). If the total meets the criterion marked at the node (e.g. ‘>=2’) then each output will take the value 1 and otherwise it will be 0.
- Time advances in fixed steps (let us say 1 millisecond, for convenience) and all nodes are updated at the same time.
- The diagrams colour some nodes yellow to indicate that the initial value of a node output is 1 rather than 0 (for a white node).
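The update rule in this interpretation can be sketched in Python as follows. The three-node ring used as the example is hypothetical and is not the network of Figure 21; every criterion is assumed to be of the ‘>= threshold’ form, and the names are illustrative.

```python
def step(state, connections, thresholds):
    """Advance every node by one time step, all at the same time.

    state:       dict mapping node name -> 0 or 1 (current outputs)
    connections: list of (source, destination, weight) with weight +1 for
                 an excitatory connection and -1 for an inhibitory one
    thresholds:  dict mapping node name -> the value its summed input
                 must reach (a '>= threshold' criterion is assumed)
    """
    totals = {node: 0 for node in state}
    for source, destination, weight in connections:
        totals[destination] += weight * state[source]
    # Build a new dict so that every node sees the OLD state of the others.
    return {node: 1 if totals[node] >= thresholds[node] else 0 for node in state}


# Hypothetical ring A -> B -> C -> A with excitatory connections only.
ring = [("A", "B", +1), ("B", "C", +1), ("C", "A", +1)]
ring_thresholds = {"A": 1, "B": 1, "C": 1}

state = {"A": 1, "B": 0, "C": 0}   # 'yellow' node A, 'white' nodes B and C
history = [state]
for _ in range(3):
    state = step(state, ring, ring_thresholds)
    history.append(state)

for snapshot in history:
    print(snapshot)
```

In this toy ring the single 1 circulates from node to node and returns to A after three steps, which is the kind of loop that a feed-forward arrangement cannot contain.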
Figure 21. Functionally equivalent conscious and unconscious systems.
The caption for the figure reads:
(A) A strongly integrated system gives rise to a complex in every network state. In the depicted state (yellow: 1, white: 0), elements ABDHIJ form a complex with ΦMax = 0.76 and 17 concepts. (B) Given many more elements and connections, it is possible to construct a feed-forward network implementing the same input-output function as the strongly integrated system in (A) for a certain number of time steps (here at least 4). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain. The transition from the first layer to the second hidden layer in the feed-forward system is assumed to be faster than in the integrated system (τ << Δt) to compensate for the additional layers (A1, A2, B1, B2)…
The caption concludes with a seemingly outrageous statement on zombies and consciousness which I will come back to later on.
Unfortunately, in the figure:
- With the ‘integrated system’, I cannot reproduce the output sequence indicated in the figure!
- With the ‘feed-forward system’, it is difficult to determine the actual directed graph from the diagram but, from my reasonable guess, I cannot reproduce the output sequence indicated in this figure either!
But there are strong similarities between Tononi’s ‘integrated system’ versus ‘feed-forward system’ and ‘IIR filters’ versus ‘FIR filters’ in Digital Signal Processing that are more than coincidental. It looks like Tononi’s two ‘complexes’ as he calls them are derived from IIR and FIR representations. So I am going to consider digital filters instead.
An input signal changes over time, but only at discrete time intervals. For the purposes of this example, assume there is a new sample every millisecond. There is an input stream of samples around time t:
X[t], X[t+1], X[t+2], X[t+3], X[t+4] and on.
And there is an output stream of samples:
Y[t], Y[t+1], Y[t+2], Y[t+3], Y[t+4] and on.
A simple filter that smoothes out changes in input ‘samples’ can be formed by averaging the input with the previous output value:
Y(t) = (1/2).X(t) + (1/2).Y(t-1)
This is a filter of a type called an ‘infinite impulse response’ (IIR) filter. A diagram for an IIR filter is shown below:
A ‘z^-1’ box indicates a delay of one sample (here 1ms). The b, a0 and a1 boxes are multipliers (b, a0 and a1 are the constant values by which the signals are multiplied) and the ‘Σ’ circle sums (adds). The diagram shows a ‘second order’ filter (two delays) but I will only consider a first order one:
b = 1/2
a1 = 1/2
a0 = 0
A single non-zero value within a series of zero values is called an ‘impulse’:
X = … 0, 0, 0, 0, 1, 0, 0, 0, 0, …
If this impulse is fed into a filter, the resulting output from that impulse is called the ‘impulse response’. For the IIR filter it will be as follows:
Y = … 0, 0, 0, 0, 0.5, 0.25, 0.125, 0.0625, …
Y(1) = 1/2
Y(2) = 1/4
Y(3) = 1/8
Y(4) = 1/16
and in general form:
Y(t) = 2^-t
so there is some non-zero (but vanishingly small) output at very high t – the response carries on indefinitely, and this is why the filter is called an ‘infinite impulse response filter’.
If we put a ‘step’ into the IIR filter…
X = … 0, 0, 0, 0, 1, 1, 1, 1, 1 …
we get a ‘step response’ out, which shows the smoothing of the transition:
Y = … 0, 0, 0, 0, 0.500, 0.750, 0.875, 0.938, 0.969, 0.984, 0.992, …
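The impulse and step responses quoted above can be reproduced with a short Python sketch of the first-order filter Y(t) = (1/2).X(t) + (1/2).Y(t-1). The function name is illustrative, and the sketch assumes the output is zero before the first sample arrives.

```python
def iir_filter(samples, b=0.5, a1=0.5):
    """First-order IIR filter: y[t] = b * x[t] + a1 * y[t-1].

    The previous output is assumed to be 0 before the first sample.
    """
    output = []
    previous = 0.0
    for x in samples:
        previous = b * x + a1 * previous   # the feedback of y[t-1] is the loop
        output.append(previous)
    return output


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
step_input = [1.0] * 7

impulse_response = iir_filter(impulse)
step_response = iir_filter(step_input)

print(impulse_response)  # halves at every sample and never reaches zero
print(step_response)     # creeps up towards 1 without ever reaching it
```

All of the values involved are exact powers of two, so the floating-point results match the fractions in the text exactly (before rounding to three decimal places).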
This IIR filter is the equivalent to Tononi’s ‘integrated system complex’.
The DSP equivalent to Tononi’s ‘feed-forward system complex’ is a ‘finite impulse response’ (FIR) filter:
Y(t) = b0.X(t) + b1.X(t-1) + b2.X(t-2) + b3.X(t-3) + … + bN-1.X(t-N+1)
A diagram corresponding to this FIR filter (of ‘order N-1’) is shown below:
Here, the triangles are multipliers and the ‘+’ circles obviously add.
Now, we can try to get a FIR filter to behave very similarly to an IIR filter by setting its coefficients
b0 , b1 , b2 , b3 … bN-1
to be the same as the first N terms of the IIR’s impulse response. The values after t=5 are quite small so let’s set N=6:
b0 = 1/2
b1 = 1/4
b2 = 1/8
b3 = 1/16
b4 = 1/32
b5 = 1/64
so the transfer equation is:
Y(t) = (1/2).X(t) + (1/4).X(t -1) + (1/8).X(t -2) + (1/16).X(t -3) + (1/32).X(t -4) + (1/64).X(t -5)
and the step response is then:
Y = … 0, 0, 0, 0, 0.500, 0.750, 0.875, 0.9375, 0.96875, 0.984375, 0.984375, …
The FIR’s ‘impulse response’ only lasts for 6 samples – it is finite, which is why the filter is called a ‘finite impulse response filter’. The output does not depend on any input value from 6 or more samples earlier, but the first 6 output samples following an impulse are the same as the IIR’s, so the two filters behave in a very similar way.
(Note: The output never gets any higher than 0.984375 – the sum of all the coefficients)
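The comparison between the two filters can be checked with a short Python sketch. It assumes the 6 FIR coefficients listed above and the first-order IIR filter Y(t) = (1/2).X(t) + (1/2).Y(t-1), with all signals taken to be zero before the first sample; the function names are illustrative.

```python
def fir_filter(samples, coefficients):
    """FIR filter: y[t] = sum over k of coefficients[k] * x[t-k].

    Inputs from before the start of `samples` are assumed to be 0.
    There is no feedback: the output depends only on recent inputs.
    """
    output = []
    for t in range(len(samples)):
        total = 0.0
        for k, b in enumerate(coefficients):
            if t - k >= 0:
                total += b * samples[t - k]
        output.append(total)
    return output


def iir_filter(samples, b=0.5, a1=0.5):
    """First-order IIR filter: y[t] = b * x[t] + a1 * y[t-1], starting from 0."""
    output = []
    previous = 0.0
    for x in samples:
        previous = b * x + a1 * previous
        output.append(previous)
    return output


# b0 .. b5 = 1/2, 1/4, 1/8, 1/16, 1/32, 1/64
coefficients = [2.0 ** -(k + 1) for k in range(6)]
step_input = [1.0] * 8

fir_out = fir_filter(step_input, coefficients)
iir_out = iir_filter(step_input)

print(fir_out)
print(iir_out)
print(fir_out[:6] == iir_out[:6])   # identical for the first 6 samples
print(sum(coefficients))            # the ceiling the FIR output cannot exceed
```

From the seventh sample onwards the FIR output stays at the sum of its coefficients while the IIR output keeps creeping towards 1, which is the ‘almost’ in ‘almost functionally the same’.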
IIR and FIR Filters are alike but not the same
This is exactly the same situation as described by Tononi:
Reiterating Tononi’s figure caption:
Given many more elements and connections, it is possible to construct a ‘feed-forward’ network implementing the same input-output function as the ‘integrated system’ in (A) for a certain number of time steps (here at least 4). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain. …
And then there is the punchline that I omitted previously…
… Despite the functional equivalence, the ‘feed-forward system’ is unconscious, a “zombie” without phenomenological experience.
The same is true of the digital filters:
Given more elements and connections, it is possible to construct a FIR filter implementing the same input-output function as the IIR filter for a certain number of time steps (here 6). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain.
… Despite the functional equivalence, the FIR filter is unconscious, a “zombie” without phenomenological experience (unlike the IIR filter)!
For the FIR filter, there are no loops in the network – the arrows all point south/east and the value is then φ=0, in contrast with the non-zero φ for the IIR filter which does have loops.
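The structural difference can be illustrated with a Python sketch that treats each filter as a directed graph of signal paths and checks whether the graph contains a loop. This does not compute φ; it only tests the property that the argument above relies on. The node names are labels invented for the example, and the FIR graph is cut down to two delays to keep it short.

```python
def has_feedback_loop(edges):
    """Return True if the directed graph given by (source, target) edges
    contains a cycle, using a depth-first search."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    status = {node: UNVISITED for node in graph}

    def visit(node):
        status[node] = IN_PROGRESS
        for nxt in graph[node]:
            if status[nxt] == IN_PROGRESS:
                return True            # reached a node already on the current path
            if status[nxt] == UNVISITED and visit(nxt):
                return True
        status[node] = DONE
        return False

    return any(status[node] == UNVISITED and visit(node) for node in graph)


# First-order IIR filter: the output is delayed and fed back into the sum.
iir_edges = [("x", "sum"), ("sum", "y"), ("y", "delay"), ("delay", "sum")]

# FIR filter with two delays: every path runs from input to output.
fir_edges = [
    ("x", "delay1"), ("delay1", "delay2"),
    ("x", "sum"), ("delay1", "sum"), ("delay2", "sum"),
    ("sum", "y"),
]

print(has_feedback_loop(iir_edges))
print(has_feedback_loop(fir_edges))
```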
To anyone who understands digital signal processing, the idea that an IIR filter has some consciousness (albeit tiny) whereas an equivalent FIR filter does not is absurd. This is an additional absurdity beyond that of the panpsychist idea that any filter could have consciousness in the first place.
Could Androids Dream of Electric Sheep?
In a previous talk (‘Could Androids Dream of Electric Sheep?’) I considered whether something that behaved the same way as a conscious human would also be conscious.
If something behaves the same way as a conscious human, we can still deny that it is conscious on the grounds that it is just an imitation. We would not credit a computer running Joseph Weizenbaum’s famous ELIZA program as being genuinely conscious in the same way as we are (although the Integrated Information Theory would grant it as having some lower value of φ, but one that is still greater than zero).
A narrower philosophical question is whether a computer running a simulation (‘emulation’) of a human brain would be conscious. (Yes – ‘whole brain simulation’ is not possible – yet.) A simulation at a sufficiently low level can show the same phenomenon as the real object (such as ‘getting wet’ in a rainstorm in a weather simulation.) In this case, the ‘same thing’ is going on, but just implemented in a different physical substrate (electronic transistors rather than gooey biological stuff); a functionalist would say that the simulation is conscious by virtue of it being functionally the same.
The yet narrower case is where the physical construction of the ‘simulation’ is the same. It would then no longer be a simulation but a direct (atom-by-atom) copy. Anyone insisting on this can be accused of being ‘biochauvinist’ in denying that computer simulations are conscious. But it is still possible that consciousness is not duplicated. For example, if whatever it is that causes consciousness is at a sub-atomic level, an atom-for-atom copy might miss this out. How would we know?
I took a functionalist position.
However, the example above shows that, according to the ‘Integrated Information Theory’, it is possible for two systems to be functionally the same (caveat: almost) but for one to be conscious whilst the other is not. In short – that (philosophical) zombies can exist.
But any ‘system’ is just a component in a larger system. It is not clear to me whether, if one component with φ>0 is substituted with a functionally identical one with φ=0, the φ of the larger system is reduced. In a larger system, the loop-less φ=0 implementation ends up with loops around it.
To be continued (eventually, hopefully).