How Susceptible are Jobs to Computerisation?

News articles and reports appear almost daily on the subject of how technological developments in Artificial Intelligence and robotics will cause dramatic changes to employment over the next few decades. (Artificial Intelligence includes techniques such as ‘machine learning’, ‘deep learning’, artificial neural nets and ‘data mining’.) A high proportion of these articles refer back to a 2013 study by Carl Frey and Michael Osborne called ‘The Future of Employment: How Susceptible are Jobs to Computerisation?’ in which they asserted that 47% of total US employment is at risk.

Here, I go back to this original source and provide a summary.

The Method

Starting with a US Department of Labor list of employment categories, Frey and Osborne produced estimates of the probability of computerisation for 702 occupations. (Throughout, ‘computerisation’ means automation by Artificial Intelligence, which is underpinned by computer technology.) These estimates were derived by assessing occupations in terms of the following factors:

  • Dexterity: The ability to make precisely coordinated movements to grasp, manipulate, or assemble objects.
  • Creative Intelligence: The ability to come up with original ideas, develop creative ways to solve a problem or to compose, produce, and perform works of music, dance, visual arts, drama, and sculpture.
  • Social Intelligence: Being perceptive of others’ reactions and understanding why they react as they do. Being able to negotiate to reconcile differences and persuading others to change their minds or behaviour. Providing personal assistance, medical attention, emotional support, or other personal care to others such as co-workers, customers, or patients.

They then examine the relationship between an occupation’s probability of computerisation and the wages and educational attainments associated with it.

Their analysis also includes a history of the effect of technological revolutions on employment in the 19th and 20th Centuries, contrasted with the expected effect in the 21st Century.

The Results

Whilst the probability of automation is listed for all 702 occupations, the results are most succinctly presented in the figure (their ‘Figure III’) below:

Frey and Osborne: The Future of Employment: How Susceptible are Jobs to Computerisation?, Figure III

How Likely is it that your job can be automated?

In the figure, they have organized the 702 occupations into various categories and demarcated them based on the probability of computerisation:

  • High: probability over 70%.
  • Medium: probability between 30% and 70%.
  • Low: probability under 30%.

In the table below, I have extracted a selection of the 702 probabilities, covering the following categories:

  • Management / financial / legal
  • Engineering and technical
  • Education
  • Healthcare, and
  • Food

…to provide examples that support the above graphs. They clearly show healthcare and education as low-risk categories. Professional engineering jobs are low-risk but technician jobs are spread across the medium-risk and high-risk bands. Food-related jobs are firmly high-risk. There are a few surprises here for me. ‘Cooks, Restaurant’ and ‘Bicycle Repairers’ are going to be almost completely automated while ‘Postsecondary Teachers’ are going to be untouched. Will all restaurant meals be microwave-reheated?! Will robots strip down and reassemble bikes? Will online teaching have no impact on teaching roles?

Rank Prob.% Occupation Type
6 0.4% Occupational Therapists HEALTH
11 0.4% Dietitians and Nutritionists HEALTH
14 0.4% Sales Engineers HEALTH
15 0.4% Physicians and Surgeons HEALTH
17 0.4% Psychologists, All Other HEALTH
19 0.4% Dentists, General HEALTH
25 0.5% Mental Health Counsellors HEALTH
28 0.6% Human Resources Managers MGMNT
40 0.8% Special Education Teachers, Secondary School EDU
41 0.8% Secondary School Teachers, Except Special and Career/Technical Education EDU
46 0.9% Registered Nurses HEALTH
53 1.1% Mechanical Engineers TECH
54 1.2% Pharmacists HEALTH
63 1.4% Engineers, All Other TECH
70 1.5% Chief Executives MGMNT
77 1.7% Chemical Engineers TECH
79 1.7% Aerospace Engineers TECH
82 1.8% Architects, Except Landscape and Naval TECH
84 1.9% Civil Engineers TECH
98 2.5% Electronics Engineers, Except Computer TECH
104 2.9% Industrial Engineers TECH
112 3.2% Postsecondary Teachers EDU
115 3.5% Lawyers MONEY
120 3.7% Biomedical Engineers TECH
152 6.9% Financial Managers MONEY
153 7% Nuclear Engineers TECH
163 8.4% Childcare Workers EDU
188 14% Optometrists HEALTH
191 15% Kindergarten Teachers, Except Special Education EDU
192 15% Electricians TECH
226 25% Managers, All Other MGMNT
249 35% Plumbers, Pipefitters, and Steamfitters TECH
253 36% Computer Numerically Controlled Machine Tool Programmers, Metal and Plastic TECH
261 38% Electrical and Electronics Repairers, Powerhouse, Substation, and Relay TECH
263 38% Mechanical Engineering Technicians TECH
290 48% Aerospace Engineering and Operations Technicians TECH
317 56% Teacher Assistants EDU
386 70% Avionics Technicians TECH
398 72% Carpenters TECH
422 77% Bartenders FOOD
435 79% Motorcycle Mechanics TECH
441 81% Cooks, Fast Food FOOD
442 81% Word Processors and Typists MONEY
443 81% Electrical and Electronics Drafters TECH
453 82% Sheet Metal Workers TECH
460 83% Cooks, Institution and Cafeteria FOOD
477 84% Lathe and Turning Machine Tool Setters, Operators, and Tenders, Metal and Plastic TECH
489 85% Nuclear Technicians TECH
514 88% Semiconductor Processors TECH
522 89% Bakers FOOD
583 93% Butchers and Meat Cutters FOOD
596 94% Bicycle Repairers TECH
625 95% Postal Service Clerks MONEY
629 96% Office Clerks, General MONEY
641 96% Cooks, Restaurant FOOD
657 97% Cashiers MONEY
671 98% Bookkeeping, Accounting, and Auditing Clerks MONEY
688 98% Brokerage Clerks MONEY
698 99% Insurance Underwriters MONEY

De-Skilling: The First Industrial Revolution

Frey and Osborne provide some historical perspective, looking at the impact of past technological revolutions.

They start with the case of William Lee who invented the stocking frame knitting machine in 1589. But Queen Elizabeth I refused to grant him a patent: “Consider thou what the invention could do to my poor subjects. It would assuredly bring to them ruin by depriving them of employment, thus making them beggars”.

But by 1688, protection of workers in Britain had declined. The property owning classes were politically dominant and the factory system began to displace the artisan shop. The Luddite riots of 1811-1816 were a prominent example of the fear of technological unemployment. It was the inventors, consumers and unskilled factory workers that benefited from mechanisation. Arguably, unskilled workers have been the greatest beneficiaries of the Industrial Revolution.

An important feature of nineteenth century manufacturing technologies is that they were largely “de-skilling”. Eli Whitney, a pioneer of interchangeable parts, described the objective of this technology as “to substitute correct and effective operations of machinery for the skill of the artist which is acquired only by long practice and experience; a species of skill which is not possessed in this country to any considerable extent”.

Up-Skilling: The Second Industrial Revolution

In the late nineteenth century, electricity replaced steam and water-power and manufacturing production shifted over to mechanised assembly lines with continuous-process and batch production methods. This reduced the demand for unskilled manual workers but increased the demand for skills – there was demand for relatively skilled blue-collar production workers to operate the machinery and there was a growing share of white-collar non-production workers.

This shift to more skilled workers continued:

“the idea that technological advances favour more skilled workers is a 20th century phenomenon.”

“the story of the 20th century has been the race between education and technology”

The Computer Revolution

Office machines reduced the cost of information processing tasks and increased the demand for educated office workers. But the supply of better educated workers filling these roles ended up outpacing the demand for their skills and this led to a sharp decline in the wage premium of clerking occupations.

Educational wage differentials and overall wage inequality have increased sharply since the 1980s. The adoption of computers and information technology explains some of the growing wage inequality of the past decades. Computerisation has eroded wages for (middle-income manufacturing) labour performing routine tasks, so workers have had to switch to relatively low-skill, low-income service occupations, pushing low-skilled workers even further down (and sometimes off) the occupational ladder. These service occupations are less susceptible to computerisation because their manual tasks require a higher degree of flexibility and physical adaptability. This has increasingly led to a polarised labour market, with growing employment in high-income cognitive jobs and low-income manual occupations (the ‘lovely jobs’ and ‘lousy jobs’, as Goos and Manning have called them), accompanied by a hollowing-out of middle-income routine jobs.

Off-shoring is the other big factor affecting wage inequality. It is having a similar effect on jobs as automation. Alan Blinder (who used the same Department of Labor database that Frey and Osborne subsequently used) examined the likelihood of jobs going offshore and concluded that 22% to 29% of US jobs are or will be offshorable in the next decade or two.

The Automation of Routine Tasks

Frey and Osborne consider cutting the jobs cake in two ways:

  • Between routine and non-routine jobs, and
  • Between cognitive and non-cognitive jobs.

Previously, the tasks that have been automated have been routine, non-cognitive ones. Routine tasks are ones that follow explicit rules – behaviour that can be codified (and then coded). New Machine Learning technologies open up routine, cognitive tasks to automation, and computers will quickly become more productive than human labour in these tasks. Non-routine tasks, whether cognitive or non-cognitive, are more difficult to codify and their automation will have to follow later – gradually, as the technology develops.

But Machine Learning improves the ability of robots to perceive the world around them and so it also helps automate routine, non-cognitive (manual) tasks that have not been possible previously.

Robots are becoming more advanced, and cheaper too (Rethink Robotics’s ‘Baxter’ costs only about $20,000). They can already perform many simple service tasks such as vacuuming, mopping, lawn mowing, and gutter cleaning and will likely continue to take on an increasing set of manual tasks in manufacturing, packing, construction, maintenance, and agriculture. They can be expected to gradually replace human labour in a wide range of low-wage service occupations – which is where most US job growth has occurred over the past decades.

The Automation of Non-Routine Tasks

More advanced application of Machine Learning and Big Data will allow non-routine tasks to be automated. Once technology has mastered a task, machines can rapidly exceed human labour in both capability and scale. Machine Learning algorithms running on computers are commonly better able to detect patterns in big data than humans. And they are not subject to human bias. Fraud detection is already almost completely automated. IBM’s Watson is being applied to medical diagnoses. Symantec’s Clearwell acquisition (now Veritas ‘eDiscovery’) can extract general concepts from thousands of legal documents. And this intelligence is made more accessible with improved voice Human-Computer Interfaces such as Apple’s Siri and Google Now.

Education is one sector that will be affected by this. Universities are experimenting with MOOCs (Massive Open Online Courses). From what they are learning about how students react to these online courses, they will be able to create interactive tutors that adjust their teaching to match each individual student’s needs.

And there are ways of automating non-routine manual tasks not through new technology but just by restructuring the tasks. For example, in the construction industry, on-site tasks typically demand a high degree of adaptability. But prefabrication in a factory before transportation to the site provides a way of largely removing the requirement for adaptability.

Employment in the Twenty-First Century

Over the years, the concern over technological unemployment has proven to be exaggerated because increased productivity has led to increased demand for goods, enabled by the better skills of the workforce. But Frey and Osborne cite Brynjolfsson and McAfee: as computerisation enters more cognitive domains, it will become increasingly difficult for workers to outpace the machines.

Frey and Osborne’s headline is that 47% of total US employment is in the ‘high risk’ category; in a first wave of changes, this will affect most workers in production, transportation and logistics, and office and administrative support.

Wary of the difficulties of making predictions, they have restricted themselves to analysing the likelihood that jobs which currently exist will be automated as a result of near-term technological breakthroughs in Machine Learning and Robotics. Regarding timescales of the effects, they only go as far as saying ‘perhaps a decade or two’ for the first wave to take effect. And they do not want to forecast future changes in the occupational composition of the labour market or how many jobs will actually be automated. Many jobs will disappear completely but many roles will be modified, because the offloading of automated tasks frees up time for human labour to perform other tasks. For example, while it is evident that much computer programming can be automated, Frey and Osborne say there are ‘strong complementarities’ in science and engineering between the power of computers and the high degree of creative intelligence of the scientists and engineers.

Beyond this first wave, they say there will be a slowdown in labour substitution, which will then be driven by incremental technological improvements. All told, a ‘substantial share’ of employment, across a wide range of occupations, is at risk in the near future.

There is a strong negative correlation between a job’s risk of automation and wages/educational attainment. For example, paralegals and legal assistants are in the high risk category whereas the highly-paid, highly-educated lawyers are in the low risk category.

This marks a profound change in the balance of jobs. Whereas the nineteenth century manufacturing technologies largely substituted for skilled labour through the simplification of tasks and the Computer Revolution of the twentieth century caused a hollowing-out of middle-income jobs (splitting the jobs market into high-wage, high-skill and low-wage, low-skill occupations), Frey and Osborne predict that, as technology races ahead, the Machine Learning and Robotics revolution will take out the bottom of the market, requiring the low-skill workers to acquire creative and social skills and reallocate to tasks that are non-susceptible to computerisation!


From Neural ‘Is’ to Moral ‘Ought’

This talk takes its inspiration from Joshua Greene’s ‘From neural ‘is’ to moral ‘ought’: what are the moral implications of neuroscientific moral psychology?’

He says:

“Many moral philosophers regard scientific research as irrelevant to their work because science deals with what is the case, whereas ethics deals with what ought to be.”

but Greene (director of Harvard’s ‘Moral Cognition Lab’) continues:

“I maintain that neuroscience can have profound ethical implications by providing us with information that will prompt us to re-evaluate our moral values and our conceptions of morality.”

So: what are those profound implications?

In this talk I explore various ideas to try to present a neuroscientific perspective on morality.

Is to Moral Ought

We’ll start with some brief background to ethics (the ‘moral ought’ of the title) and then the ‘is to ought’ part. ‘Normative ethics’ is about the right (and wrong) way people should act in contrast to ‘descriptive ethics’ which, not surprisingly, just describes various ethical theories.

There are 3 major moral theories within normative ethics:

  • Deontology which emphasizes duties and the adherence to rules and is frequently associated with Immanuel Kant,
  • Consequentialism which emphasizes the consequences of an action in determining what should be done and is frequently associated with Jeremy Bentham’s and John Stuart Mill’s Utilitarianism that aims for “the greatest happiness of the greatest number”,
  • and the less familiar Virtue Ethics which emphasizes the goodness (good character) of the agent performing the action rather than the act. Virtue ethics is frequently associated with Aristotle but various other philosophers have produced lists of virtues that define a good person. For example, Plato defined the ‘4 cardinal virtues’ (Prudence, Justice, Courage and Temperance) and Aquinas defined the ‘3 theological virtues’ (Faith, Hope and Charity). Lawrence Kohlberg (who we will hear of later on) criticised Virtue Ethics in that everyone can have their own ‘bag of virtues’ but there is no guidance on how to choose those virtues.

Whilst it is true that:

 “… science deals with what is the case, whereas ethics deals with what ought to be.”

… it is technically possible to get from an ‘is’ to an ‘ought’. We might assert a fact that ‘murder decreases happiness’ (an ‘is’), perhaps asserted because of a neuroscientific way of measuring happiness. It would not be logically valid to derive the imperative ‘do not murder’ (an ‘ought’) from this alone. However, if predicated on the goal of ‘maximization of happiness’, it is valid:

if goal then { if fact then imperative }

‘if our goal is to achieve the maximum happiness and murder decreases happiness then do not murder’

But this just shifts the problem one step back from specifics to wider philosophical questions. The issue is then:

  • What should our goal be?
  • What is the purpose of morality?
  • What is the purpose of life, mankind and the universe?

And there is the issue:

  • Who gets to decide?

The Cognitive Essence of Morality

For me, if I get to decide the purpose of morality, I think it comes down to this – everyone can decide what their own goals are, and the essence of morality is then:

The (deliberative) balancing of the wants (goals) of oneself with those of (sentient) others.

It is about self-regulation.

Immediately, this casts the problem into cognitive terms:

  1. In order to balance goals, we need a faculty of reason.
  2. In order to understand the concepts of ‘self’ and ‘others’ we need a ‘theory of mind’.
  3. We feel that we can choose our wants but they are ultimately physiological i.e. neurological.
  4. (The issue of identifying sentience i.e. consciousness is not considered here.)

To be moral requires intelligence, a ‘theory of mind’ and maybe other things.

Iterated Knowings

What is ‘theory of mind’?

It is an ability to understand that others can know things differently from oneself. We must understand this if we are to balance their wants against ours.

lkmco.org

The Sally Anne test

The classic test for a theory of mind is the ‘Sally Anne Test’ which presents a story:

  • Sally has a marble, which she puts into a basket. She then goes out for a walk. During this time, Anne takes the marble from the basket and puts it into a box. Sally then comes back.

The question is then:

Where will Sally look for her marble?

If we think Sally will look for her marble in the box then we have no theory of mind.

This theory fits neatly into a scale of ‘Iterated Knowings’ set out originally by James Cargile in 1970 but prominently discussed by Daniel Dennett and Robin Dunbar.

The scale starts at the zeroth level: some information (‘x’). Information relates something to something else. If ‘some input’, then ‘some output’. Information can be encapsulated by rules.

At the first level, we have beliefs (‘I know x’) which we recognise can be different from reality (‘x’).

At the second level, we understand theory of mind: ‘I know you know x’. Knowing it is possible for others to not know things, it is possible to deceive them: ‘I know that Sally will not know the marble is in the box’.

At the third level, there is ‘communicative intent’: ‘I know you know I know x’. I can communicate information to you and know that you have received it. I am able to understand that you can understand that you have been deceived by me – I can understand reputation.

At the fourth level, it is possible to understand roles and narrative: ‘I know you know I know you know x’ where ‘you’ are an author, for example. In the 1996 film production of ‘Hamlet’, Kenneth Branagh’s Hamlet kills Richard Briers’s Polonius. A failure to understand roles would mean that we would think that Branagh has killed Briers.

At the fifth level, there is an awareness of roles and narratives that are distinct from the role or narrative. There is an awareness that others have their own narratives that are different from one’s own, even though the experiences are similar – there can be other cultures, myths, religions and worldviews. Many adults do not attain this level.

At each level, there is an awareness of the phenomenon at the lower level that is distinct from the phenomenon itself. It is possible to understand sentences at seemingly higher levels, for example:

“I know that Shakespeare wants us to believe that Iago wants Othello to believe that Desdemona loves Cassio”

but this is still really only a fourth-level phenomenon – that of understanding roles.

These levels of iterated knowings are also referred to as orders of intentionality.

Cognitive Theories of Moral Development

In order to:

balance the wants of oneself with those of others

we need rational intelligence and a theory of mind as already stated. But we also need an ability to work out what the ‘other’ wants. Judging from appearance, this requires ‘social cognition’ – an ability to read faces and body language, to understand what the other is feeling.

But there is another ingredient required for us to actually act morally – for us to care about the other.

By my definition, a moral agent tries to understand what the other wants – tries to apply the ‘Platinum Rule’:

‘Do unto others as they would want to be done by’

as opposed to the more common baseline of moral behaviour, the ‘Golden Rule’:

‘Do unto others as you would want to be done by.’

Having said that care is required, it is possible to manage without it by upping the order of intentionality.

A third-order agent understands reputation. It may not care about the other but it (sociopathically) balances its wants against the other to maintain a reputation which helps itself in the long term.

It is also possible to manage without social cognition through communication. A third-order agent may not be able to understand what you want but it may be able to ask you.

And finally, it is also possible to manage without either social cognition or a caring nature – by relying on communication and reputation.

We have here the basis of a theory of moral development, in which there is increasing:

  • intelligence
  • level of intentionality
  • social cognition
  • and care

and in which we are better with more of each characteristic. We could say that these are the cognitive moral virtues: intelligence, intentionality, social cognition and care!

Note that fifth-order intentionality is a level which many adults do not attain. All too often, moral conflict arises not because the other’s opinion differs from one’s own but because of an inability to understand that the other has a different worldview into which they fit knowledge. As Jacques Rancière has said:

“Disagreement is not the conflict between one who says white and another who says black. Rather, it is the conflict between one who says white and another who also says white but does not understand the same thing by it.”

A rather more famous theory of moral development based upon a theory of cognitive development is Lawrence Kohlberg’s, which builds upon Jean Piaget’s. It too has a 6-point scale, with the sixth level being one which many do not attain:

  1. Infantile obedience: ‘How can I avoid punishment?’
  2. Childish self-interest: ‘What’s in it for me?’
  3. Adolescent group conformity (norms)
  4. Adult conformity to law and order
  5. Social contract / human rights
  6. Universal ethical principles / conscience

I will say no more about this other than to point out some similarity between my ‘Iterated Knowings’ theory and Kohlberg’s: the former’s characteristics of rules, deception, reputation and roles map approximately onto Kohlberg’s first 4 levels.

Up Close and Personal

Returning to Joshua Greene’s ‘From neural ‘is’ to moral ‘ought’’ paper, a significant part is devoted to two scenarios considered by Peter Unger:

Firstly:

You receive a letter asking for a donation of $200 from an international aid charity in order to save a number of lives. Should you make this donation?

Joshua Greene: ‘From neural ‘is’ to moral ‘ought’: what are the moral implications of neuroscientific moral psychology?’ – Nature reviews Neuroscience 4(10) pp.846-9 (2003)

The aid agency letter

Secondly:

You are driving in your car when you see a hitchhiker by the roadside bleeding badly. Should you take him to hospital even though this means his blood will ruin the leather upholstery of your car which will cost $200 to repair?

Joshua Greene: ‘From neural ‘is’ to moral ‘ought’: what are the moral implications of neuroscientific moral psychology?’ – Nature reviews Neuroscience 4(10) pp.846-9 (2003)

Should you take the injured hitchhiker to hospital?

The vast majority of us would not look badly upon anyone who did not donate the $200 but would consider the person who left the hitchhiker behind to die to be a moral monster.

But given $200 and a choice between the two scenarios, a Utilitarian should spend it on saving the distant lives via the donation rather than on helping the hitch-hiker.

Greene says that we think there is

 ‘some good reason’

why our moral intuitions favour action when the choice is

‘up close and personal’

rather than far removed. He points out that the moral philosopher Peter Singer  would maintain that there is simply no good reason why we should.

I have proposed social cognition and caring for others as some of the essential characteristics of morality. These suggest our preference for the ‘up close and personal’. We care because we see.

I speculate that our caring stems from our need to distinguish between what is ourselves and what is not. In the rubber hand illusion, our eyes deceive us into thinking a rubber hand is actually our hand; momentarily we feel pain when the hand is hit, before we work out that our sense of touch does not agree with our eyes. We unconsciously mimic others – when seeing someone with crossed arms, we may cross our own to reduce the discrepancy between our sense of proprioception and what we see. (This is a weak connection; yawn contagion is much stronger – we cannot help ourselves.) This creates a connection between seeing others in pain and having a deep sense of where it would hurt on ourselves. We wince at the sight of others being hurt, but this soon disappears as the recognition that ‘it is not me’ takes over. But at least there is this initial feeling of pain at the sight of others in pain – the origins of empathy. (Some people claim they literally feel the pain of others – that this sense does not quickly dissipate. This condition is called ‘mirror-touch synaesthesia’.)

Oxytocin and Vasopressin

http://www.nature.com/news/gene-switches-make-prairie-voles-fall-in-love-1.13112

Pair-bonded prairie voles

So I have provided a tentative psychological story of the origins of care. But what does neuroscience tell us about this? In her 2011 book ‘Braintrust’ (sub-titled ‘What neuroscience tells us about morality’), Patricia Smith Churchland highlights some research in behavioral neurobiology into the very different behaviour of two very similar creatures. Prairie voles pair-bond for life whereas Montane voles are solitary. (The most prominent researchers on this topic are Thomas Insel (1992-), Sue Carter (1993-), Zuoxin Wang (1996-) and Larry Young (1999-).)

One physical difference is in two closely-located parts of the brain, the ventral pallidum  and the nucleus accumbens.

Compared with montane voles, prairie voles have much higher densities of neuromodulator receptors for Oxytocin and Vasopressin in these areas.

Larry Young

The Prairie vole brain. NAcc: Nucleus Accumbens, VP: Ventral Pallidum, PFC: Pre-Frontal Cortex, OB: Olfactory Bulb

What does this ‘higher density of neuromodulator receptors’ mean? Well, neuromodulators are molecules that bind onto receptors on a neuron and modulate the firing of that neuron. A larger number of receptors for a particular neuromodulator will increase the chance of that neuron firing in the presence of that neuromodulator. A higher concentration of the neuromodulator itself will achieve the same result.

The most effective way of getting extra Oxytocin into the brain is via a nasal spray. Conversely, if an antagonistic drug is sprayed instead, these molecules will lock onto the receptors but they are the ‘wrong keys’ – they do not trigger the changes within the neuron that modulate its firing. This effectively reduces the number of working receptors. Put very simply, by increasing or decreasing the effects of these neuromodulators, researchers have found they can make Prairie voles behave more like Montane voles and vice versa.

This is an extremely simplistic view; the qualifying details do not matter here. The point is that we can experimentally control behaviour associated with these neurotransmitters – which is?…

Oxytocin and Vasopressin are primarily associated with reproduction in mammals including arousal, contractions and lactation. The ‘cousins’ of Oxytocin and Vasopressin have performed equivalent functions in other creatures for hundreds of millions of years.

From this reproductive starting point, these neurotransmitters have evolved to control maternal care for offspring, pair-bonding and allo-parenting. Allo-parenting is care for young by individuals other than their parents – typically the ‘aunties’ of orphans. There is no (magical) genetic mechanism for allo-parenting. It is just the result of seeing young close by that need care – of them being ‘up close and personal’.

And from human tests, it has been shown that they improve social cognition (at the expense of other learning) – the memory of faces, the recognition of fear and the establishment of empathy and trust.

This improved social cognition has led to interest from the autism community. Autism is sometimes thought of as lacking a ‘theory of mind’ but this is extreme. It is better characterized as having impaired social cognition. Tests with Oxytocin on autistic people show an improvement in eye gaze and the interpretation of emotions and a reduction in repetitive behaviour.

Oxytocin has also been connected with generosity. In the ‘Ultimatum Game’ psychological test, the subject of the experiment proposes how to split a sum of money with another person. The other person decides whether to accept the deal or to punish an unfair offer by rejecting it, so that neither party gets anything; deals generally get accepted where the subject offers more than 30% of the stake. Oxytocin nasal sprays increase the proportion offered.

This all sounds fantastic. We just need everyone to spray some Oxytocin up their nostrils every morning and we will all become more caring and considerate of others.

https://www.pinterest.com/pin/323062973241334479/

Oxytocin molecular structure

Paul Zak, an early researcher into the trust-related effects of Oxytocin, has zealously promoted the idea of the ‘Moral Molecule’ (as his book is called). But it has also been criticized as the ‘Hype Molecule’, particularly as more research was done which revealed some negative aspects of the neurotransmitter and its cousin.

Vasopressin has a conciliatory ‘tend-and-befriend’ effect on females but it will reduce ‘fight or flight’ anxiety in men and make them more aggressive in defence of the mate and of the young.

This may be the origin for behaviour that has been described as ethnocentric (even as ‘xenophobic’). For example, an early experiment based around Dutch, German and Muslim names found that German and Muslim names were less positively received when the Dutch subjects had been given Oxytocin.

Since we are considering morality as a balancing act, Oxytocin could be characterized as tilting the balance from ‘me’ more towards ‘you’ but also from ‘them’ towards ‘us’.

This, and many practical matters, means that we won’t be having our daily nasal sprays just yet.

Generosity

Piff et al: 'Higher social class predicts increased unethical behavior'

Another BMW driver fails to stop for a pedestrian.

So far, I have characterized morality as balancing the wants of oneself with those of others and looked at how Oxytocin tips the balance towards others and can increase generosity.

Paul Piff (Berkeley) has devised various experiments to judge the generosity of the affluent. One test considered car type as an indicator of wealth and monitored which cars stopped at pedestrian crossings. High status cars were less likely to stop than other makes.

Another indicator of generosity is charitable giving. Various studies show that the most generous regions of a country are not the most affluent. In the USA, Utah and the Bible Belt stand out for higher generosity. Research indicates that it is not religious beliefs that are important here but regular attendance at services. These services involve moral sermons, donations and meeting familiar people.

downtrend.com

Charitable giving in USA

Other factors that improve charitable giving include

  • being with a partner (‘pair-bonded’),
  • living in a rural community and
  • being less affluent (as suggested by Piff’s research).

There is a common theme here: being ‘up close and personal’ in meaningful relationships with others:

  • There is anonymity in an urban environment.
  • We are insulated from others in a car.

I have characterized morality as balancing the wants of oneself with those of others. Through psychology, we can understand why our preference for the ‘up close and personal’ has evolved. But this tells us nothing about how we should behave, and so far it has had little to do with neuroscience. The neuroscience of Oxytocin and Vasopressin, however, is one avenue towards a physical understanding of care – of how it constrains us, and of how we might be able to control it in the future.

Reason vs Emotional Intuition

So, we emotionally feel a preference for the ‘up close and personal’ but our rational inclination is that this should not be. Just as there is the balance between self and others, there is a balance between emotion and reason – the two halves of psychology’s ‘dual process theory’. As described by Daniel Kahneman  in ‘Thinking, fast and slow’, ‘System 1’ is the fast, unconscious, emotional lower level and ‘System 2’ is the slower, conscious, reasoning higher level.

This split between rational and emotional decision-making is corroborated by Joshua Greene’s experiments, in which his subjects answered trolleyology questions whilst in an fMRI scanner. Quick decisions were correlated with activity in the Amygdala and the Ventro-Medial Pre-Frontal Cortex (VM-PFC), whereas questions that caused longer deliberation were correlated with activity in the Dorso-Lateral Pre-Frontal Cortex (DL-PFC). Both the Amygdala and the VM-PFC are associated with social decision-making and the regulation of emotion. In contrast, the DL-PFC is associated with ‘executive functions’, planning and abstract reasoning. We can say that the former regions are associated with ‘now’ and the latter region is associated with ‘later’.

The classic (Benthamite) form of Utilitarianism is ‘Act Utilitarianism’ in which an individual is supposed to determine the act which leads to the ‘the greatest happiness of the greatest number’. Such a determination is of course impossible but even practical deliberation to produce a reasonably good guess can often be too slow.

This has led to the ‘Rule Utilitarian’ approach of ‘pre-calculating’ the best response to typical situations to form rules. Then it is just a case of selecting the most applicable rule in a moral situation and applying that rule. That allows quite fast responses but these are often poor responses in retrospect.

Now, R. M. Hare proposed a ‘Two-Level Utilitarianism’ which is a synthesis of both Act- and Rule- Utilitarianism: apply the ‘intuitive’ rules but in the infrequent cases when there is a reduced confidence in the appropriate rules (such as more than one rule seeming to apply and those rules are in conflict), move on to ‘critical’ deliberation of the best action.

This looks a lot like ‘dual process theory’!

The Predictive Mind

We have a reasonable understanding of what goes on in the brain at the very low level of neurons, and we know what it is like at a very high level in the brain because we experience it from the inside every single day. But how we get from the small scale to the large scale is a rather difficult proposition!

‘Dual process theory’ is a crude but useful model upon which we can build psychological explanations, but we now have a very promising theory of the brain that I have frequently mentioned elsewhere. Its most complete formulation is Karl Friston’s strangely-named ‘Variational Free Energy’ theory from as recently as 2005, but its pedigree can be traced back through Richard Gregory and William James to Hermann von Helmholtz in 1866, before the foundation of psychology as a discipline.

For the context here, I will not go over the details of this theory, but its most basic claim is that the brain is a ‘hierarchy of predictors’ – my preferred term for the theory that Jacob Hohwy calls ‘the Predictive Mind’, Andy Clark calls ‘predictive processing’ and yet others call ‘the Bayesian Brain’. All levels concurrently try to predict what is happening at the level below, passing prediction errors (weighted by the confidence in those predictions) upwards. We then view the brain as multi-level (more than 2), with lower levels dealing with the fast ‘small scale’, moving upwards to longer-term ‘larger scale’ levels. Psychology’s conceptual Dual Process theory becomes a subset of neuroscience’s physically-based Predictive Mind theory.

downtrend.com

Felleman and Van Essen’s famous ‘wiring diagram’, showing the hierarchical organization from low levels (bottom) up to high levels (top)

This can inspire us to imagine a ‘multi-level Utilitarian’ moral theory which is superior to Hare’s ‘2-level Utilitarianism’. Noting that the ‘hierarchy of predictors’ operates:

  • continuously,
  • concurrently, and
  • dynamically

…we can produce a better moral theory…

Moral theories generally consider how to make a single decision based upon a particular moral situation, without revisiting it later.

We deal with the easy moral issues quickly, going back to the more complex ones that require more deliberation. This fuller consideration (prediction) of the consequences of possible actions may also be influenced by a change in circumstances since the issue was last considered. And that change may itself be a result of the (lower-level) actions we have already taken.

Eventually, the window of possible action upon a moral problem will pass and we can return to the ‘larger-scale’ problems which still linger. (When we have solved the injustices of inequality, poverty and violence in the Middle East, and have no more immediate problems to deliberate over, we can take a holiday.)

Such a theory automatically and dynamically determines the appropriate level of consideration for every problem we encounter.

I think this is a sensible moral theory. It is an intelligent theory. This is true almost by definition, because this Predictive Mind mechanism is how evolution has produced intelligence – an embodied general intelligence acting in a changing environment.

Georgia State University

Neuro-ethics

I somewhat provocatively point out an irony that:

  • A moral philosopher sits in his armchair, proudly proposing a moral theory that is detached from the world of ‘is’.
  • Inside his head is a bunch of neurons wired together in a particular way to produce a particular way of thinking.
  • But his moral theory is an inferior description of the way his brain thinks!

So we end up with a cognitive theory in which moral problem solving isn’t really any different from any other type of problem solving! This is an Ethical Naturalist point of view.

From Dualism to Physicalism

For ordinary people of our grandparents’ generation, the dominant philosophical belief was of the separation of mind and matter. We had free will – the mind was free to make choices, unconstrained by the physical world.

In contrast, our grandchildren’s generation will have grown up in an environment where the idea of the brain defining behaviour within what is essentially a deterministic world is commonplace. The concept of ‘free will’ is unlikely to survive this transition of worldviews intact and unmodified.

Now, there is no single fact of neuroscience that makes any Dualist suddenly switch over to being a Physicalist. People don’t change worldviews just like that. But the accumulation of coherent neuroscientific information over many years does cause a shift. As Greene says

“Neuroscience makes it even harder to be a dualist”

So, though we can always invoke the is/ought distinction to ensure that neuroscience and morality are disconnected, its influence on our metaphysics indirectly affects our concepts of morality.

With a Dualist worldview, we can say that if it is wrong for person A to do something in some precise situation, then it is also wrong for person B to do that in that same precise situation. A and B can be substituted. It is the act that is moral.

However, with a Physicalist worldview, we have to accept that the physical state of an agent’s brain plays a part.

Psychology Fun!

Trajectory of the tamping iron through Phineas Gage’s head

Consider the two classic case studies of Phineas Gage and Charles Whitman:

  • Whilst working on the railroads in 1848, an explosion blew an iron rod straight through Phineas Gage’s head, entering under a cheekbone and exiting through the top of his skull, leaving a gaping hole in his brain. He miraculously survived, but his personality changed from that of a responsible foreman beforehand to an irreverent, drunken brawler.
  • Charles Whitman personally fought his “unusual and irrational thoughts” and had sought help from doctors to no avail. Eventually he could hold them back no more, whereupon he went on a killing spree in which he killed 16 people. Beforehand, he had written “After my death I wish that an autopsy would be performed on me to see if there is any physical disorder.” The autopsy revealed a brain tumour.

It is not surprising to us that substantial changes to the physical brain cause it to behave substantially differently.

We can no longer say that it is equally blameworthy for persons A and B to do something in exactly the same situation because their brains are different.

Were I to find myself standing on the observation deck of the University of Texas tower with a rifle in my hand, I would not start shooting people at random as Whitman did. A major reason for this is that I don’t have the brain tumour he had. But if I were to have a brain like Whitman’s, then I would behave as he did! In shifting towards a physicalist position, we must move from thinking of acts being good or bad towards thinking of actors (the brains thereof) being good or bad. We move from Deontology or Consequentialism towards Virtue Ethics.

There is the concept of ‘flourishing’ within Virtue Ethics. We try to ‘grow’ people so that they are habitually good and fulfil their potential. To do this, we must design our environment so that they ‘grow’ well.

And when we talk of ‘bad brains’, we don’t blame Whitman for his behaviour. In fact, we feel sorry for him. We might actively strive to avoid such brains (by providing an environment in which doctors take notice, or take brain scans, when people complain to them about uncontrollable urges, for example). ‘Blame’ and ‘retribution’ no longer make sense. As others have said:

  • ‘with determinism there is not blame, and, with not blame, there should be no retribution and punishment’ (Mike Gazzaniga)
  • ‘Blameworthiness should be removed from the legal argot’  (David Eagleman)
  • `We foresee, and recommend, a shift away from punishment aimed at retribution in favour of a more progressive, consequentialist approach to the criminal law’ (Joshua Greene and Jonathan Cohen)

Summary

I have defined the essence of morality as the balancing of the wants of oneself with those of others:

  • As well as involving reason, this means getting into someone else’s mind (rather than just getting into their shoes). On the scale of ‘iterated knowings’, we need at least a ‘theory of mind’. I have set out a theory of moral development in which a person progresses up the scale of iterated knowings, ultimately to having the desire and ability to understand another’s entire epistemological framework – something relatively few people reach.
  • Whilst we can act morally based on the selfish maintenance of reputation and a rather mechanical ability to communicate, it is better if we also have ‘social cognition’ (an ability to see how another feels and read what they want, more directly than verbal communication) and actually care about the other.
  • The origins of both social cognition and care lie in our basic cognitive need to be able to distinguish between self and non-self. In doing this, we can unconsciously relate the feelings of others back onto ourselves when we see them, allowing us to empathize with them.

We can make a link from the actions of the neurotransmitters Oxytocin & Vasopressin up through social cognition and empathy to the shifting of the balance towards others in being more considerate and generous to others. A common factor in this behaviour is proximity – an unconscious emotional preference for those we know and see around us. This provides us with ‘some good reason’ why biasing towards the ‘up close and personal’ feels intuitively right even though we logically think there should be no bias.

The moral philosopher R. M. Hare proposes a sensible balancing of intuition and logic. But this ‘dual process’ psychology type of moral theory is just an inferior form of the more general neuroscientific theory of the ‘predictive mind’, advocated by Karl Friston, Jacob Hohwy, Andy Clark and others. The latter inspires an improved moral theory that:

  • Generalizes to advocating more detailed slower deliberation for more complex moral dilemmas, rather than just offering a two-stop shop.
  • Relates moral thinking to the general intelligent thinking of an agent embodied within an environment. This is an ethical naturalist position: moral problem solving is not distinct from other types of problem solving.
  • Improves the theory in being dynamic. Moral decisions are not ‘fire and forget’. We should continue to deliberate on our more complex moral problems after we have made a decision and moved on to subsequent moral situations, particularly as circumstances change or we see the results of our actions.

So ‘is’ might inspire ‘ought’ but it still does not imply it. Not directly, anyway.

Neuroscientific knowledge pushes society further away from dualism, towards physicalism in which the moral actor is embedded within its own environment and hence physically determined in the same way. Our moral framework must then shift towards a Virtue Ethics position of trying to cultivate better moral actors rather than the Deontological or Consequentialist focus on correct moral acts.

This forces us to re-evaluate blame and praise, shifting us away from retribution. We must actively cultivate a society in which people can morally ‘flourish’.

Our new-found knowledge of neuroscience forces us to recognize that our neural construction constrains us, but it also increasingly allows us to overcome those constraints – though at our peril.


Backpropagation

 

The Fall of Artificial Neural Networks: XOR gates

In the 1969 book ‘Perceptrons: an introduction to computational geometry’, Marvin Minsky and Seymour Papert demonstrated that single-layer Artificial Neural Networks could not even implement an XOR (‘exclusive or’) logical function. This was a big disappointment. In the history of Artificial Neural Networks, this is seen as a significant contributor to the ‘AI winter’ of reduced interest (and hence also of reduced funding) in them.

The Rise of Artificial Neural Networks: Back-Propagation

The backpropagation algorithm effectively solved the exclusive-or problem in that:

  • To implement XORs required one or more hidden layers in the network (between the inputs and the output layer).
  • The backpropagation algorithm enabled multi-layer networks to be trained.

This contributed to a resurgence of interest in Artificial Neural Networks. Backpropagation was invented independently a number of times, most notably by Paul Werbos (1974), Rumelhart, Hinton and Williams (1986) and Yann LeCun (1987).

Watch Victor Lavrenko’s YouTube video for more technical details on the XOR problem…

The Backpropagation Code

The purpose of this post is to provide example code for the backpropagation algorithm and demonstrate that it can solve the XOR problem.

As noted elsewhere:

  • The code here is unapologetically ‘unpythonic’.
  • If you do not have Python installed, you can open the online https://repl.it/languages/python3 interpreter in a new window and use that. All code fragments are combined at the end of this piece into a single listing that can be copy-pasted into the interpreter.

As well as being unpythonic, the code here differs from typical implementations in that it can handle more than 2 layers. The code can be configured for any fully-connected feed-forward network with any number of layers greater than 1 and any number of neurons in each layer.

Some Housekeeping

Firstly, let’s sort out some housekeeping. Here are 2 functions so that:

  • We can pause the run to see things before they disappear off the top of the screen. (We can stop if we type ‘n’)
  • We can control how much information gets printed out by varying a ‘verbosity’ variable value.
def prompted_pause(s):
    import sys
    ok = input(s)
    if ok=="n" or ok=="N":
        print("Stopping here")
        sys.exit()

verbosity = 1

def print_info(v,s, end="DEFAULT"):
    if verbosity >= v:
        if end == "DEFAULT":
            print(s) # With newline
        elif end == "":
            print(s, end="") # Without newline
        else:
            print(s, end=end) # With the caller's line ending

Where there is a call to print_info(3, “Blah”), the 3 means that the message “Blah” will only get printed out if the verbosity level is 3 or more. Across the whole program below, verbosity levels are such that:

  • If verbosity is set to 1, only the most important messages get printed.
  • If verbosity is set to 2, more detail gets printed.
  • If verbosity is set to 3, everything gets printed.
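For example, with verbosity left at its default of 1 (these particular calls are purely illustrative, not part of the program below):

print_info(1, "Create network")      # printed, because verbosity (1) >= 1
print_info(2, "  Neuron details")    # suppressed, because verbosity (1) < 2
print_info(1, "Training ", end="")   # printed without a trailing newline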

The Application

The neural network will be trained to behave like a ‘full adder’ circuit. This is a common building block in digital electronic circuits. It adds up three 1-bit numbers to produce a 2-bit output number (range 0…3). The ‘CI’ and ‘CO’ signals are the carry-in and carry-out respectively. As an example application, by chaining 32 of these circuits together (connecting the CO output of one full adder to the CI input of another) we get a circuit that adds two 32-bit numbers together.

http://cs.smith.edu/dftwiki/index.php/Xilinx_ISE_Four-Bit_Adder_in_Verilog

Full Adder circuit

The full adder has been chosen because:

  • It contains at least one XOR gate (it has 2), to demonstrate that a multilayer network can learn this non-linearly-separable behaviour, and
  • It has more than one output (it has 2), to provide Python code that is more generalised.

This is not a good example of what a neural network could be used for. Here, there are only 8 possible combinations of inputs. Any 3-input, 2-output (combinatorial) function can be defined with just 16 bits of information (8 input combinations × 2 output bits).

The training set is the same as the test set and just defines the truth table of a full adder. After training, the network will be tested against all 8 possible input combinations. A more appropriate application would be one where the number of possible input combinations is much greater than the number of vectors the network can be trained and tested against.

# Full Adder example:
Training_Set = [
    # A  B  CI  CO  S
    [[0, 0, 0], [0, 0]],
    [[0, 0, 1], [0, 1]],
    [[0, 1, 0], [0, 1]],
    [[0, 1, 1], [1, 0]],
    [[1, 0, 0], [0, 1]],
    [[1, 0, 1], [1, 0]],
    [[1, 1, 0], [1, 0]],
    [[1, 1, 1], [1, 1]]
]
# Bit assignments...
SUM   = 1
CARRY = 0

For example, there are 2 bits set to 1 in the input [0, 1, 1] so the sum is 2 which is binary ‘10’, so the output vector is [1, 0].
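As a quick sanity check (the reference_full_adder helper below is my own illustration, not part of the network code), the training set can be compared against a plain-Python full adder:

""" Reference full adder: returns [CO, S] for the 1-bit sum a + b + ci """
def reference_full_adder(a, b, ci):
    total = a + b + ci               # 0..3
    return [total // 2, total % 2]   # [carry-out, sum], i.e. [CARRY, SUM] bit order

for inputs, outputs in Training_Set:
    assert reference_full_adder(*inputs) == outputs # every row of the truth table matches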

A Neuron

We now define the code for a single neuron. For each neuron, we need to have:

  • A list of weights, one for each neuron input (from the layer below).
  • A bias – this behaves in the same way as a weight except the input is a constant ‘1’.
  • A gradient ∂E/∂z. This is used for training the network.

The FeedForward_Neuron function calculates a neuron’s output y, based on its inputs x, its weights w and its bias b. A sum-of-products is formed:

z = Σ wi·xi + b

and the output y is derived from that using the Sigmoid function:

y = σ(z)

The Sigmoid function provides the non-linearity that allows the network to learn non-linear relationships such as the ‘XOR’ function (compare this with the simple, linear network in ‘Fish, Chips, Ketchup’).

import math

class Neuron:
    def __init__(self, bias):
        self.B = bias
        self.W = []
        self.dEdz = 0.0

""" The logistic function """
def Sigmoid(z):
    return 1 / (1 + math.exp(-z))

""" Generate neuron output from inputs"""
def FeedForward_Neuron(inputs, bias, weights):
    z = bias
    for i in range(len(inputs)):
        z += inputs[i] * weights[i]
    return Sigmoid(z)

We start with the list of weights being empty; we will fill these in as we build up the network of neurons.

As in previous posts, the code is not Pythonic. Here, there are no ‘def’ functions (‘methods’) defined within any class. All functions sit outside the classes, which means all the information they use has to be passed in as parameters. This is to make it clear what the dependencies are. Python examples of back-propagation available elsewhere on the interweb will use classes properly and use vector operations where I have used for loops.
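As a quick illustration of a single neuron in isolation (the bias and weights here are made up by hand, purely for demonstration):

n = Neuron(bias = -1.0)
n.W = [0.5, 0.5, 0.5]
print(FeedForward_Neuron([1, 0, 1], n.B, n.W)) # z = -1.0 + 0.5 + 0.0 + 0.5 = 0.0, so Sigmoid(0.0) = 0.5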

A Layer of Neurons

A neuronal layer is then just an array of neurons. The biases and weights of the neurons all get initialized to random values before any training is done.

Updating a neuronal layer is just updating each neuron in turn.

import random

class NeuronLayer:
    def __init__(self, num_neurons, num_inputs):
        self.Neuron = [] # Build up a list of neurons
        for n in range(0, num_neurons):
            print_info(3,"  Neuron[%d]" % (n))
            # Add a neuron to the layer, with a random bias
            self.Neuron.append(Neuron(random.random()))
            print_info(3,"    Bias = %.3f" % self.Neuron[n].B)
            # Give it random weights
            for i in range(0, num_inputs):
                self.Neuron[n].W.append(random.random()) # Initialized randomly
                print_info(3,"    Weight[%d] = %.3f" % (i, self.Neuron[n].W[i]))

def FeedForward_Layer(inputs, layer):
    outputs = []
    for neuron in layer.Neuron:
        neuron.X = inputs
        y = FeedForward_Neuron(neuron.X, neuron.B, neuron.W)
        neuron.Y = y
        outputs.append(y)
    return outputs

The Neural Network

A complete multilayer network can then be created, with:

  • a particular number of inputs and outputs,
  • a particular number of layers
  • a particular number of neurons in each layer.

And we can use this network by feeding it input signals which propagate up through the layers to then return the outputs.

The application calls for 3 inputs and 2 outputs. Typically, the number of layers is 2 but you can configure for more than this (for so-called ‘deep’ networks). Here, we configure the network as follows:

num_inputs   = 3
num_outputs  = 2
num_neurons_in_layer = [4, num_outputs] # num. neurons in each layer from inputs up to output
# The num. neurons in the top (output) layer is the same as the num. output ports
output_layer = len(num_neurons_in_layer)-1 # Layer number

The num_neurons_in_layer variable defines the number of layers as well as the number of neurons in each layer. You can experiment with the number of neurons.
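For instance (a variation of mine, not the configuration used in the example run below), a ‘deeper’ three-layer version could be configured with:

num_neurons_in_layer = [6, 4, num_outputs] # two hidden layers of 6 and 4 neurons, then the output layer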

To actually create the network, we use:

Net = []
for L in range(len(num_neurons_in_layer)):
    if L==0: # Input layer
        i = num_inputs
    else:
        i = num_neurons_in_layer[L-1] # (Fully connected to lower layer)
    print_info(1, "Create layer %d with %d neurons and %d inputs" % (L, num_neurons_in_layer[L], i))
    Net.append(NeuronLayer(num_neurons = num_neurons_in_layer[L], num_inputs = i))

Feed-Forward

For actual usage, we just apply the inputs then update each layer in turn from the input layer forward to the output layer.

def FeedForward_Net(inputs, Net):
    for L in range(len(Net)): # Up through all layers
        print_info(3, "  Feed-Forward layer Net[%d]" % L)
        if L==0:
            y = FeedForward_Layer(inputs, Net[L])
        else:
            y = FeedForward_Layer(y, Net[L])
    return y

Testing

In the trivial example here, we test the network by applying all input combinations, as defined in the truth table training set.

def Test_Network(Net, Training_Set):
    print("Test Network:")
    for i in range(8):
        Training_Input, Training_Output = Training_Set[i]
        print("  %d+%d+%d" % (Training_Input[0], Training_Input[1], Training_Input[2]), end="")
        result = FeedForward_Net(Training_Input, Net)
        rounded_result = [round(result[0]), round(result[1])]
        print(" = %d%d"       % (rounded_result[CARRY], rounded_result[SUM]), end="")
        print(" (%.3f, %.3f)" % (        result[CARRY],         result[SUM]), end="")
        if rounded_result == Training_Output:
            print(" correct")
        else:
            print(" bad")

Not surprisingly, the network does not behave as desired before it is trained: each output bit will be correct only by chance.

Test_Network(Net, Training_Set)

Training with Back-Propagation

Now we want to train the network to behave according to the application (in this case, to behave like a full adder circuit). We train using the ‘back-propagation’ algorithm. This involves:

  1. Applying the inputs for a particular training case and propagating these forward to produce the outputs.
  2. Seeing how the outputs differ from what you want them to be (what the training set outputs say they should be). The mismatch is called the ‘error’ E.
  3. For each neuron in the output layer, working out how the error changes with a small change in the signal z (remember: z = Σwi.xi + b and y = σ(z)). This is ∂E/∂z, the ‘partial derivative’ of the error with respect to z.
  4. For each layer, working from that output layer back towards the input layer, repeating the above operation for each neuron. Setting ∂E/∂z for a neuron requires the weights and ∂E/∂z values of the neurons in the layer above. (We are propagating the error derivatives back through the layers.)
  5. We update the weights of each neuron by deriving a partial derivative of the error with respect to the weight, ∂E/∂w (derived from the ∂E/∂z values already calculated). We adjust each weight by a small fraction of this, determined by the ‘learning rate’ ε, so that a weight w becomes w − ε.∂E/∂w. We do the same with the biases, using ∂E/∂b.

We perform the above operations for each item in the training set in turn. Over many iterations, the weights converge on values that produce the desired behaviour (hopefully); this is called ‘gradient descent’.

As an example, consider how to modify the weight w12 that connects neuron n1 in layer 1 to a neuron n2 in layer 2 where this is in a 3-layer network. The error derivatives ∂E/∂z3 in the higher layer, 3, have already been calculated.

The error derivative for n2 is calculated using the ‘chain rule’, multiplying the derivatives of everything along the signal path:

∂E/∂z2 = ∂E/∂z3 . ∂z3/∂y2 . ∂y2/∂z2 (if n2 feeds more than one layer-3 neuron, the product of the first two factors is summed over those neurons, as the code below does).

This result is used for both:

  1. Continuing to propagate back to lower layers (in this case, just layer 1), and
  2. Calculating the derivative for the weight adjustment:

∂E/∂w12 = ∂E/∂z2 . ∂z2/∂w12.

For more details, I recommend Matt Mazur’s excellent worked-out example and Pythonic Python code.
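To make the chain rule concrete, here is a tiny numeric sketch of my own (the values of w23, y2, x1 and ∂E/∂z3 are made up purely for illustration):

# Illustrative numbers only
dEdz3 = 0.1               # error derivative already calculated for the layer-3 neuron
w23   = 0.4               # weight from n2 up to that layer-3 neuron
y2    = 0.7               # n2's output
x1    = 0.9               # n1's output, i.e. n2's input along weight w12

dz3dy2 = w23              # z3 contains the term w23*y2, so dz3/dy2 = w23
dy2dz2 = y2 * (1 - y2)    # derivative of the sigmoid, written in terms of its output
dEdz2  = dEdz3 * dz3dy2 * dy2dz2   # = 0.0084
dEdw12 = dEdz2 * x1                # dz2/dw12 = x1, so dE/dw12 = 0.00756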

def calc_dEdy(target, output):
    return -(target - output)

# Derivative of the sigmoid function:
def dydz(y):
    return y * (1 - y)

def calc_dEdz(target, output):
    # target and output are scalars here (one output neuron at a time)
    return calc_dEdy(target, output) * dydz(output)

def dzdw(x):
    return x # z=sum(w[i].x[i]) therefore dz/dw[i]=x[i]

LEARNING_RATE = 0.5 # Often denoted by some Greek letter - often epsilon

# Uses 'online' learning, ie updating the weights after each training case
def Train_Net(Net, training_inputs, training_outputs):
    # 0. Feed-forward to generate outputs
    print_info(2,"  Feed-forward")
    FeedForward_Net(training_inputs, Net)

    for L in reversed(range(len(Net))): # Back through all layers
        print_info(2,"  Back-prop layer Net[%d]" % (L))
        if L == output_layer: # Output layer
            # 1. Back-propagation: Set Output layer neuron dEdz
            for o in range(len(Net[L].Neuron)): # For each output layer neuron
                print_info(3,"    Back-prop Net[%d].Neuron[%d]" % (L, o))
                print_info(3,"    %d" % (training_outputs[o]))
                print_info(3,"    calc_dEdz(%.3f, %.3f)" % (training_outputs[o], Net[L].Neuron[o].Y))
                Net[L].Neuron[o].dEdz = calc_dEdz(training_outputs[o], Net[L].Neuron[o].Y)
        else:
            # 2. Back-propagation: Set Hidden layer neuron dE/dz = Sum dE/dz * dz/dy = Sum dE/dz * wih
            for h in range(len(Net[L].Neuron)):
                print_info(3,"    Back-prop Net[%d].Neuron[%d]" % (L, h))
                dEdy = 0
                for output_neuron in range(len(Net[L+1].Neuron)):
                    dEdy += Net[L+1].Neuron[output_neuron].dEdz * Net[L+1].Neuron[output_neuron].W[h]
                Net[L].Neuron[h].dEdz = dEdy * dydz(Net[L].Neuron[h].Y)
    # 3. Update the biases and weights of all neurons (in every layer): dE/dw = dE/dz * dz/dw
    for L in range(len(Net)): # Up through all layers
        print_info(2,"  Update weights in layer Net[%d]" % (L))
        for n in range(len(Net[L].Neuron)):
            dEdb = Net[L].Neuron[n].dEdz * 1.0
            # dE/db = dE/dz * dz/db; dz/db=1 (the bias is like a weight with a constant input of 1)
            Net[L].Neuron[n].B -= LEARNING_RATE * dEdb # db = epsilon * dE/db
            for w in range(len(Net[L].Neuron[n].W)):
                dEdw = Net[L].Neuron[n].dEdz * dzdw(Net[L].Neuron[n].X[w])
                Net[L].Neuron[n].W[w] -= LEARNING_RATE * dEdw # dw = epsilon * dE/dw

We train the network until it is ‘good enough’. For that, we need a measure of how well (or how badly) the network is performing whilst we are training. That measure is computed by the Total_Error function. In this simple example, there are only 8 possible combinations of inputs so, at each training round, a training vector is randomly selected from those 8.

""" For reporting progress (to see if it working, or how well it is learning)"""
def Total_Error(Net, training_sets):
    Etotal = 0
    """ Use the first 8 training vectors as the validation set """
    num_validation_vectors = 8 # There are only 8 vectors in the Full-Adder example
    for t in range(num_validation_vectors):
        training_inputs, training_outputs = training_sets[t]
        FeedForward_Net(training_inputs, Net)
        Etotal += 0.5*(training_outputs[0] - Net[output_layer].Neuron[0].Y)**2
        Etotal += 0.5*(training_outputs[1] - Net[output_layer].Neuron[1].Y)**2
    return Etotal

Etotal = 0.0
for i in range(0, 100000):
    Training_Input, Training_Output = random.choice(Training_Set)
    if i%100==99:
        print_info(1,"Training iteration %d" % i, end="")
        print_info(3,"  %d+%d+%d" % (Training_Input[0], Training_Input[1], Training_Input[2]), end="")
        print_info(3,"  =  %d%d" % (Training_Output[CARRY], Training_Output[SUM]), end="")
        print_info(1,"")
    Train_Net(Net, Training_Input, Training_Output)
    if i%100==99:
        Etotal = Total_Error(Net, Training_Set)
        print_info(1,"  Validation E = %.3f" % Etotal)
        if Etotal < 0.02:
            break

Testing the Trained Network

Then we test the network again to see how well it has been trained.

Test_Network(Net, Training_Set)

With the error threshold to stop training fixed at 0.02, you can experiment with changing the size and depth of the network and seeing how many training iterations it takes to get to that error threshold.

An example output is given below – the beginning and end at least…

#########################################
Create Neural Net
#########################################
Create layer 0 with 4 neurons and 3 inputs
Create layer 1 with 2 neurons and 4 inputs
#########################################
Testing
#########################################
Continue?y
Test Network:
  0+0+0 = 11 (0.834, 0.829) bad
  0+0+1 = 11 (0.864, 0.851) bad
  0+1+0 = 11 (0.877, 0.874) bad
  0+1+1 = 11 (0.893, 0.886) bad
  1+0+0 = 11 (0.857, 0.847) bad
  1+0+1 = 11 (0.880, 0.864) bad
  1+1+0 = 11 (0.890, 0.884) bad
  1+1+1 = 11 (0.901, 0.893) correct
#########################################
Training
#########################################
Continue?y
Training iteration 99
  Validation E = 1.972
Training iteration 199
  Validation E = 1.881
Training iteration 299
  Validation E = 2.123
…
Training iteration 10099
  Validation E = 0.020
Training iteration 10199
  Validation E = 0.020
Test Network:
  0+0+0 = 00 (0.014, 0.080) correct
  0+0+1 = 01 (0.029, 0.942) correct
  0+1+0 = 01 (0.028, 0.946) correct
  0+1+1 = 10 (0.972, 0.065) correct
  1+0+0 = 01 (0.029, 0.936) correct
  1+0+1 = 10 (0.980, 0.062) correct
  1+1+0 = 10 (0.976, 0.061) correct
  1+1+1 = 11 (0.997, 0.922) correct

All together

Piecing all this code together so we have a single file to run…

print("#########################################")
print("Reporting/control")
print("#########################################")

def prompted_pause(s):
    import sys
    ok = input(s)
    if ok=="n" or ok=="N":
        print("Stopping here")
        sys.exit()

verbosity = 1

def print_info(v,s, end="DEFAULT"):
    if verbosity >= v:
        if end == "DEFAULT":
            print(s) # With newline
        elif end == "":
            print(s, end="") # Without newline
        else:
            print(s, end=end) # With whatever line ending was asked for

"""
   #########################################
   Application: Full adder
   #########################################
"""

# Full Adder example:
Training_Set = [
    # A  B  CI  CO   S
    [[0, 0, 0], [0, 0]],
    [[0, 0, 1], [0, 1]],
    [[0, 1, 0], [0, 1]],
    [[0, 1, 1], [1, 0]],
    [[1, 0, 0], [0, 1]],
    [[1, 0, 1], [1, 0]],
    [[1, 1, 0], [1, 0]],
    [[1, 1, 1], [1, 1]]
]
# Bit assignments...
SUM   = 1
CARRY = 0

print("#########################################")
print("Create Neural Net")
print("#########################################")

import math

class Neuron:
    def __init__(self, bias):
        self.B = bias
        self.W = []
        self.dEdz = 0.0

""" The logistic function """
def Sigmoid(z):
    return 1 / (1 + math.exp(-z))

""" Generate neuron output from inputs"""
def FeedForward_Neuron(inputs, bias, weights):
    z = bias
    for i in range(len(inputs)):
        z += inputs[i] * weights[i]
    return Sigmoid(z)

import random

class NeuronLayer:
    def __init__(self, num_neurons, num_inputs):
        self.Neuron = [] # Build up a list of neurons
        for n in range(0, num_neurons):
            print_info(3,"  Neuron[%d]" % (n))
            # Add a neuron to the layer, with a random bias
            self.Neuron.append(Neuron(random.random()))
            print_info(3,"    Bias = %.3f" % self.Neuron[n].B)
            # Give it random weights
            for i in range(0, num_inputs):
                self.Neuron[n].W.append(random.random()) # Initialized randomly
                print_info(3,"    Weight[%d] = %.3f" % (i, self.Neuron[n].W[i]))

def FeedForward_Layer(inputs, layer):
    outputs = []
    for neuron in layer.Neuron:
        neuron.X = inputs
        y = FeedForward_Neuron(neuron.X, neuron.B, neuron.W)
        neuron.Y = y
        outputs.append(y)
    return outputs

"""
A complete multilayer network can then be created,
"""

# Configuration...
num_inputs   = 3
num_outputs  = 2
num_neurons_in_layer = [4, num_outputs] # num. neurons in each layer from inputs up to output
# The num. neurons in the top (output) layer is the same as the num. output ports
output_layer = len(num_neurons_in_layer)-1 # Layer number

Net = []
for L in range(len(num_neurons_in_layer)):
    if L==0: # Input layer
        i = num_inputs
    else:
        i = num_neurons_in_layer[L-1] # (Fully connected to lower layer)
    print_info(1, "Create layer %d with %d neurons and %d inputs" % (L, num_neurons_in_layer[L], i))
    Net.append(NeuronLayer(num_neurons = num_neurons_in_layer[L], num_inputs = i))

def FeedForward_Net(inputs, Net):
    for L in range(len(Net)): # Up through all layers
        print_info(3, "  Feed-Forward layer Net[%d]" % L)
        if L==0:
            y = FeedForward_Layer(inputs, Net[L])
        else:
            y = FeedForward_Layer(y, Net[L])
    return y

print("#########################################")
print("Testing")
print("#########################################")

prompted_pause("Continue?")
def Test_Network(Net, Training_Set):
    print("Test Network:")
    for i in range(8):
        Training_Input, Training_Output = Training_Set[i]
        print("  %d+%d+%d" % (Training_Input[0], Training_Input[1], Training_Input[2]), end="")
        result = FeedForward_Net(Training_Input, Net)
        rounded_result = [round(result[0]), round(result[1])]
        print(" = %d%d"       % (rounded_result[CARRY], rounded_result[SUM]), end="")
        print(" (%.3f, %.3f)" % (        result[CARRY],         result[SUM]), end="")
        if rounded_result == Training_Output:
            print(" correct")
        else:
            print(" bad")

Test_Network(Net, Training_Set)

print("#########################################")
print("Training")
print("#########################################")

prompted_pause("Continue?")

def calc_dEdy(target, output):
    return -(target - output)

# Derivative of the sigmoid function:
def dydz(y):
    return y * (1 - y)

def calc_dEdz(target, output):
    # target and output are scalars here (one output neuron at a time)
    return calc_dEdy(target, output) * dydz(output)

def dzdw(x):
    return x # z=sum(w[i].x[i]) therefore dz/dw[i]=x[i]

LEARNING_RATE = 0.5 # Often denoted by some Greek letter - often epsilon

# Uses 'online' learning, ie updating the weights after each training case
def Train_Net(Net, training_inputs, training_outputs):
    # 0. Feed-forward to generate outputs
    print_info(2,"  Feed-forward")
    FeedForward_Net(training_inputs, Net)

    for L in reversed(range(len(Net))): # Back through all layers
        print_info(2,"  Back-prop layer Net[%d]" % (L))
        if L == output_layer: # Output layer
            # 1. Back-propagation: Set Output layer neuron dEdz
            for o in range(len(Net[L].Neuron)): # For each output layer neuron
                print_info(3,"    Back-prop Net[%d].Neuron[%d]" % (L, o))
                print_info(3,"    %d" % (training_outputs[o]))
                print_info(3,"    calc_dEdz(%.3f, %.3f)" % (training_outputs[o], Net[L].Neuron[o].Y))
                Net[L].Neuron[o].dEdz = calc_dEdz(training_outputs[o], Net[L].Neuron[o].Y)
        else:
            # 2. Back-propagation: Set Hidden layer neuron dE/dz = Sum dE/dz * dz/dy = Sum dE/dz * wih
            for h in range(len(Net[L].Neuron)):
                print_info(3,"    Back-prop Net[%d].Neuron[%d]" % (L, h))
                dEdy = 0
                for output_neuron in range(len(Net[L+1].Neuron)):
                    dEdy += Net[L+1].Neuron[output_neuron].dEdz * Net[L+1].Neuron[output_neuron].W[h]
                Net[L].Neuron[h].dEdz = dEdy * dydz(Net[L].Neuron[h].Y)
    # 3. Update the biases and weights of all neurons (in every layer): dE/dw = dE/dz * dz/dw
    for L in range(len(Net)): # Up through all layers
        print_info(2,"  Update weights in layer Net[%d]" % (L))
        for n in range(len(Net[L].Neuron)):
            dEdb = Net[L].Neuron[n].dEdz * 1.0
            # dE/db = dE/dz * dz/db; dz/db=1 (the bias is like a weight with a constant input of 1)
            Net[L].Neuron[n].B -= LEARNING_RATE * dEdb # db = epsilon * dE/db
            for w in range(len(Net[L].Neuron[n].W)):
                dEdw = Net[L].Neuron[n].dEdz * dzdw(Net[L].Neuron[n].X[w])
                Net[L].Neuron[n].W[w] -= LEARNING_RATE * dEdw # dw = epsilon * dE/dw

""" For reporting progress (to see if it working, or how well it is learning)"""
def Total_Error(Net, training_sets):
    Etotal = 0
    """ Use the first 8 training vectors as the validation set """
    num_validation_vectors = 8 # There are only 8 vectors in the Full-Adder example
    for t in range(num_validation_vectors):
        training_inputs, training_outputs = training_sets[t]
        FeedForward_Net(training_inputs, Net)
        Etotal += 0.5*(training_outputs[0] - Net[output_layer].Neuron[0].Y)**2
        Etotal += 0.5*(training_outputs[1] - Net[output_layer].Neuron[1].Y)**2
    return Etotal

Etotal = 0.0
for i in range(0, 100000):
    Training_Input, Training_Output = random.choice(Training_Set)
    if i%100==99:
        print_info(1,"Training iteration %d" % i, end="")
        print_info(3,"  %d+%d+%d" % (Training_Input[0], Training_Input[1], Training_Input[2]), end="")
        print_info(3,"  =  %d%d" % (Training_Output[CARRY], Training_Output[SUM]), end="")
        print_info(1,"")
    Train_Net(Net, Training_Input, Training_Output)
    if i%100==99:
        Etotal = Total_Error(Net, Training_Set)
        print_info(1,"  Validation E = %.3f" % Etotal)
        if Etotal < 0.02:
            break

"""
See how it behaves now, after training.
"""
Test_Network(Net, Training_Set)

Firing and Wiring

Brains are essentially ‘just a bunch of neurons’ connected to one another by synapses. A neuron will ‘fire’ when there is enough activity (firing) on its synapses. The network learns by modifying the strengths of those synapses. When both sides of a synapse are active around the same time, the synapse will be strengthened. When they are out of sync, the synapse will weaken.

This is summarized by Donald Hebb’s  famous slogan:

‘neurons that fire together, wire together’

often continued as

‘and out of sync, fail to link.’

Artificial Neural Nets are inspired by the real Neural Nets that are our brains. Hopfield Networks were an early form of artificial neural network – one in which

‘neurons that fire together, wire together’

is the central concept.

Here I provide some Python code to demonstrate Hopfield Networks.

Unapologetically Unpythonic

As noted elsewhere, the code here is very ‘unpythonic’. It does not use library functions and vectorizing to make the code efficient and compact. It is written as a C programmer learning Python might write it, which highlights the underlying arithmetic operations and complexity within the nested for loops. Conversion to efficient Python code is ‘left as an exercise for the reader’.

Alternatively, you could just look at ‘code-affectionate’s posting that I gratefully acknowledge, which similarly introduces Hopfield Networks but with pythonic code.

An Online Python Interpreter

Another beginner’s approach to Python is to use an online interpreter rather than downloading and installing one.

Open https://repl.it/languages/python3 in a new window…


The white region on the left hand side of the page is the ‘editor’ region where code can be written then run (click on ‘run’) with the output appearing in the ‘console’ region (black background) on the right hand side. Alternatively, code can be written directly into the console.

Running the ‘editor’ program resets everything in the console; any objects previously defined will be forgotten. So, where I introduce code below, it is easiest if you just copy and paste it at the end of the ‘editor’ code and then re-run the whole lot.

This interpreter is then a sandbox for you to play around in. You can make changes to the code or enter different commands into the console and see what happens.

MICR Application

We are going to train a tiny Hopfield network to recognize the digits 0…9 from an array of pixels where there is some noise affecting some of the pixels. This is like MICR (magnetic ink character recognition) where human-readable digits printed in magnetic ink on cheques (bank checks) were stylized such that they were also machine-readable.



The E13B MICR font digits for MICR (Magnetic Ink Character Recognition)

But here, to keep things simple, the character set is just built on a tiny 4 x 5 pixel array…



MICR-like characters in a tiny (4 x 5) array

… and the resulting 20-neuron network will have a paltry learning ability which will demonstrate the limitations of Hopfield networks.

Here goes…

The digits are defined in Python as…

Num = {} # There's going to be an array of 10 digits

Num[0] = """
XXXX
XX.X
XX.X
XX.X
XXXX
"""

Num[1] = """
XX..
.X..
.X..
XXXX
XXXX
"""

Num[2] = """
XXXX
...X
XXXX
X...
XXXX
"""

Num[3] = """
XXX.
..X.
XXXX
..XX
XXXX
"""

Num[4] = """
X...
X...
X.XX
XXXX
..XX
"""

Num[5] = """
XXXX
X...
XXXX
...X
XXXX
"""

Num[6] = """
XX..
X...
XXXX
X..X
XXXX
"""

Num[7] = """
XXXX
..XX
.XX.
.XX.
.XX.
"""

Num[8] = """
XXXX
X..X
XXXX
X..X
XXXX
"""

Num[9] = """
XXXX
X..X
XXXX
..XX
..XX
"""

A function is used to convert those (easily human-discernable) 4 x 5 arrays into a 20-element list of plus and minus ones for the internal processing of the Hopfield network algorithm. (This pythonic code has been copied from ‘code-affectionate’)

import numpy
def Input_Pattern(pattern):
    return numpy.array([+1 if c=='X' else -1 for c in pattern.replace('\n','')])

digit = {}
for i in range(0, 10):
    digit[i]     = Input_Pattern(Num[i])

Typing ‘digit[1]’ into the console will show you how a ‘1’ is represented internally.
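Reading the Num[1] pattern off row by row, it should come out as the 4 × 5 grid flattened into 20 values of ±1, something like this (give or take numpy’s exact array formatting):

digit[1]   # array([ 1, 1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1])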

Another function converts that internal representation into a 20-bit number just for reporting purposes…

def State_Num(pattern):
    state_num = 0
    for x in range(0,20):
        if pattern[x]==1:
            state_num += (1 << x)
            #print("x = %d; bit = %d; s = %d" % (x, pattern[x], state_num))
    return state_num

state_num = {}
for i in range(0, 10):
    state_num[i] = State_Num(digit[i])
    print("Digit %2d state number 0x%x" % (i, state_num[i]))

We are going to add random errors to the digits and see how well the network corrects them. That is, whether the network recognizes them as being one of the 10 particular digits upon which it has been trained.

import copy
import random

def Add_Noise(pattern, num_errors):
    # (We need to explicitly 'copy' because Python arrays are 'mutable'...)
    noisy = copy.deepcopy(pattern)
    if num_errors > 0:
        for i in range(0, num_errors):
            pixel = random.randint(0, 19) # Choose a pixel to twiddle
            noisy[pixel] = -noisy[pixel] # Change a -1 to +1 or vice versa
    return noisy
    # Note: It can choose the same pixel to twiddle more than once
    #       so the number of pixels changed may actually be less

And to help see what is going on, we are going to have a function to display patterns…

def Output_Pattern(pattern):
    """
    Display a 4x5 digit array.
    """
    for x in range(0,20):
        if pattern[x]==1:
            print("●", end="")
        else:
            print(" ", end="")
        if x % 4 == 3 :
            print("")
    print("")

Putting these components together, we can see noisy patterns that we will use to test our Hopfield network…

for i in range(0, 10):
    print("n = %d; s = 0x%5x" % (i, state_num[i]))
    Output_Pattern(digit[i])

print("A noisy digit 1 with 3 errors...")
Output_Pattern(Add_Noise(digit[1], 3))

Now onto the main event.

We have a 20-neuron network (just one neuron per pixel) and we train it with some digits. Each neuron is (‘synaptically’) connected to every other neuron with a weight.

At the presentation of each number, we just apply the Hebbian rule: we strengthen the weights between neurons that are simultaneously ‘on’ or simultaneously ‘off’ and weaken the weights when this is not true.

def Train_Net(training_size=10):
    weights = numpy.zeros((20,20)) # declare array. 20 pixels in a digit
    for i in range(training_size):
        for x in range(20): # Source neuron
            for y in range(20): # Destination neuron
                if x==y:
                    # Ignore the case where neuron x is going back to itself
                    weights[x,y] = 0
                else:
                    # Hebb's slogan: 'neurons that fire together wire together'.
                    weights[x,y] += (digit[i][x]*digit[i][y])/training_size
                    # Where 2 different neurons are the same (sign), increase the weight.
                    # Where 2 different neurons are different (sign), decrease the weight.
                    # The weight adjustment is averaged over all the training cases.
    return weights

training_size = 3 # just train on the digits 0, 1 and 2 initially
weights = Train_Net(training_size)

Whereas training was trivially simple, to ‘recall’ a stored ‘memory’ requires more effort. We inject an input pattern into the network and let it rattle around inside the network (updating due to the synchronous firing of neurons, dependent on the weights of the synapses between those neurons) until it has settled down…

def Recall_Net(weights, state, verbosity=0):
    for step in range(25): # 25 iterations before giving up
        prev_state_num = State_Num(state) # record to detect if changed later

        new_state = numpy.zeros(20) # temporary container for the updated neuron states
        for neuron in range(0,20): # For each neuron
            # Add up the weighted inputs from all the other neurons
            for synapse in range(0, 20):
                # (When neuron==synapse the weight is zero, so this doesn't affect the result)
                new_state[neuron] += weights[neuron,synapse] * state[synapse]
        # Limit neuron states to either +1 or -1
        for neuron in range(0,20):
            if new_state[neuron] < 0:
                state[neuron] = -1
            else:
                state[neuron] = 1
        if verbosity >= 1:
            print("Recall_Net: step %d; state number 0x%5x" % (step, State_Num(state)))
        if verbosity >= 2:
            Output_Pattern(state)
        if State_Num(state) == prev_state_num: # no longer changing
            return state # finish early
    if verbosity >= 1:
        print("Recall_Net: non-convergence")
    return state

We now test this recall operation …

print("Recalling an error-free '1'...")
Recall_Net(weights, digit[1], verbosity=2)

And we now test recall when there is some added noise. In this example, the noise is added deterministically rather than randomly so that you can get the same results as me.

I use a ‘1’ digit but set all the pixels on the top row to +1…

●●●●
 ●
 ●
●●●●
●●●●

…and this does the recall of this character…

print("Recalling a '1' with errors...")
noisy_digit = Add_Noise(digit[1], 0)
noisy_digit[1]=1
noisy_digit[2]=1
noisy_digit[3]=1
Output_Pattern(noisy_digit)
Recall_Net(weights, noisy_digit, verbosity=2)

This shows the state of the network over successive iterations, until it has settled into a stable state.

Recall_Net2: step 0; state number 0xfbba3
●●
 ● ●
●● ●
●● ●
●●●●

Recall_Net2: step 1; state number 0xfbbaf
●●●●
 ● ●
●● ●
●● ●
●●●●

Recall_Net2: step 2; state number 0xfbbbf
●●●●
●● ●
●● ●
●● ●
●●●●

Recall_Net2: step 3; state number 0xfbbbf
●●●●
●● ●
●● ●
●● ●
●●●●

Unfortunately, it is the wrong stable state!

As an example of how this recall function can be expressed more pythonically…

def Recall_Net_Pythonically(weights, patterns, steps=5):
    from numpy import vectorize, dot
    sgn = vectorize(lambda x: -1 if x<0 else +1)
    for _ in range(steps):
        patterns = sgn(dot(patterns,weights))
    return patterns

(This is not quite a fair comparison as it cannot output any debug information, controlled by the ‘verbosity’ flag.)

Wrapping training and recall into an ‘evaluation’ function allows us to test the network more easily…

def Evaluate_Net(training_size, errors, verbosity=0):
    # Training...
    weights = Train_Net(training_size)
    # Usage...
    successes = 0
    print("Tsize = %2d   "  % training_size, end="")
    print("   Error pixels = %2d    " % errors, end="")
    for i in range(training_size):
        noisy_digit = Add_Noise(digit[i], errors)
        recalled_digit = Recall_Net(weights, noisy_digit, verbosity)
        if State_Num(digit[i]) == State_Num(recalled_digit):
            successes += 1
            if verbosity == 0: print("Y", end="")
            else: print(" Correct recall")
        else:
            if verbosity == 0: print("N", end="")
            else: print(" Bad recall")
    print("   Success = %.1f%%" % (100.0*successes/training_size))

Training the network with 3 digits and then recalling them with just 1 bad pixel, or without any bad pixels, works OK…

print("Training 3 digits with no pixel errors")
Evaluate_Net(3, 0, verbosity=0)
print("Training 3 digits with just 1 pixel in error")
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)

… whereas trying with 2, 3 or 4 errors only works some of the time…

print("Training 3 digits with 2 pixels in error")
print("Training 3 digits with 2 pixels in error")
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
print("Training 3 digits with 3 pixels in error")
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
print("Training 3 digits with 4 pixels in error")
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)

But the big problem here is trying to train the network with more digits.

It doesn’t work even with error-free input for just one more digit…

print("Training more digits but with no pixel errors")
Evaluate_Net(training_size=4,  errors=0, verbosity=0)
Evaluate_Net(training_size=5,  errors=0, verbosity=0)
Evaluate_Net(training_size=6,  errors=0, verbosity=0)
Evaluate_Net(training_size=7,  errors=0, verbosity=0)
Evaluate_Net(training_size=8,  errors=0, verbosity=0)
Evaluate_Net(training_size=9,  errors=0, verbosity=0)
Evaluate_Net(training_size=10, errors=0, verbosity=0)

The network just doesn’t have the capacity to learn more digits. Learning new digits results in old ones getting forgotten. This is the problem with Hopfield networks. They need around 7 or more neurons per training item. The network here just doesn’t have enough neurons and has a limit consistent with this.
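To put a rough figure on that limit (my own back-of-envelope calculation, using the commonly quoted estimate of about 0.14 storable patterns per neuron):

# Rough Hopfield capacity estimate - a rule of thumb, not from the original post
num_neurons = 20
patterns_per_neuron = 0.14 # roughly 0.138*N patterns for reliable recall
print("Approx. capacity: %.1f patterns" % (num_neurons * patterns_per_neuron)) # about 2.8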

More typical neural nets are ‘non-recurrent’ and employ back-propagation:

  • There are no loops in the network. Paths through the network run from inputs through one or more neurons to outputs but never back on themselves.
  • Usage (‘recall’) is easy and literally straight-forward: the calculations are performed from inputs, forward, through to the outputs.
  • Training is more complex, using the back-propagation algorithm to determine synaptic weights (more on that later).

In contrast, learning in Hopfield networks is easy and recall requires more effort.

Hopfield networks are more obviously in keeping with the biological brain:

  • They are recurrent.
  • Recall is performed by presenting a stimulus to which the network responds, eventually settling down on a particular state.
  • There is a process that is obviously analogous to Hebbian learning, in which ‘neurons that fire together – wire together’.

Fish, Chips and Ketchup


Fish, chips and the International Herald Tribune

During his PhD years in Edinburgh, Geoffrey and his experimental psychology chums would often stop by the chippy after a night on the town. Geoffrey would queue up with his order of x1 pieces of fish, x2 lots of chips and x3 sachets of ketchup (yes, they charge for ketchup in Edinburgh!). Unable to focus his blurry eyes on the price list, he would estimate what the total would come to in order to ensure he had enough cash.

If he had been able to remember all the previous ordering history (‘first occasion: 3 pieces of fish, 4 lots of chips and 2 sachets of ketchup cost £1.10’), he would have been able to solve the problem exactly after a few visits. But he didn’t – he just remembered the best guesses after the previous visit to the chippy.

But no worries. He treated the problem as a linear neural network and knew how to modify his best guesses after each visit. He was also lucky in choosing a learning rate, ε, of 0.05, and so it only took 18 visits to the chippy before he was within tuppence of the right amount, which he thought was good enough.

This almost certainly doesn’t bear any resemblance to the reality of why Prof Hinton (the ‘Godfather of Deep Learning’) chose to teach linear neural networks with an introductory example of fish, chips and ketchup.

But explaining how it works through a mathematical treatment of ‘the delta rule’ for fast ‘gradient descent’…

∆wi = ε.xi.(t − y)

…is beyond most people, whereas a large number of schoolchildren now learn to program in Python. I think playing around with some Python code would be a demystifying introduction to neural networks for many. So here is some code to help with this…

############################
# fish_chips_and_ketchup.py
############################
"""
A very simple example of the learning of a linear neural network
"""
# This is coded explicitly for fish, chips and ketchup
# for teaching clarity rather than being generalized.

from numpy  import exp      # For setting the learning rate
from random import randint  # For generating random chippy orders
MAX_ITERATIONS = 2000 # Number of visits to the chippy before giving up.
START_PRINTS   = 10   # Number of iterations reported on at the start.
STOP_ERROR     = 0.03 # Error margin - good enough to stop
cost = {'fish': 0.20, 'chips': 0.10, 'ketchup': 0.05} # This is the menu

def print_status_line(iteration, price, error): # Reporting of results at each iteration
    print ("%4d  Fish £%.2f, Chips £%.2f, Ketchup £%.2f, error £%.2f"
           % (iteration, price['fish'], price['chips'], price['ketchup'], error))

for e in range(1,7):
   # Set the learning rate 'epsilon' to an exponentially smaller value for each case
   epsilon = exp(-e)
   print ("Case %d: learning rate = %.3f" % (e, epsilon))

   weight = {'fish': 0.30, 'chips': 0.05, 'ketchup': 0.02} # Initial guesses
   error = (abs(weight['fish']-cost['fish'])
          + abs(weight['chips']-cost['chips'])
          + abs(weight['ketchup']-cost['ketchup']))
   print_status_line(0, weight, error)

   for n in range(1, MAX_ITERATIONS+1):
      # Just randomly set what this particular menu order is...
      portions = {'fish': randint(1, 5), 'chips': randint(1, 5), 'ketchup': randint(1, 5)}
      # Geoffrey's estimate of the total, using his current guesses (the 'weights')...
      target_price = (weight['fish']*portions['fish']
                    + weight['chips']*portions['chips']
                    + weight['ketchup']*portions['ketchup'])
      # ...and what the chippy actually charges
      actual_price = (portions['fish']*cost['fish']
                    + portions['chips']*cost['chips']
                    + portions['ketchup']*cost['ketchup'])
      # Difference in output...
      residual_error = target_price - actual_price
      # Condition for halting loop...
      prev_error = error
      error = (abs(weight['fish']-cost['fish'])
             + abs(weight['chips']-cost['chips'])
             + abs(weight['ketchup']-cost['ketchup']))
      # Adjust the weights using the delta rule: w -> w + epsilon.x.(t-y),
      # noting that (t-y) = actual_price - target_price = -residual_error (hence the '-=' below)
      for i in ['fish', 'chips', 'ketchup']:
         delta_weight = epsilon * portions[i] * residual_error
         weight[i] -= delta_weight

      # Output display and automatic halting on divergence or convergence...
      if abs(error) > 4.0*abs(prev_error):
          print_status_line(n, weight, error)
          print ("      Halting because diverging")
          break
      if (error <= STOP_ERROR) :
          print_status_line(n, weight, error)
          print ("      Halting because converged")
          break
      if (n <= START_PRINTS):
          print_status_line(n, weight, error)
      if (n == MAX_ITERATIONS) :
          print_status_line(n, weight, error)
          print ("      Halting but not yet converged")

Note: this Python code is written for clarity – for understanding by people not intimately familiar with the Python language – rather than for conciseness and efficiency. It is unapologetically ‘unpythonic’.

Running it produces the output…

Case 1: learning rate = 0.368
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.29, Chips £0.03, Ketchup £0.01, error £0.18
   2  Fish £0.32, Chips £0.06, Ketchup £0.05, error £0.19
   3  Fish £-0.71, Chips £-0.14, Ketchup £-0.78, error £0.16
   4  Fish £12.15, Chips £12.72, Ketchup £15.30, error £1.98
      Halting because diverging
Case 2: learning rate = 0.135
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.32, Chips £0.08, Ketchup £0.04, error £0.18
   2  Fish £0.28, Chips £0.05, Ketchup £-0.04, error £0.15
   3  Fish £0.24, Chips £0.03, Ketchup £-0.06, error £0.23
   4  Fish £0.36, Chips £0.60, Ketchup £0.51, error £0.22
   5  Fish £-1.41, Chips £-2.35, Ketchup £-1.26, error £1.12
      Halting because diverging
Case 3: learning rate = 0.050
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.22, Chips £0.00, Ketchup £0.00, error £0.18
   2  Fish £0.29, Chips £0.17, Ketchup £0.17, error £0.16
   3  Fish £0.12, Chips £-0.04, Ketchup £0.13, error £0.28
   4  Fish £0.33, Chips £0.13, Ketchup £0.17, error £0.29
   5  Fish £0.22, Chips £0.02, Ketchup £0.10, error £0.29
   6  Fish £0.22, Chips £0.02, Ketchup £0.10, error £0.15
   7  Fish £0.20, Chips £0.01, Ketchup £0.07, error £0.15
   8  Fish £0.21, Chips £0.07, Ketchup £0.12, error £0.12
   9  Fish £0.18, Chips £0.05, Ketchup £0.04, error £0.11
  10  Fish £0.19, Chips £0.06, Ketchup £0.06, error £0.08
  18  Fish £0.21, Chips £0.11, Ketchup £0.04, error £0.02
      Halting because converged
Case 4: learning rate = 0.018
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   2  Fish £0.29, Chips £0.04, Ketchup £0.01, error £0.18
   3  Fish £0.30, Chips £0.07, Ketchup £0.04, error £0.18
   4  Fish £0.26, Chips £0.06, Ketchup £0.03, error £0.14
   5  Fish £0.25, Chips £0.06, Ketchup £0.03, error £0.11
   6  Fish £0.25, Chips £0.06, Ketchup £0.03, error £0.11
   7  Fish £0.26, Chips £0.07, Ketchup £0.04, error £0.11
   8  Fish £0.26, Chips £0.08, Ketchup £0.04, error £0.10
   9  Fish £0.26, Chips £0.08, Ketchup £0.04, error £0.09
  10  Fish £0.26, Chips £0.08, Ketchup £0.04, error £0.09
  44  Fish £0.22, Chips £0.09, Ketchup £0.05, error £0.03
      Halting because converged
Case 5: learning rate = 0.007
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   2  Fish £0.30, Chips £0.06, Ketchup £0.02, error £0.18
   3  Fish £0.30, Chips £0.06, Ketchup £0.03, error £0.17
   4  Fish £0.30, Chips £0.06, Ketchup £0.02, error £0.17
   5  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.17
   6  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.17
   7  Fish £0.29, Chips £0.05, Ketchup £0.02, error £0.18
   8  Fish £0.29, Chips £0.05, Ketchup £0.02, error £0.18
   9  Fish £0.29, Chips £0.04, Ketchup £0.02, error £0.18
  10  Fish £0.29, Chips £0.04, Ketchup £0.01, error £0.18
 152  Fish £0.21, Chips £0.09, Ketchup £0.04, error £0.03
      Halting because converged
Case 6: learning rate = 0.002
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   2  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   3  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   4  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   5  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   6  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   7  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   8  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   9  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
  10  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
 389  Fish £0.21, Chips £0.09, Ketchup £0.04, error £0.03
      Halting because converged
http://www.cs.toronto.edu/~hinton

Hinton and python

…which shows:

  1. How the error (the total mismatch between ‘Geoffrey’s’ best guesses and the actual costs) generally (but not always) decreases, leading towards the correct answer, and
  2. Fewer iterations are required for a faster learning rate (a higher value of ε), but the guesses actually diverge when ε is increased beyond a certain point.

Incidentally, Prof Hinton was also introduced to python at an early age…


Consciousness and Zombies

 

Common Sense Consciousness

There are common-sense notions of what consciousness is about which tell us:

  • We are conscious when we are awake,
  • We are not conscious when we are asleep, except when we are dreaming,
  • People under anaesthetic are not conscious.
  • People in a coma are not conscious but those suffering from ‘locked in’ syndrome are.
  • People have a single consciousness. It is not that there are multiple consciousnesses within them.
  • There is no higher consciousness – groups of people are not conscious.
  • Machines are not conscious.

But these can be wrong. For example, to take the last point, there is the danger of us being ‘biochauvinist’, failing to recognize that non-biological stuff can be conscious in any way.

We Need a Theory

Much has been said on the nature of consciousness by philosophers but, as with much of philosophy, it is pre-scientific. We are still grappling with the problem of finding a way to make it scientific, so that we can progress beyond speculation by testing hypotheses – predicting and quantifying. It is like we are at the same stage as the ancient Ionian philosophers were when speculating about the physical nature of the universe. For example:

  • Thales speculated that ‘everything is water’ and provided reasons for his argument,
  • Anaximenes speculated that ‘everything is air’ and provided reasons for his argument, and
  • Heraclitus speculated that ‘everything is change’ and provided reasons for his argument.

No amount of speculation on its own could have ever led anyone to our current understanding of the physical world, involving quantum theory and relativity. Our understanding has developed through a long series of theories that have all been refuted as being ‘wrong’ but were necessary steps to make progress.

We have been lacking theories which would provide the first step towards a scientific understanding of the fundamentals of consciousness. This is ‘proto-science’ – at the start of the scientific process. We need to have a theory that is scientific in that it describes consciousness in wholly physical terms and that, given a specific physical state, can predict whether there is consciousness. As there is progress, theories and methods get established into what we normally understand as ‘science’. It can then provide useful applications. For example, a good theory would provide us with 100% success rate in avoiding ‘anaesthesia awareness’. It must agree with our common-sense understanding of consciousness to some degree but it may surprise us. For example, it might tell us:

  • We are conscious throughout the time we are asleep – the difference is that our experiences are not laid down in memory.
  • In some specific circumstances, machines and/or groups of people can be conscious.

Integrated Information Theory

Giulio Tononi’s 2004 ‘Integrated information theory’ (IIT) of consciousness has been described by Christof Koch as

“the only really promising fundamental theory of consciousness”


In it, Tononi proposes a measure named after the Greek letter φ (‘phi’) which is the amount of ‘integrated information’ of a system. Consciousness is a fundamental property of the universe which arises wherever φ > 0. It is therefore a form of ‘panpsychism’ – consciousness can arise anywhere. The higher the value of φ, the larger the amount of consciousness. Consciousness is a matter of degree. Humans have large brains and very large φ and are highly conscious. Small rodents have smaller φ and are therefore less conscious. But sleeping humans must have a lower φ than wakeful rodents.

I have previously posted about Tononi’s theory, by providing an overview of his book ‘Phi: A voyage from the Brain to the Soul’. The book is a curious fusion of popular science and fiction and so, disappointingly, avoids all technicalities involved with the theory and the calculation (quantification) of φ.

In one form of the ‘Integrated Information Theory’, φ is calculated as:

[image: the formula for φ]

where

[image: the formula for ‘effective information’]

Simples!

In short, φ is a measure of the information flow within a system. It is essentially formulated back from wanting (!) the following:

  • The information flow between humans is much much less than the information flow within a human brain.
  • The distinguishing indicator between wakefulness and REM sleep versus non-REM sleep is that there is a large drop in ‘long-range’ communication in the latter – information flow is much more localised.

And this (necessarily) leads to the conclusions we ‘want’:

  • We are not conscious in non-REM sleep or in a coma but are at other times, including if suffering from locked-in syndrome.
  • There is not a consciousness associated with a group of people.

A positive φ requires the mutual flow of information within the system – between parts of the system, there is flow in both directions. In short, there are loops and ‘internal states’ i.e. memory. Tononi provides a metaphor of a digital camera. A 10-megapixel camera sensor provides 10 megabits of information but there is no integration of that information and no memory. In contrast:

  • The human visual system combines information from neighbouring rod and cone photo-receptors in the retina before the information gets to the cortex of the brain, and
  • There are more connections in the brain going from the ‘higher’ levels down towards the retina than there are going in the opposite direction.

A camera sensor has zero φ, so there is no consciousness. But a thermostat has memory (precisely 1 bit of capacity) and a loop because of its hysteresis. It has some small positive value of φ. Hence it has some (absolutely minimal) degree of consciousness!

This all sounds like a crack-pot theory but it is being taken seriously by many. Tononi’s academic specialization is sleep, but he has worked at Gerald Edelman’s Neurosciences Institute in La Jolla, collaborating with Edelman on metrics for brain complexity. This has evolved into his metric for consciousness. (Incidentally, he has also worked with Karl Friston, who was at the Neurosciences Institute at the same time.) Christof Koch is now collaborating with Tononi on the theory. My point: he is not someone on the fringes of this academic field.

Cynically, we might say that the theory has credibility because there is so very little else of substance to go on. We need to recognize that this is all still just ‘proto-science’.

IIT 3.0

The ‘Integrated Information Theory’ has gone through two major revisions. The original ‘IIT 1.0’ from 2004 was superseded by ‘IIT 2.0’ in 2008 and ‘IIT 3.0’ in 2014.

‘IIT 1.0’ and ‘IIT 2.0’ based measures of ‘effective information’ (ei) on entropy – the effective information was an average ‘Kullback–Leibler divergence’ (alternatively termed ‘relative entropy’). This may sound familiar: entropy and the Kullback–Leibler divergence also feature in Karl Friston’s ‘Variational Free Energy’ theory of generalized brain function.
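For reference (my gloss, not a formula lifted from the IIT papers), the Kullback–Leibler divergence of a distribution P from a distribution Q is:

DKL(P||Q) = Σ P(x).log( P(x) / Q(x) ), summed over all states x.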

But ‘IIT 3.0’ uses a different metric for ‘effective information’. The basis of this is known:

  • in mathematical circles by the formal term of the ‘Wasserstein distance’, and
  • in computer science circles by the (literally) more down-to-earth term of the ‘Earth Mover’s Distance’ (EMD)

Imagine the amount of earth that a digger would have to move to make a pile of earth of a particular shape (‘distribution’) into the shape of another (these piles of earth represent probability distributions). When applied to simple binary distributions, this just reduces to the ‘Hamming distance’ used in Information Theory for communication systems.
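As a toy illustration of that last point (my own sketch, not code from the paper), the Hamming distance between two equal-length binary states is just the number of positions at which they differ:

# Hamming distance between two binary states (illustrative sketch)
def hamming(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)

print(hamming([0, 1, 1, 0], [0, 0, 1, 1])) # 2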

Two Circuits

Unlike previous editions, ‘IIT 3.0’ explicitly provided an example that I find rather incredible.

Figure 21 of ‘IIT 3.0’ shows 2 circuits, A and B (see below). The circuits consist of circles connected together with red and black arrows. The circles are ‘nodes’. The arrows are signals which are inputs to and outputs from the nodes. My interpretation of these diagrams is as follows:

  • Black arrows mark ‘excitatory’ connections.
  • Red lines with a dot at one end mark ‘inhibitory’ connections (going to the end with the dot).
  • At each node, the input values are added (for excitatory connections, effectively scaled by 1) or subtracted (for inhibitory connections, effectively scaled by -1). If they meet the criterion marked at the node (e.g. ‘>=2’) then each output will take the value 1, and otherwise it will be 0.
  • Time advances in fixed steps (let us say 1 millisecond, for convenience) and all nodes are updated at the same time.
  • The diagrams colour some nodes yellow to indicate that the initial value of a node output is 1 rather than 0 (for a white node).


Figure 21. Functionally equivalent conscious and unconscious systems.

 The caption for the figure reads:

(A) A strongly integrated system gives rise to a complex in every network state. In the depicted state (yellow: 1, white: 0), elements ABDHIJ form a complex with ΦMax = 0.76 and 17 concepts. (B) Given many more elements and connections, it is possible to construct a feed-forward network implementing the same input-output function as the strongly integrated system in (A) for a certain number of time steps (here at least 4). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain. The transition from the first layer to the second hidden layer in the feed-forward system is assumed to be faster than in the integrated system (τ << Δt) to compensate for the additional layers (A1, A2, B1, B2)

The caption concludes with a seemingly outrageous statement on zombies and consciousness which I will come back to later on.

Unfortunately, in the figure:

  • With the ‘integrated system’, I cannot reproduce the output sequence indicated in the figure!
  • With the ‘feed-forward system’, it is difficult to determine the actual directed graph from the diagram but, from my reasonable guess, I cannot reproduce the output sequence indicated in this figure either!

But there are strong similarities between Tononi’s ‘integrated system’ versus ‘feed-forward system’ and ‘IIR filters’ versus ‘FIR filters’ in Digital Signal Processing that are more than coincidental. It looks like Tononi’s two ‘complexes’ as he calls them are derived from IIR and FIR representations. So I am going to consider digital filters instead.

IIR Filters

An input signal changes over time, but only at discrete time intervals. For the purposes of this example, assume there is a new sample every millisecond. There is an input stream of samples around time t:

X(t), X(t+1), X(t+2), X(t+3), X(t+4) and on.

And there is an output stream of samples:

Y(t), Y(t+1), Y(t+2), Y(t+3), Y(t+4) and on.

A simple filter that smoothes out changes in input ‘samples’ can be formed by averaging the input with the previous output value:

Y(t) = ½.X(t) + ½.Y(t-1)

This is a filter of a type called an ‘infinite impulse response’ (IIR) filter. A diagram for an IIR filter is shown below:

[image: block diagram of a second-order IIR filter]

A ‘z⁻¹’ indicates a delay of 1 ms. The b, a0 and a1 boxes are multipliers (b, a0 and a1 are the constant values by which the signals are multiplied) and the ‘Σ’ circle sums (adds). The diagram shows a ‘second order’ filter (two delays) but I will only consider a first order one:

b = 1/2

a1 = 1/2

a0 = 0

A single non-zero value within a series of zero values is called an ‘impulse’:

X = … 0, 0, 0, 0, 1, 0, 0, 0, 0, …

If this impulse is fed into a filter, the resulting output from that impulse is called the ‘impulse response’. For the IIR filter it will be as follows:

Y = … 0, 0, 0, 0, 0.5, 0.25, 0.125, 0.0625, …

that is:

Y(1) = 1/2

Y(2) = 1/4

Y(3) = 1/8

Y(4) = 1/16

and in general form:

Y(t) = 2⁻ᵗ.

so there is some non-zero (but infinitesimally small) output at very high t – the response carries on infinitely and this is why the filter is called an ‘infinite impulse response filter’.

If we put a ‘step’ into the IIR filter…

X = … 0, 0, 0, 0, 1, 1, 1, 1, 1 …

we get a ‘step response’ out, which shows the smoothing of the transition:

Y = … 0, 0, 0, 0.500, 0.750, 0.875, 0.938, 0.969, 0.984, 0.992, …

This IIR filter is the equivalent of Tononi’s ‘integrated system complex’.
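If you want to check these numbers yourself, here is a minimal sketch of mine (not from the original post) of the first-order filter above:

# First-order IIR filter: Y(t) = 0.5*X(t) + 0.5*Y(t-1)  (illustrative sketch)
def iir(xs):
    ys, y = [], 0.0
    for x in xs:
        y = 0.5*x + 0.5*y
        ys.append(round(y, 3))
    return ys

print(iir([0, 0, 0, 1, 0, 0, 0, 0])) # impulse response: 0.5, 0.25, 0.125, ... (halving forever)
print(iir([0, 0, 0, 1, 1, 1, 1, 1])) # step response: 0.5, 0.75, 0.875, 0.938, 0.969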

FIR Filters

The DSP equivalent to Tononi’s ‘feed-forward system complex’ is a ‘finite impulse response’ (FIR) filter:

Y(t) = b0.X(t) + b1.X(t-1) + b2.X(t-2) + b3.X(t-3) + … + bN-1.X(t-N+1)

A diagram corresponding to this FIR filter (of ‘order N-1’) is shown below:

[image: block diagram of an FIR filter]

Here, the triangles are multipliers and the ‘+’ circles obviously add.

Now, we can try to get a FIR filter to behave very similarly to an IIR filter by setting its coefficients

b0 , b1 , b2 , b3 … bN-1

to be the same as the first N terms of the IIR’s impulse response. The values after t=5 are quite small so let’s set N=6:

b0 = 1/2

b1 = 1/4

b2 = 1/8

b3 = 1/16

b4 = 1/32

b5 = 1/64

so the transfer equation is:

Y(t) = (1/2).X(t) + (1/4).X(t-1) + (1/8).X(t-2) + (1/16).X(t-3) + (1/32).X(t-4) + (1/64).X(t-5)

and the step response is then:

Y = … 0, 0, 0, 0.500, 0.750, 0.875, 0.9375, 0.96875, 0.984375, 0.984375,  …

The FIR’s ‘impulse response’ only lasts for 6 samples – it is finite, which is why the filter is called a ‘finite impulse response’ filter. The output does not depend on any input value from more than 6 samples prior, but the first 6 output samples following an impulse are the same as the IIR’s, so the two filters behave in a very similar way.

(Note: The output never gets any higher than 0.984375 – the sum of all the coefficients)
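
To make the comparison concrete, here is a matching Python sketch of this 6-tap FIR filter (again my own illustration, not Tononi’s): its step response reproduces the IIR’s for the first 6 samples and then flat-lines at the sum of the coefficients.

def fir_response(x, coeffs):
    # FIR filter: Y(t) = sum of coeffs[k] * X(t-k) over the filter taps
    history = [0.0] * len(coeffs)
    out = []
    for sample in x:
        history = [sample] + history[:-1]   # shift the delay line
        out.append(sum(b * h for b, h in zip(coeffs, history)))
    return out

b = [1.0/2, 1.0/4, 1.0/8, 1.0/16, 1.0/32, 1.0/64]  # first 6 terms of the IIR impulse response
print(fir_response([1] * 8, b))
# [0.5, 0.75, 0.875, 0.9375, 0.96875, 0.984375, 0.984375, 0.984375]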

IIR and FIR Filters are alike but not the same

This is exactly the same situation as described by Tononi:

Reiterating Tononi’s figure caption:

Given many more elements and connections, it is possible to construct a ‘feed-forward’ network implementing the same input-output function as the ‘integrated system’ in (A) for a certain number of time steps (here at least 4). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain. …

And then there is the punchline that I omitted previously…

… Despite the functional equivalence, the ‘feed-forward system’ is unconscious, a “zombie” without phenomenological experience.

So it is true with the digital filters:

Given more elements and connections, it is possible to construct a FIR filter implementing the same input-output function as the IIR filter for a certain number of time steps (here 6). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain.

and hence

… Despite the functional equivalence, the FIR filter is unconscious, a “zombie” without phenomenological experience (unlike the IIR filter)!

For the FIR filter, there are no loops in the network – the arrows all point south/east and the value is then φ=0, in contrast with the non-zero φ for the IIR filter which does have loops.

To anyone that understands digital signal processing, the idea that an IIR filter has some consciousness (albeit tiny) whereas an equivalent FIR filter does not is absurd. This is an additional absurdity beyond that of the panpsychist idea that any filter could have consciousness in the first place.

Could Androids Dream of Electric Sheep?

In a previous talk (‘Could Androids Dream of Electric Sheep?’) I considered whether something that behaved the same way as a conscious human would also be conscious.

If something behaves the same way as a conscious human, we can still deny that it is conscious on the grounds that it is just an imitation. We would not credit a computer running Joseph Weizenbaum’s famous ELIZA program as being genuinely conscious in the same way as we are (although the Integrated Information Theory would grant it some lower value of φ, one that is still greater than zero).

A narrower philosophical question is whether a computer running a simulation (‘emulation’) of a human brain would be conscious. (Yes – ‘whole brain simulation’  is not possible – yet.) A simulation at a sufficiently low level can show the same phenomenon as the real object (such as ‘getting wet’ in a rainstorm in a weather simulation.) In this case, the ‘same thing’ is going on, but just implemented in a different physical substrate (electronic transistors rather than gooey biological stuff); a functionalist would say that the simulation is conscious by virtue of it being functionally the same.

A yet narrower question is what happens if the physical construction of the ‘simulation’ is the same. It would no longer be a simulation but a direct (atom-by-atom) copy. Anyone insisting on this can be accused of being ‘bio-chauvinist’ in denying that computer simulations are conscious. But it is still possible that consciousness is not duplicated. For example, if whatever it is that causes consciousness operates at a sub-atomic level, an atom-for-atom copy might miss it out. How would we know?

I took a functionalist position.

However, the example above shows that, according to the ‘Integrated Information Theory’, it is possible for two systems to be functionally the same (caveat: almost) but for one to be conscious whilst the other is not. In short – that (philosophical) zombies can exist.

But any ‘system’ is just a component in a larger system. It is not clear to me whether, if one component with φ>0 is substituted with a functionally identical one with φ=0, that the φ of the larger system is reduced. In a larger system, the loop-less φ=0 implementation ends up with loops around it.

To be continued (eventually, hopefully).


Brexit and the Brain

 

On this blogsite up to now, I have touched on many of the sub-fields of philosophy – the philosophy of mind, consciousness, epistemology, philosophy of science and, most recently, ethics. The biggest sub-field not covered is politics.

But then came ‘Brexit’.

Thinking about Brexit has reminded me of many of the ideas within past posts. So here, in a bit of a departure from the normal, I try to relate Brexit to these ideas. It is not really a foray into political philosophy. It is about the cognitive processes behind the political event. It might provide you with some food for thought about Brexit. And the Trump phenomenon too, for that matter.

I’ll start by summarizing apposite ideas from past posts:

 

Intelligence and Knowledge

Intelligence is about adapting and responding appropriately to circumstances, particularly when they are complex and changing. An important aspect is the ability to make predictions. A central topic of this blogsite is the idea of the brain as a hierarchy of predictors (Hohwy’s ‘predictive brain’ thesis and Friston’s ‘variational free energy’ theory) that is continuously trying to minimize surprise, through action and perception. These brain theories are closely related to ideas around bio-inspired ‘artificial neural networks’ that are now making significant strides in various artificial intelligence applications (threatening to take away many white-collar jobs in the near future).

Our ability to predict events in the world outside improves over our lifetime. Knowledge grows. In the early stages of life, the forest of neurons is very plastic hence highly adaptable but very ‘impressionable’ to stimulus. When mature, the brain has become wise – good at anticipating events in the environment that it has grown up in. But it can get ‘stuck in its ways’ if that environment has now changed. Keynes is famously supposed to have said:

“When the facts change, I change my mind. What do you do, sir?”

But the difficulty is in accepting that the new facts are valid, because they do not cohere with everything else you know.

I have related this mechanistic learning process to Susan Haack’s epistemological ‘foundherentist’ theory which is a synthesis of the competing correspondence and coherence theories of truth. New information modifies one’s knowledge if it both (i) corresponds to how things seem to behave in the outside world and (ii) if it coheres with the other knowledge within one’s head.

 

Worldviews

Embedded within the totality of our knowledge is our worldview – the big picture of how the world appears to us. It is cultural. We grow up within the culture of our parents’ environment and it evolves within us. Our worldview is a bit different from that of our parents. Our children’s will be a bit different too. But only a bit. If it changes too much, the culture is broken.

The traditional Western philosophy has been one of a non-material Cartesian mind acting within an absolutist objective world of facts; we should be perfectly rational. But our modern understanding is of an evolved, physical mind. Our understanding of how knowledge works has been influenced by the reactions to the horrors of totalitarianism in central Europe by Kuhn, Feyerabend, Polanyi and Lakatos.

People are separately building models within their brains of the same (shared) environment – but those models are not the same. People do not believe things because they are objectively right or wrong. Nor do they believe just anything. They believe in things because they work – they correspond and cohere. Their knowledge, embodied within the connectome, is neither objective/absolutist nor subjective/relativist. It is a middle course. But still, some brains make better predictions in particular circumstances than others.

 

Cognitive Biases

So it seems that our thinking falls short of the simple, pure, logical rationality required for decision-making in the 21st-Century world. We have cognitive biases that seem to distort our thinking. For example, there is ‘anchoring’ (already hinted at), in which early information (when ‘impressionable’) has a disproportionate influence on our thinking compared with later information (when ‘mature’).

From the work of Tversky, Kahneman, Gigerenzer and Tetlock (focussed on politics and economics decision-making but generally applicable), we understand that these biases are the result of evolution and have endowed us with a cognitive toolbox of tricks that can make decisions in a timely manner that are ‘good-enough’. Much of this is intuitive. Our thinking is more complex, more efficient but less rational.

In our search for meaning, we tend to want to pull our ideas together to create some greater ‘truth’. Experts are liable to focus on a learnt ideology of grand overarching principles – of more coherence than is warranted. Computers can deal with the mass of data to maintain correspondence between events in the outside world and their predictions and hence can outperform the experts. But straightforward heuristic tricks (such as the ‘recognition heuristic’ – that things we haven’t heard of will tend to be less important than those we have) mean that amateurs can often outperform the theories of experts!


Emotion

So, much of our thinking is irrational and intuitive. But our thinking is also affected by emotion.

A most basic emotion is fear. The basic animal state of nature is continuous anxiety – to be constantly alert, fearfully anticipating potential life-threatening events. But we need to balance risk. We cannot be completely risk-averse (hiding in a dark room). We must explore the world around us when the risk is low in order to learn what to do when the risk is high.

 

Social Cohesion

And well-being is improved by cooperation with others around us. Biological mechanisms of motherhood (such as the neurotransmitter oxytocin) give rise to caring for those immediately around us. Knowing our place within the hierarchy of society reduces physical violence within our community (but the potential for violence means that we do not have an improved feeling of well-being). The flip-side of the empathy that we feel towards those within our ‘in-group’ community who are like ourselves is that it emboldens us against the ‘out-group’ beyond.

Over time, we learn how those around us behave. Through familiarity, we can predict how others will behave in particular circumstances and can imagine how they see us. We have a ‘theory of mind’ – an ability to recognise that others may think differently from ourselves. We formulate how reputable others are and understand that others do the same to us. We have a reputation. With established reputations, we can cooperate, able to trust one another. However, we have no knowledge of how reputable strangers from outside our community are. Hence we treat them with suspicion. But that suspicion reduces with more frequent contact. Strangers become less strange, particularly if they are associated with reputable institutions. This allows societies to grow beyond the size where everyone knows everyone else. To act morally is to balance our wants with those of others – to get inside the mind of others to understand what they want and to take that into consideration.

 

Unpraiseworthiness

Classic case examples such as Phineas Gage and Charles Whitman show that physical effects on the brain cause different behaviour. This challenges our traditional notions of free will and responsibility. We are a product of our environment. In a classic legal case, murderer Richard Loeb was spared the death penalty because it was successfully argued that he did not choose the (privileged) environment in which he grew up.

But if transgressors cannot be blamed for their deeds, then equally the successful cannot be praised for their achievements. They feel proud of achievements that are the result of their personal abilities, but little is credited to the fortunate circumstances in which they were born and grew up.

(Note: a lack of traditional responsibility does not mean that a transgressor is not sanctioned in some way and it does not mean we do not promote positive examples.)

 

Affluent Societies

Various research indicates that (i) moral behaviour and reasoning of those at the top of the social tree differs from that of the rest of us, and (ii) individuals in affluent societies behave differently from those in less affluent ones.

In short, the affluent are less empathetic. They are more likely to prioritize their own self-interests above the interests of others (simple example: they are less likely to stop for pedestrians at crossings). Piff calls this ‘the asshole effect’! In contrast with traditional intuitive, emotional responses, they favour more ‘rational’ utilitarian choices such as being more prepared to take resources from one person to benefit several others. They have a higher sense of entitlement.

Charitable donations are one indicator of the consideration given to others. Being rich does not generally confer greater generosity. But being married, older, living in rural rather than urban areas or living in a mixed rather than segregated social neighbourhood all correlate with high donations. So does regular attendance at religious services, which can simply be attributed to being reminded of the needs of others on a regular basis.

A general picture emerges of how affluent ‘Western’ societies differ from those with lower GDPs. There is less empathy for those immediately around us. People are more individualistic and self-indulgent. Relationships have less commitment. People live in an urban environment in which social interaction is anonymous and transactional rather than proximate (‘up close and personal’). There is higher monetization.  (Regardless of status, just thinking about money decreases empathy, shifting the balance from others to oneself.) We are less dependent on other specific people and their goodwill. If we want something, we can just buy it with the minimum of personal interaction, from an anonymous provider. There is a high degree of social connectedness but this is not with those outside our own social spheres and there is less interaction with those living in our immediate vicinity. It is a case of ‘out of sight; out of mind’.

But the flip-side of this is that the affluent are more likely to interact with members of the out-group – to be less xenophobic.

 

Brexit

Now, applying all these ideas to Brexit…

 

Confirmation Bias

It is generally agreed that the quality of the political debate during the referendum campaign was dire. Leave campaigners appealed to those with a Leave worldview. Remain campaigners appealed to those anchored with a Remain worldview. These worldviews were formed long before the referendum; they were as good as instinctive. Remain arguments did not fit into the Leave worldview and Leave arguments did not fit into the Remain worldview. Confirmation bias reigned. Arguments became increasingly coherent, but this was because of reduced correspondence to reality! There would be no £350 million a week and there would be no World War III. There may have been undecideds to be swayed from an unconscious worldview to a conscious voting intention but I suspect that it actually changed the minds of very few.

 

The Failure of Empathy

Recasting what was said above in terms of Brexit, Remainers were more affluent, more self-sufficient and less empathetic than Leavers. They were more likely to prioritize their own self-interests above the interests of others. In contrast to the traditional intuitive, emotional responses of poorer Leavers, they favoured more ‘rational’ choices. The Remain argument was that of the financial impact of Brexit. It was in terms of money, and monetization decreases feelings of empathy. Being older and living in rural rather than urban areas correlates with empathy – and correlated with voting Leave. But this empathy was for those within the in-group. The flip-side of this empathy effect (such as the effect of oxytocin) is that Leavers are less trusting of those in the out-group.

 

The Failure of Trust

From within a Leave worldview, a vote to Remain was a self-interested vote to maintain the status quo. Remain voted as ‘Homo economicus’ – as rational self-interested agents, without caring about the opinions of others. Leavers heard the Remain campaigners’ claims about the bad economic consequences but rejected them because of a failure of trust. The bad reputation of individuals campaigning for Remain was inherited from the institutions with which they were associated – the institutions of the elite. These were the politicians and ‘greedy banksters’ of the Establishment whose reputations had been destroyed in the eyes of the public as self-interested in the extreme.

 

The Failure of Experts

Part of this Establishment were the ‘experts’ whose reputation was now tarnished by their inability to predict. Among the failures were the inability to predict the collapse of the banking system and the inability to predict election outcomes. It may be that their expertise was based on a world which has now changed. Some scepticism about expert opinion was justified.

 

The Failure to Think

Too many Leavers did not think. They accepted things to be true because they wanted them to be true. They did not question them. It was a failure to think for themselves. The stereotypical view from within the Remain worldview was that a vote to Leave was a vote based on ignorance and stupidity; there is some truth in this.

But too many Remainers did not think either – or did not think much. A large proportion of the Remain vote will not have given much thought to the vote because the correct way to vote was obvious and no further thought was deemed necessary. They did not question whether there might be any merits of Brexit.

 

The Failure of Morality

I have defined morality as being about balancing our wants against those of others – to get inside the mind of others to understand what they want and to take that into consideration. To want to do the balancing requires intellect and for us to care about the other.

Leavers tended to see the issue in terms of the others – as an issue of inequality. The ‘elite’ others did not seem to care about them. They could see that it would be in the interest of the others to vote Remain. They balanced their wants against those of the other and came down firmly on the side of their own faction’s wants. (When might they have another opportunity for this cri de cœur?)

It was noted previously that there are no issues that are purely moral. A moral aspect is just one of many aspects of a problem. Brexit had moral aspects as well as economic and other aspects. In short:

  • Leavers saw the moral aspect, but
  • Remainers (skewed towards higher intellect) saw only the economic aspect.

Remainers may well find this assertion to be outrageous!

 

Mindlessness and Heartlessness

So, Leavers were mindless and Remainers were heartless. Remainers did not empathize, or did not think that they should be empathizing. Leavers engaged in apparently mindless political vandalism. But it was not necessarily mindless. One telling comment on a blog after 23 June asked ‘what if voting Leave was the rational thing to do?’ To answer that, Remainers would be forced to think of what the other was thinking. And they might conclude it was not mindless political vandalism after all; it was just political vandalism.

 

The environment

We are all products of our environment. If we were brought up in a Remain environment (e.g. Cambridge) or Leave environment (e.g. Sunderland), would we have voted differently? Probably. If we recognize this, we will not demonize the other.

 

Conclusion

I have tried to fit one story into another – to fit a story about the epistemological and ethical aspects of a philosophical worldview into the political story of Brexit! It is far from a perfect match. I have not talked about economics or immigration or identity or globalization or other issues central to Brexit because they do not fit into the story of the brain here. But it is hopefully interesting and food for thought.

Returning to my favourite piece of graffiti:

“So many heads, so few brains.
So many brains, so little understanding.”

The first line is about a failure to think. The second line is about a failure to think about others. The first can be levelled against many Leavers. The second can be levelled against many Remainers.

We must look more to the future than the past. We must look backwards not to blame but to understand why people voted the way they did so that we might understand what might satisfy them. We need to get inside their minds (and the easiest way of doing that is to ask them!).

We can then look forwards – to how we can create a solution that is acceptable for a large majority of us (much more than 52%) – both Leavers and Remainers. Then we will heal the rift. We will see.

 

Mrs Varoufakis (allegedly) trying but failing to see one standpoint from the position of another.


Some Good Reason

 

This is the 19th part of the ‘Neural Is to Moral Ought’ series of posts. The series’s title comes from Joshua Greene’s opinion-piece paper

‘From Neural Is To Moral Ought: What are the moral implications of neuroscientific moral psychology?’

Here, I pick through Greene’s paper, providing responses to extensive quotes of his which refer back to a considerable number of previous parts of the series. His paper divides into 3 sections which I will examine in turn:

  1. The ‘is’/‘ought’ distinction
  2. moral intuition
  3. moral realism vs relativism

 

The ‘Is’/‘Ought’ Distinction

The paper’s abstract is:

Many moral philosophers regard scientific research as irrelevant to their work because science deals with what is the case, whereas ethics deals with what ought to be. Some ethicists question this is/ought distinction, arguing that science and normative ethics are continuous and that ethics might someday be regarded as a natural social science. I agree with traditional ethicists that there is a sharp and crucial distinction between the ‘is’ of science and the ‘ought’ of ethics, but maintain nonetheless that science, and neuroscience in particular, can have profound ethical implications by providing us with information that will prompt us to re-evaluate our moral values and our conceptions of morality.

and the body of the paper then starts:

Many moral philosophers boast a well cultivated indifference to research in moral psychology. This is regrettable, but not entirely groundless. Philosophers have long recognized that facts concerning how people actually think or act do not imply facts about how people ought to think or act, at least not in any straightforward way. This principle is summarized by the Humean dictum that one can’t derive an ‘ought’ from an ‘is’. In a similar vein, moral philosophers since Moore have taken pains to avoid the ‘naturalistic fallacy’, the mistake of identifying that which is natural with that which is right or good (or, more broadly, the mistake of identifying moral properties with natural properties).

This naturalistic fallacy was committed by the now-discredited ‘Social Darwinists’ who aimed to ground moral philosophy in evolutionary principles. But:

.. the idea that principles of natural science might provide a foundation for normative ethics has won renewed favour in recent years. Some friends of ‘naturalized ethics’ argue, contra Hume and Moore, that the doctrine of the naturalistic fallacy is itself a fallacy, and that facts about right and wrong are, in principle at least, as amenable to scientific discovery as any others.

Only to a certain extent, I would say. It is true that the ‘ought’ is not logically bound to the ‘is’. We are free to claim that anything ought to be done. But ‘ought’ is substantially restricted by ‘is’. Moral theories cannot require us to do things which are outside of our physical control. ‘This is how we ought to think’ is constrained by ‘This is how we think’. For Greene,

… I am sceptical of naturalized ethics for the usual Humean and Moorean reasons.

Continuing, with reference to William Casebeer’s opinion piece in the same journal issue:

in my opinion their theories do not adequately meet them. Casebeer, for example, examines recent work in neuroscientific moral psychology and finds that actual moral decision-making looks more like what Aristotle recommends and less like what Kant and Mill recommend. From this he concludes that the available neuroscientific evidence counts against the moral theories of Kant and Mill, and in favour of Aristotle’s. This strikes me as a non sequitur. How do we go from ‘This is how we think’ to ‘This is how we ought to think’? Kant argued that our actions should exhibit a kind of universalizability that is grounded in respect for other people as autonomous rational agents. Mill argued that we should act so as to produce the greatest sum of happiness. So long as people are capable of taking Kant’s or Mill’s advice, how does it follow from neuroscientific data — indeed, how could it follow from such data — that people ought to ignore Kant’s and Mill’s recommendations in favour of Aristotle’s? In other words, how does it follow from the proposition that Aristotelian moral thought is more natural than Kant’s or Mill’s that Aristotle’s is better?

The ‘Neural Is to Moral Ought’ series started with an examination of (Mill’s) Utilitarianism, (Kant’s) Deontological ethics and (Aristotelian) Virtue Ethics in turn. All three approaches have their merits and deficiencies. Of the three, I am disinclined towards the dogmatism of Deontological ethics and particularly inclined towards Virtue Ethics because of its accounting for moral growth. The latter is more ‘natural’ because it is in keeping with how our brains physically learn, as opposed to treating us as idealized reasoners or rule-followers.

Whereas I am sceptical of attempts to derive moral principles from scientific facts, I agree with the proponents of naturalized ethics that scientific facts can have profound moral implications, and that moral philosophers have paid too little attention to relevant work in the natural sciences. My understanding of the relationship between science and normative ethics is, however, different from that of naturalized ethicists. Casebeer and others view science and normative ethics as continuous and are therefore interested in normative moral theories that resemble or are ‘consilient’ with theories of moral psychology. Their aim is to find theories of right and wrong that in some sense match natural human practice. By contrast, I view science as offering a ‘behind the scenes’ look at human morality. Just as a well-researched biography can, depending on what it reveals, boost or deflate one’s esteem for its subject, the scientific investigation of human morality can help us to understand human moral nature, and in so doing change our opinion of it.

But this is too vague. It says virtually nothing. Greene suggests that something might be profound but provides no idea of how things might actually look ‘behind the scenes’.

Let’s take a step back and ask: what is the purpose of morality? Ethics is about determining how we ought to behave, but to answer that requires us to decide upon the purpose of human existence. Such metaphysical meaning has proved elusive except for religious communities. Without any divine purpose, we are left with deciding meaning for ourselves, and the issue then is that our neighbour may find a different meaning which will then determine different behaviour. The conclusion is that the purpose of morality is the balancing of the wants of others against those of ourselves. But this requires us to consider:

  1. What do we want?
  2. How can we understand the wants of others?
  3. How can we cognitively decide?

All three considerations are ultimately grounded in the physics of our brains:

  1. We are free to want whatever we want, but we are all physically very similar so it should come as no surprise that we will have similar wants (food, water, shelter, companionship…).
  2. We need a ‘theory of mind’ (second-order intentionality) in order to understand that others may have wants of their own. We need an understanding of ‘reputation’ (third-order intentionality) to want to moderate our behaviour.
  3. We need a cognitive ability to deliberate in order to make moral choices (in short, to be able to make rational decisions).

(Even the religion opt-out eventually leads us back to the physical brain – how people learn, know and believe is rooted in the physical brain.)

In principle there is no connection between ‘is’ and ‘ought’ and a philosopher can propose any moral theory. But when they do, others provide counter-examples which lead to prescribing absurd responses. All too often, the difficulty lies not in what should be done in practice but in trying to codify their moral theory, and they end up modifying their theory rather than their action!

What if we try to combine the best elements of the three (Utilitarianism, Deontological ethics and Virtue Ethics) main moral theories in order to provide practical moral guidance? Such a synthesis was presented. Ignoring the details here, an extremely brief summary is:

  • We imagine the consequences of potential actions in terms of its effect on the collective well-being of all.
  • In the early stages of growth, we respond with the application of (learnt) simple rules.
  • The less clear-cut those rules are to the particular situation, the less confidence we have in them and we apply more conscious effort into assessing consequences.
  • This provides us with an ability to respond both to the ‘simple’ moral problems quickly and efficiently and to complex problems with considerable attention.
  • We gradually develop more subtle sub-rules that sit upon the basic rules and we learn to identify moral situations and then apply the rules and sub-rules with greater accuracy and speed. This is moral growth.

The resulting ‘mechanistic’ account of moral reasoning is remarkably similar to the ‘hierarchy of predictors’ (‘predictive brain’, ‘variational free energy’) theory of what the brain is doing generally. So, what the brain is doing when there is moral deliberation is basically the same as when there is non-moral deliberation. There is nothing particularly special about moral thinking.

 

Moral Intuition

Greene acknowledges the role of methods of determining judgements other than just ‘Pure Reason’:

There is a growing consensus that moral judgements are based largely on intuition — ‘gut feelings’ about what is right or wrong in particular cases. Sometimes these intuitions conflict, both within and between individuals. Are all moral intuitions equally worthy of our allegiance, or are some more reliable than others? Our answers to this question will probably be affected by an improved understanding of where our intuitions come from, both in terms of their proximate psychological/neural bases and their evolutionary histories.

He contrasts two moral dilemmas (both due to Peter Unger): Firstly, Case 1:

You are driving along a country road when you hear a plea for help coming from some roadside bushes. You pull over and encounter a man whose legs are covered with blood. The man explains that he has had an accident while hiking and asks you to take him to a nearby hospital. Your initial inclination is to help this man, who will probably lose his leg if he does not get to the hospital soon. However, if you give this man a lift, his blood will ruin the leather upholstery of your car. Is it appropriate for you to leave this man by the side of the road in order to preserve your leather upholstery? Most people say that it would be seriously wrong to abandon this man out of concern for one’s car seats.

And then Case 2:

You are at home one day when the mail arrives. You receive a letter from a reputable international aid organization. The letter asks you to make a donation of two hundred dollars to their organization. The letter explains that a two-hundred-dollar donation will allow this organization to provide needed medical attention to some poor people in another part of the world. Is it appropriate for you to not make a donation to this organization in order to save money? Most people say that it would not be wrong to refrain from making a donation in this case.

Now, most people think there is a difference between these scenarios:

  • the driver must give the injured hiker a lift, but
  • it would not be wrong to ignore the request for a donation.

In fact, we can imagine doing a Utilitarian calculation, trading off the benefits between the two situations, and concluding from that that it is more Utilitarian to donate the money it would cost to repair the leather upholstery to charity instead of helping the hiker. But we are then more likely to actually help the hiker anyway and refine the Utilitarian calculus somehow. We override our codified system because it feels like there is ‘some good reason’ why the decision is right. But Greene, like Peter Singer before him, thinks that, whatever that reason is, it is not a moral reason.

And yet this case and the previous one are similar. In both cases, one has the option to give someone much needed medical attention at a relatively modest financial cost. And yet, the person who fails to help in the first case is a moral monster, whereas the person who fails to help in the second case is morally unexceptional. Why is there this difference? About thirty years ago, the utilitarian philosopher Singer argued that there is no real moral difference between cases such as these two, and that we in the affluent world ought to be giving far more than we do to help the world’s most unfortunate people. (Singer currently gives about 20% of his annual income to charity.) Many people, when confronted with this issue, assume or insist that there must be ‘some good reason’ for why it is alright to ignore the severe needs of unfortunate people in far off countries, but deeply wrong to ignore the needs of someone like the unfortunate hiker in the first story. (Indeed, you might be coming up with reasons of your own right now.) Maybe there is ‘some good reason’ for why it is okay to spend money on sushi and power windows while millions who could be saved die of hunger and treatable illnesses. But maybe this pair of moral intuitions has nothing to do with ‘some good reason’ and everything to do with the way our brains happen to be built.

Greene identifies the difference as being between ‘personal’ and ‘impersonal’ situations:

The dilemma with the bleeding hiker is a ‘personal’ moral dilemma, in which the  moral violation in question occurs in an ‘up-close-and-personal’ manner. The donation dilemma is an ‘impersonal’ moral dilemma, in which the moral violation in question does not have this feature. To make a long story short, we found that judgements in response to ‘personal’ moral dilemmas, compared with ‘impersonal’ ones, involved greater activity in brain areas that are associated with emotion and social cognition. Why should this be? An evolutionary perspective is useful here. Over the last four decades, it has become clear that natural selection can favour altruistic instincts under the right conditions, and many believe that this is how human altruism came to be. If that is right, then our altruistic instincts will reflect the environment in which they evolved rather than our present environment. With this in mind, consider that our ancestors did not evolve in an environment in which total strangers on opposite sides of the world could save each others’ lives by making relatively modest material sacrifices. Consider also that our ancestors did evolve in an environment in which individuals standing face-to-face could save each others’ lives, sometimes only through considerable personal sacrifice. Given all of this, it makes sense that we would have evolved altruistic instincts that direct us to help others in dire need, but mostly when the ones in need are presented in an ‘up-close-and-personal’ way. What does this mean for ethics? Again, we are tempted to assume that there must be ‘some good reason’ why it is monstrous to ignore the needs of someone like the bleeding hiker, but perfectly acceptable to spend our money on unnecessary luxuries while millions starve and die of preventable diseases. Maybe there is ‘some good reason’ for this pair of attitudes, but the evolutionary account given above suggests otherwise: we ignore the plight of the world’s poorest people not because we implicitly appreciate the nuanced structure of moral obligation, but because, the way our brains are wired up, needy people who are ‘up close and personal’ push our emotional buttons, whereas those who are out of sight languish out of mind.

This is just a hypothesis. I do not wish to pretend that this case is closed or, more generally, that science has all the moral answers. Nor do I believe that normative ethics is on its way to becoming a branch of the natural sciences, with the ‘is’ of science and the ‘ought’ of morality gradually melding together. Instead, I think that we can respect the distinction between how things are and how things ought to be while acknowledging, as the preceding discussion illustrates, that scientific facts have the potential to influence our moral thinking in a deep way.

But again, this is all rather vague.

Relating this to what I have previously discussed…

  • The ‘hierarchy of predictors’ model describes the way in which many levels compete with one another to influence behaviour (spreading from reflex to rational, via sensorimotor, emotional, subconscious and conscious levels). Lower levels will dominate action in familiar moral situations. But in unfamiliar circumstances, or when the problem consists of two familiar reactions with contradictory actions, lower levels will be less confident about their response and control will effectively be passed upwards for (slower) rational judgement. In a decision between helping the bleeding hiker and donating to charity, rational deliberation gets shut out by the lower-level emotional and intuitive response.
  • Patricia Churchland shows that our caring originates in our brain, such as in the way that the greater density of oxytocin receptors in the nucleus accumbens and the greater density of vasopressin receptors in the ventral pallidum (both nuclei are part of the basal ganglia at the base of the forebrain) make the significant difference in behaviour between the monogamous Prairie Vole and the otherwise-similar (but non-monogamous) Montane Vole. The ‘up-close-and-personal’ proximity effect of alloparenting expands this beyond the family to the ‘In-Group’. But oxytocin is not a magic bullet. It improves empathy with the In-Group but it actually works against Out-Group members.

The physical construction of the brain seems to provide one ‘some good reason’ why immediate ‘up close and personal’ situations elicit a moral response in the way that slowly-rationalized situations do not. (A frequent tactic of worldwide charities when appealing to us is not to present facts about the suffering of many, many thousands but to present an image of a single suffering individual, furnished with a name and a story of misfortune – to make the problem ‘up-close-and-personal’.)

If we truly do want to have a morality that does not prioritize those ‘up close’, then we need to provide some compensation mechanisms to our decision making – consciously equalizing out our emotions. But our emotions can play an important positive role. Empathy is a very significant factor in creating habits that underpin the balancing of the wants of others against the wants of oneself. Yes, we must learn the virtue of balancing others against ourselves, but we must also learn the virtue of balancing reason against our emotions.

 

Moral Realism

Greene then shifts attention to Moral Realism:

According to ‘moral realism’ there are genuine moral facts, whereas moral anti-realists or moral subjectivists maintain that there are no such facts. Although this debate is unlikely to be resolved any time soon, I believe that neuroscience and related disciplines have the potential to shed light on these matters by helping us to understand our common-sense conceptions of morality. I begin with the assumption (lamentably, not well tested) that many people, probably most people, are moral realists. That is, they believe that some things really are right or wrong, independent of what any particular person or group thinks about it. For example, if you were to turn the corner and find a group of wayward youths torturing a stray cat, you might say to yourself something like, “That’s wrong!”, and in saying this you would mean not merely that you are opposed to such behaviour, or that some group to which you belong is opposed to it, but rather that such behaviour is wrong in and of itself, regardless of what anyone happens to think about it. In other words, you take it that there is a wrongness inherent in such acts that you can perceive, but that exists independently of your moral beliefs and values or those of any particular culture.

I think torturing cats is not just wrong but universally wrong. Universally wrong means that it is wrong in all societies. Across societies, we understand sufficiently the same about what ‘wrongness’ and ‘morality’ actually mean that, when presented with a clear (black and white) moral case, we can all agree on whether that case is right or wrong. It is not that there is some absolute truth of the matter, just that similar agents’ understanding of common concepts leads to common knowledge. Universally wrong is not the same as absolutely (‘real-ly’) wrong.

Surveying cultures around the world across all civilisations, we find that they have surprisingly similar moralities. It is not that one society accepts stealing but not murder and another accepts murder but not stealing! The differences are predominantly down to how liberal or conservative a society is. Liberal societies have a shorter list of vices than conservative ones. For example, the way an individual dresses is seen as a matter of aesthetics or custom in liberal (e.g. US) societies but a matter of morality in conservative (e.g. Muslim) societies.

There are clear cases of what is right and wrong that apply across most if not all human civilizations. It is in the less clear-cut cases that they differ and hence moral problems arise.

This realist conception of morality contrasts with familiar anti-realist conceptions of beauty and other experiential qualities. When gazing upon a dazzling sunset, we might feel as if we are experiencing a beauty that is inherent in the evening sky, but many people acknowledge that such beauty, rather than being in the sky, is ultimately ‘in the eye of the beholder’. Likewise for matters of sexual attraction. You find your favourite movie star sexy, but take no such interest in baboons. Baboons, on the other hand, probably find each other very sexy and take very little interest in the likes of Tom Cruise and Nicole Kidman. Who is right, us or the baboons? Many of us would plausibly insist that there is simply no fact of the matter. Although sexiness might seem to be a mind-independent property of certain individuals, it is ultimately in the eye (that is, the mind) of the beholder.

I have previously looked at how aesthetics and moral knowledge are just particular forms of knowledge. Moral knowledge is neither uniquely nor totally separate from the physical world of what ‘is’. Aesthetics is the same; it is dependent on things like our (neural) ability to perceive and on our emotions (such as disgust).

The big meta-ethical question, then, might be posed as follows: are the moral truths to which we subscribe really full-blown truths, mind-independent facts about the nature of moral reality, or are they, like sexiness, in the mind of the beholder?

Elsewhere, I have examined how truth is ‘in the mind of the beholder’ – that knowledge (crudely ‘facts’) grows within our brains, building upon earlier ‘facts’ such that it both corresponds with our personal experience and coheres with what else we know. The apparent universality of ‘facts’ (including moral knowledge) arises because we grow up:

  • in the same (or very similar) environment as others, and
  • in a shared culture, meaning that we (more explicitly) learn the same as others.

For our ‘rational’ upper levels, our lower levels (including our emotional urges) are just part of the environment in which we grow up (a very immediate part, mind you).

One way to try to answer this question is to examine what is in the minds of the relevant beholders. Understanding how we make moral judgements might help us to determine whether our judgements are perceptions of external truths or projections of internal attitudes. More specifically, we might ask whether the appearance of moral truth can be explained in a way that does not require the reality of moral truth. As noted above, recent evidence from neuroscience and neighbouring disciplines indicates that moral judgement is often an intuitive, emotional matter. Although many moral judgements are difficult, much moral judgement is accomplished in an intuitive, effortless way.

In my worldview, the appearance of moral truth does not require the reality of moral truth!

With the ‘hierarchy of predictors’ model of the brain, it should be expected that moral judgements, like judgements of other forms of knowledge, are typically accomplished in an intuitive, effortless way – by the lower levels of the hierarchy. It is what we do with the exceptional, difficult decisions that is interesting – those decisions that are propagated up to the higher levels that have our conscious attention.

We are limited by the specifics of the physiology and neurology of the instruments that are our senses (although we can now build external instruments to extend them). We cannot like or dislike what we cannot sense.

An interesting feature of many intuitive, effortless cognitive processes is that they are accompanied by a perceptual phenomenology. For example, humans can effortlessly determine whether a given face is male or female without any knowledge of how such judgements are made. When you look at someone, you have no experience of working out whether that person is male or female. You just see that person’s maleness or femaleness. By contrast, you do not look at a star in the sky and see that it is receding. One can imagine creatures that automatically process spectroscopic redshifts, but as humans we do not.

All of this makes sense from an evolutionary point of view. We have evolved mechanisms for making quick, emotion-based social judgements, for ‘seeing’ rightness and wrongness, because our intensely social lives favour such capacities, but there was little selective pressure on our ancestors to know about the movements of distant stars. We have here the beginnings of a debunking explanation of moral realism: we believe in moral realism because moral experience has a perceptual phenomenology, and moral experience has a perceptual phenomenology because natural selection has outfitted us with mechanisms for making intuitive, emotion-based moral judgements, much as it has outfitted us with mechanisms for making intuitive, emotion-based judgements about who among us are the most suitable mates.

Or much as natural selection has outfitted us with mechanisms for making intuitive, emotion-based judgements about anything.

Therefore, we can understand our inclination towards moral realism not as an insight into the nature of moral truth, but as a by-product of the efficient cognitive processes we use to make moral decisions. According to this view, moral realism is akin to naive realism about sexiness, like making the understandable mistake of thinking that Tom Cruise is objectively sexier than his baboon counterparts.

Both intuition and emotion play an important part in moral deliberation, just as they do in other forms of deliberation.

Greene has just been making vague comments so far. But then he makes a comment that is acute:

Others might wonder how one can speak on behalf of moral anti-realism after sketching an argument in favour of increasing aid to the poor

to which his reply is

giving up on moral realism does not mean giving up on moral values. It is one thing to care about the plight of the poor, and another to think that one’s caring is objectively correct.

I have emphasized the importance of caring in creating a moral society and looked at its biological foundations. It is largely true that we act morally because we care.

… Understanding where our moral instincts come from and how they work can, I argue, lead us to doubt that our moral convictions stem from perceptions of moral truth rather than projections of moral attitudes.

A case has been presented of how our neurology promotes caring to extend, via oxytocin, alloparenting, group behaviour and institutional trust, to very large societies in which we care for complete strangers. This is how our moral convictions arise. Our morals are contingent on culture and environment and not on absolute moral truths. Our moral instincts that make us help the injured hiker (emotionally, quickly) and ignore the appeal through the letterbox (deliberatively, slowly, consciously) are built upon the ‘up close and personal’ origins of our caring. It could not be otherwise. Our logical/rational/deliberative higher levels of cognition are built (evolved) upon lower, quicker instinctive levels.

Some might worry that this conclusion, if true, would be very unfortunate.

First, it is important to bear in mind that a conclusion’s being unfortunate does not make it false.

This is true for moral determinism as well as moral instincts (our instincts are that we are free but the scientific evidence points towards determinism). The unfortunate conclusion all too often drawn from determinism is that we lack free will and therefore cannot punish transgressors for actions they could not have avoided – and hence that the moral order dissolves.

Second, this conclusion might not be unfortunate at all.

I have argued elsewhere that we might not have ‘free will’ as conventionally understood but that we still have freedom and can still be held responsible. The moral order can be maintained. Furthermore, recognizing that some individuals do not have the control they are traditionally purported to have, we will be less retributive and more prepared to intervene in order to design a society that further improves well-being (yes, in a scientific way).

 


Getting Started on Deep Learning with Python

 

An Introduction to Deep Learning

In Karl Friston’s wonderfully entitled paper ‘The history of the future of the Bayesian brain’, he recalls his working with Geoffrey Hinton, how Hinton emphasized Bayesian formulations and generative models, and how Friston developed his biological ‘Variational Free Energy’ minimization theory from Hinton’s ideas, adopting Hinton’s references to Free Energy, Kullback–Leibler divergence and Helmholtz and Boltzmann Machines within the field of artificial neural networks.

Hinton (co-)invented ‘Boltzmann Machines’, which are recurrent artificial neural networks whose neurons behave stochastically (i.e. with an element of randomness), and he also invented fast learning algorithms for ‘Restricted Boltzmann Machines’ (where neurons have connections to neurons in other layers but not to those in the same layer).

He modestly claims that his efforts over the decades led to a 10-fold increase in performance but that, during this time, Moore’s Law increased computing power by a factor of 100,000! Added to that was the new availability of large data sets with which to train networks.

But the result of all this was that ‘deep’ neural networks (those with more than 1 hidden layer, i.e. those with more than 3 layers in total) were able to perform very good feature extraction in a reasonable time. Lower layers in the hierarchy extract simple features upon which the higher layers build more and more elaborate features. This then resulted in a rapid commercialization of such algorithms for applications like speech recognition, as used in Google Voice search and Apple’s Siri.

So now the emeritus Professor Hinton is a founding father of ‘Deep Learning’ and works part-time at Google.

A new strand of posts here will look at Deep Learning and how it works. These will be based around the Python computer language. This ‘Introduction to Deep Learning with Python’ video by Alec Radford at indico talks through some Python code for optical character recognition. Below, I cover installing all the code and applications to be able to run the code shown in the video, to get us started.

 

Overview of Installing Python

To get this code running on a Windows PC, we need:

  1. The python source code.
  2. Python itself
  3. The NumPy maths package, required by the source code.
  4. The Theano numerical methods Python package, required by the source code.
  5. ‘Pip’ (‘Pip Installs Python’) – for installing Python packages!
  6. The ‘MinGW’ gcc compiler, for compiling the Theano package for much faster execution times.
  7. The MNIST data set of training and usage character bitmaps.

 

Installing Anaconda

Anaconda2 provides 3 of the above:

  • Python 2.7
  • NumPy
  • Pip

Go to:

https://www.continuum.io/downloads

and go to the ‘zipped Windows installers’ (to work whether behind a firewall or not).

Download the latest 32-bit version for Python 2:

Anaconda2-2.5.0-Windows-x86.zip

Double-clicking on the downloaded ZIP file automatically pushes through to the Anaconda2-2.5.0-Windows-x86 application (Windows understands the ZIP compression format). Double-click on this Anaconda2-2.5.0-Windows-x86 application to install Anaconda. Selecting to install ‘just for me’ will probably be easier, hence install to the user area – C:/Users/User/Anaconda2_32. (Add the ‘_32’ suffix in case we need to install a 64-bit installation later on.)

Have ‘yes’ ticked for adding Anaconda to PATH. Have ‘yes’ ticked for Anaconda to have the default Python 2.7. Installation then takes a while.

 

Installing the Main Python Packages

Locate the ‘Anaconda Prompt’ – easiest through the Windows search. This opens a command shell.

Go to the Anaconda2_32\Scripts directory:

cd Anaconda2_32\Scripts

‘Pip’ (pip.exe) and ‘Conda’ (conda.exe) will be in here.

Installation will generally use Conda rather than Pip. Ensure you have the latest packages to install, but first ensure you have the latest Conda to install them with:

conda update conda

Select ‘y’ if not up to date. Continue:

conda update --all

Finally, install the desired packages:

conda install scipy

conda install numpy
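
As a quick, optional sanity check that the packages are now visible to Python, this one-liner can be run from the same Anaconda Prompt (the exact version numbers printed will depend on what Conda installed):

python -c "import numpy, scipy; print(numpy.__version__); print(scipy.__version__)"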

 

Installing GCC for Compiling the Theano Package

The Theano numerical methods package can be interpreted but this will be very slow. Instead, the package should be compiled. For this, the MinGW (‘Minimalist Gnu for Windows’) compiler should be installed. Follow the link from:

http://www.mingw.org/wiki/Getting_Started

to SourceForge to automatically download the setup executable:

mingw-get-setup.exe

into the Downloads directory.

Double-click this and install it. Select

C:\Users\User\MinGW

as the install directory (for consistency with the Anaconda2_32 installation).

 

Setting the Path to point to GCC

To ensure that Conda will ‘see’ the compiler when doing the Theano installation, confirm that the PATH environment variable points to it. Select:

Start -> Control Panel -> System -> Advanced -> Environment Variables

(Alternatively, in the Search window, type Environment and select ‘Edit the Environment Variables’.)

Double-click on ‘PATH’ and add MinGW to the start/top of the list. The list should include:

C:\Users\User\MinGW\bin

C:\Users\User\Anaconda2_32

C:\Users\User\Anaconda2_32\Scripts

C:\Users\User\Anaconda2_32\Library\bin
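To confirm that the compiler is now visible, open a fresh Anaconda Prompt so that the new PATH takes effect (and assuming the MinGW C/C++ compiler packages were selected during the MinGW installation):

gcc --version

g++ --version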

 

Installing the Theano Package

Now install the GNU C++ (g++) compiler within Anaconda, to speed-optimize the Theano library. In the ‘Anaconda Prompt’ shell, ensure that you are in the correct directory:

cd \Users\User\Anaconda2_32\Scripts

and type:

conda install mingw libpython

And finally install the numerical methods python library ‘Theano’:

pip install theano
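As a quick smoke test that Theano is working (a minimal sketch, not part of the tutorial code), the following few lines build and run a trivial Theano function. If g++ is visible, Theano compiles the function; otherwise it falls back to a much slower Python mode and prints a warning:

import theano
import theano.tensor as T

# A symbolic scalar input and a simple expression built from it
x = T.dscalar('x')
y = x ** 2

# Compile the symbolic expression into a callable function
f = theano.function([x], y)

print(f(4.0))  # expect 16.0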

 

Download the Example Python Code

The text with the YouTube video points to the code at:

https://github.com/Newmu/Theano-Tutorials

and, from there, click ‘Download ZIP’. Double-click on the downloaded ZIP and copy the Theano-Tutorials-master directory to C:\Users\User\Anaconda2_32.

 

Downloading the MNIST Character Dataset

The MNIST character dataset is available through Yann LeCun’s personal website.

Windows cannot unzip ‘gzip’ (*.gz) files directly. If you don’t have an application to do this, download and run ‘7zip’:

http://www.7-zip.org/

Gzip (*.gz) files need to be associated with ‘7zip’. Then double-click on each gzip file in turn and ‘extract’ the uncompressed file from it. These should all end up under:

C:\Users\User\Anaconda2_32\Theano-Tutorials-master\media\datasets\mnist

There is a mismatch between the filenames in the MNIST dataset and the file references in the Python code. Using Windows Explorer, change the ‘.’ in each filename to a ‘-’, e.g. rename train-images.idx3-ubyte to train-images-idx3-ubyte.
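If you would rather not rename the files by hand, a small Python snippet along the following lines (run from the Anaconda Prompt; the directory path is the one assumed above) does the same renaming:

import glob
import os

# Location of the extracted MNIST files, as installed above
mnist_dir = r"C:\Users\User\Anaconda2_32\Theano-Tutorials-master\media\datasets\mnist"

for path in glob.glob(os.path.join(mnist_dir, "*ubyte")):
    directory, name = os.path.split(path)
    fixed = name.replace(".", "-")  # e.g. train-images.idx3-ubyte -> train-images-idx3-ubyte
    if fixed != name:
        os.rename(path, os.path.join(directory, fixed))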

 

Running the Code

The Anaconda installation includes the ‘Spyder’ IDE for Python. Search for ‘Spyder Desktop App’ and run.

Browse to set the working directory (top right) to:

C:\Users\User\Anaconda2_32\Theano-Tutorials-master

Then open the first Python script (File -> Open):

0_multiply.py

This shows the source code.
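For reference, the gist of 0_multiply.py is roughly as follows (a sketch, not the repository’s exact listing): it declares two symbolic scalars, compiles their product into a Theano function and calls it with ordinary numbers:

import theano
from theano import tensor as T

# Two symbolic scalar inputs
a = T.scalar()
b = T.scalar()

# A symbolic expression for their product
y = a * b

# Compile the expression into a callable function
multiply = theano.function(inputs=[a, b], outputs=y)

print(multiply(1.0, 2.0))  # 2.0
print(multiply(3.0, 3.0))  # 9.0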

Select Run -> Run (F5) to execute this code.

Selecting other programs is likely to result in either a ‘memory error’ or a ‘No module named foxhound.utils.vis’ error.

The memory error issue can be overcome by running the code from the Anaconda Prompt:

cd C:\Users\User\Anaconda2_32\Theano-Tutorials-master

python 4_modern_net.py

This still means that 3_net.py and 5_convolutional_net.py cannot be run, and what the other programs are actually doing hasn’t been discussed. That is left for another time.


The Great and the Good

Why Do the Rich Have a Different Moral Calculus?

 

Albert Loeb

Pinterest

Albert Loeb, father of Richard

The traditional system of justice rests on the assumption that the minds of individuals generally all have the same ability to choose courses of action, and hence that individuals can all be equally blamed when those choices are wrong.

But with a modern, Physicalist worldview, we recognise that our behaviour is dictated by circumstances beyond our choosing. To return to a previous example, the lawyer Clarence Darrow appealed to the compassion of the judge to spare Richard Loeb the death penalty:

“What had this boy to do with it? He was not his own father; he was not his own mother; he was not his own grandparents. All of this was handed to him. He did not surround himself with governesses and wealth. He did not make himself and yet he is to be compelled to pay.”

Now, if this applies to blame then it applies equally to its opposite, praise.

If transgressors cannot be blamed (in the traditional, direct sense) for their deeds, then the successful cannot be praised for their achievements either.

Consider Richard Loeb’s father, Albert Loeb (1868-1924), as an example of a high achiever. After enjoying a good education, he set up a law practice in Chicago that quickly gained Sears, Roebuck & Company as a client, for whom he went on to work directly, eventually becoming vice president. He had reached the heights of social standing and was able to surround himself with wealth: a mansion in an affluent part of Chicago, a Model Farm in Michigan with a schoolhouse for the workers’ children. And governesses for his own children.

Hyde Park Herald

The home of Albert Loeb and family, 5017 South Ellis Avenue, Kenwood, Chicago. (Barack Obama’s house is on the adjacent street, South Greenwood Avenue.)

As with other high-achievers, he was presumably proud of his achievements in life and felt that he had earned his rewards through his personal abilities, crediting little to the fortunate circumstances in which he was born and grew up.

With a Physicalist worldview, it is not just that…

Some people are born on third base and go through life thinking they hit a triple

But it is also that:

‘If you can hit a triple, that automatically puts you on third base to start with!’

 

 

How the Rich Behave

chxhistory.com

Loeb’s Model Farm, Charlevoix MI

There has recently been much general media coverage on research about how the moral behaviour and reasoning of those at the top of the social tree differs from that of the rest of us. For example, from research by Paul Piff:

  • They are more likely to lie and cheat when gambling or negotiating.
  • They are more likely to endorse unethical behaviour in the workplace.
  • They exhibit reduced empathy, favouring ‘rational’ utilitarian choices (rather than more intuitive, emotional responses) such as being more likely to take resources from one person to benefit several others.

That last finding comes from ‘trolleyology’ experiments. Another ‘method’ is to equate high-status cars with high-status drivers and observe driving behaviour. For example, drivers of high-status cars are more likely to cut other drivers up and to fail to stop for pedestrians at crossings.

Piff et al: 'Higher social class predicts increased unethical behavior'

‘Mean machine’: Another BMW driver fails to stop for a pedestrian.

Elsewhere, I have defined morality as being about balancing the wants of oneself with those of others. Piff frames the behaviour of the rich in terms of such a balance:

‘the rich are way more likely to prioritize their own self-interests above the interests of other people.’

(He calls this ‘the asshole effect’!)

Kathleen Vohs is another high-profile researcher in this area. Her experiments concluded that just thinking about money decreases empathy, shifting the balance from others to oneself. But she believes this effect is a result of a lack of interest rather than of malice. For ‘money-primed’ individuals:

“It’s not a bad analogy to think of them as a little autistic.”

In the relationship between affluence and selfishness, which is the cause and which is the effect? The cause can be one of:

  • The environment: Being rich makes you less empathetic, or
  • The agent: Being less empathetic makes you rich.

Others have questioned the quality of research like this – for its subjectivity and inadequate sample sizes. (Far worse is the case of Diederik Stapel, who faked the data for similar research papers.)

But even if the data are weak or faked, we are inclined to go along with such conclusions because either:

  1. they ring true with our own anecdotal experience (e.g. that BMW drivers tend to be inconsiderate of other road users) – the ‘science’ only confirms ‘what we already knew’, or
  2. we want them to be true.

 

Charitable Giving

Looking at donations to charity is another way of assessing how much people think of others. Crucially, there is a vast amount of data available to analyse here, from tax returns. One study analysed donation data from 30% of U.S. tax returns – a huge set. This approach is not without its problems, but it does overcome the issue of sample size. Ranking the largest 50 U.S. metropolitan areas by the percentage of income given to charity, Salt Lake City was at the top, accompanied by the Bible Belt cities of the South East. The affluent Silicon Valley cities, San Francisco and San Jose, were near the very bottom. Silicon Valley has long had a reputation for a low level of charitable donations. (It has also been associated with a high prevalence of diagnoses of autism/Asperger’s syndrome.)

The story is similar in the UK: Scotland and the Midlands donate more generously (proportionately) than the more affluent London and South East.

philanthropyroundtable.org

Charitable giving as a function of income

Major factors that influence charitable generosity are

  • being married, and
  • regular attendance at religious services.

Religion is the factor that transforms the graph of percentage-giving-versus-income from one that declines with increasing income to a ‘U’ curve (see above). But it is only a relatively small proportion of the very wealthy that are doing the giving.

The use of charitable donations as an indicator of generosity is not straightforward – the relationship is obscured by the inclusion of donations to political/ideological causes alongside traditional charitable ‘good causes’. But even after compensating for this, those who regularly attend religious services still donate more to secular ‘good causes’ than those who don’t. This can simply be attributed to the habit of being regularly reminded of others’ needs at those services; the relative meanness of those who do not attend can likewise be attributed to not being made consciously aware of others’ needs so frequently – ‘out of sight, out of mind’.

Other factors affecting charitable giving include:

  • Living in rural rather than urban areas. (Note: those in cities are generally better educated.)
  • Increasing age (ignoring the effect of bequests).
  • Living in mixed rather than ‘gated’ communities.

It would also appear that conservatives are more generous than liberals, but there is no statistically significant difference between them per se; the higher level of donations among conservatives can be accounted for by their higher religious attendance.

 

Affluent Societies

Taking what has been said above, an overall picture emerges. Compared with more ‘traditional’ societies, in modern Western societies:

  • People are more likely to be single. Relationships have less commitment.
  • There is less attendance of religious services: less social connectedness to those living in the vicinity. Less regular exposure to those less fortunate.
  • The majority of the population now live in an urban environment: day-to-day interactions with others are more likely to be anonymous rather than with those you know personally.
  • People are better educated: moral deliberation is done with a wider perspective than the local/immediate/emotional.
  • People are more individualistic: Occupations are more specialised and there is more leisure time to define oneself by.
  • People are more affluent: they have more material goods to ‘play’ with and use, with consequently reduced contact with others. Particularly relevant here is car ownership, which isolates people as they travel between home and work.
  • People are more isolated from one another: they are likely to be living in ‘good’ or ‘bad’ neighbourhoods among people much like themselves, and their interaction tends to be more with those of their own age. This is all particularly acute for ‘gated communities’.
  • There is less dependency and there is higher monetization: we are less dependent on other specific people, and their goodwill. If we want something, we can just buy it with the minimum of personal interaction and generally from one of a number of anonymous providers.

All these factors lead to reduced empathy towards people around us. This is an effect of the environment.

However, it must be emphasized that this is a local effect. Modern Western society supports a huge population, becoming a more homogeneous ‘global village’, whereas ‘traditional’ societies tend to be small and much less tolerant of outsiders.

On balance, a reduction in local empathy might not be a problem if society were fairly uniformly affluent. But there are huge societal differences. The reduced empathy of the powerful leads to narcissism and insensitivity, and works to the detriment of the weak.

As already said, morality is about balancing the wants of the individual against those of others within society.

  • A ‘traditional’ environment is likely to be physically harsh. This balancing must be skewed towards the wider needs of the group. The community needs religion to bind itself together. There must be strongly codified acceptable behaviours.
  • A modern, Western environment is physically benign and can support greater independence and the moral balancing can shift towards the individual.

This shift is most pronounced for the most affluent.

 

Entitlement and Narcissism


In extreme cases, the balance is completely shifted towards the self. Such people have:

  • An affluence which means that all ‘basic’ worldly needs are easily met: food, shelter, safety, belonging and self-respect.
  • A lack of empathy.
  • A ‘cold’ application of reason that directs action.

Plus:

  • A preparedness to sacrifice others (dispassionately) for a greater good, or
  • No personal regard for others at all.

 

The former case of sacrificing others is one of ‘extreme Utilitarianism’ – a preparedness, or a sense of entitlement, to act. Moreover, it is an entitlement to act alone (based just on one’s own perceptions of reality). There is a gradual transition from personal morality to political morality here. A government department is entitled to take actions that impersonally sacrifice some people for others (buying drugs for one medical condition at the expense of drugs for another). A political leader, supported by the institution of government, is entitled to take actions that impersonally sacrifice some people for others (waging war). But when a group of insufficient size thinks it is entitled to impersonally sacrifice some people for others, it is terrorism.

(The problem with classic ethical thought experiments such as the trolley problem is that these scenarios ordinarily apply to groups, not individuals.)

An example is the case of Anders Behring Breivik, responsible for the 2011 terrorist attacks in Oslo and on Utøya. Before his killing spree, he released a 1500-page account of his worldview concerning the preservation of European culture against Islamisation. Although he was delusional (and homophobic and misogynistic and …), there is an intellectualized dimension to his cause, and a willingness to enforce significant sacrifices in order to further it (incarceration for himself but death for many others). Breivik would probably diagnose his motivations as part of his personal self-actualization. Psychiatrists, on the other hand, attributed his acts to narcissistic personality disorder (exacerbated by Asperger’s).

The latter case of having no regard for others is one of megalomania, for which there are plenty of examples throughout history. Its juvenile form is one of insufficient competence, as in the case of Richard Loeb.

 

(This is the twentieth part of the ‘From Neural Is to Moral Ought’ series.)
