People’s Capitalism

News articles and reports appear almost daily on the subject of how technological developments in Artificial Intelligence and robotics will cause dramatic changes to employment over the next few decades. The stereotypical article will refer to Frey and Osborne’s claim that 47% of US jobs are at risk of automation (see my previous post) and will take one of two lines:

  • We have been scared about technological unemployment before, but we should not be so arrogant as to believe that, just because we cannot think of what new jobs might come along to replace those lost, no one else will either. We will just have to educate our workforce to make them flexible enough for whatever comes along.
  • ‘This time it’s different’. Jobs will be lost and we will have to adjust our entire economic system to deal with this. We don’t really know what we should be doing, but it is possible that Universal Basic Income might provide the answer.

James Albus’s book ‘Peoples’ Capitalism: The Economics of the Robot Revolution’ is different, though. It surprises in a number of ways:

Unlike others, he provides a detailed proposal of economic changes to gradually transform the economic system into one in which the majority of work has been automated.

Yet he was not an economist himself. He was an engineer, interested both in the neuroscience that inspired his area of technology and in the consequences of that technology for society.

Perhaps most surprising, given what it says, is that it was published over 40 years ago, in 1976! It is timely and remarkably prescient.

Below, I provide not a summary but an abridgement of the book (down to about 20% of its original size) so that it still predominantly retains the voice of the author. Bold emphasis is mine – for particularly interesting phrases, biased toward concerns for the individual and the environment.

James Albus’s ‘Peoples’ Capitalism: The Economics of the Robot Revolution’

Preface: Epilogue to Scarcity

We are now on the brink of a new industrial revolution, based on the substitution of electronic computers for the human brain, which will change the history of the world every bit as profoundly as the first. A new generation of machines will create wealth unassisted by human beings, and so allow the human race to free itself from the dehumanizing demands of mechanization. It will free people from having to structure their lives around daily employment in factories and offices so that they can choose their own lifestyles from a much wider variety of possibilities.

Human benefit is the ultimate measure of goodness for any social or economic system. Unfortunately, the present economic system is not structured to deal with the implications of a robot revolution –  automated factories would threaten jobs and undermine the financial security of virtually every family. America is a nation of wage earners, and in a very real sense, ‘wage-slaves’.

This book is an attempt to address some of the fundamental problems of income distribution and capital ownership in a society where most goods and services could be produced by machines. We have an outmoded system of incentives that does not make use of what is available to produce what is needed. If we properly utilized our scientific knowledge and our industrial capacity, we could eliminate poverty and guarantee personal financial security to everyone, in an environmentally sound and balanced way.

The great challenge will be the development of an economic system to achieve this.

The ‘Peoples’ Capitalism’ proposal comprises three new institutions:

  1. A National Mutual Fund (NMF) to finance capital investment in socially beneficial industries. The NMF would be authorized by Congress to borrow money from the Federal Reserve System. Profits from investments would be paid by the NMF to the general public in the form of dividends so that the average citizen would receive income quite independently of employment. Every citizen would become a capitalist in the sense of deriving a substantial percentage of their income from dividends paid on invested capital.
  2. A Demand Regulation Policy (DRP) would be instituted to provide sufficient savings to offset NMF investment spending. This would prevent short-term demand-pull inflation by withholding income from consumers through mandatory payroll deductions (graduated according to income) and converting it into savings bonds.
  3. A Federal Department of Science and Technology would focus modern technology more directly on problems relevant to human needs.
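A minimal sketch of how the first two institutions might interact numerically. All figures, rates, brackets, and function names below are invented for illustration; none of them are Albus’s own arithmetic:

```python
# Toy model of the NMF dividend and the DRP's graduated withholding.
# All numbers are invented for illustration; they are not Albus's figures.

def nmf_dividend_per_citizen(invested_capital, rate_of_return, population):
    """Annual NMF profit, distributed equally to every citizen."""
    return invested_capital * rate_of_return / population

def drp_withholding(income, brackets):
    """Graduated DRP payroll withholding, converted into savings bonds.

    `brackets` is a list of (threshold, rate) pairs, sorted by threshold;
    income above each threshold (up to the next) is withheld at that rate.
    """
    withheld = 0.0
    for i, (threshold, rate) in enumerate(brackets):
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else income
        if income > threshold:
            withheld += (min(income, upper) - threshold) * rate
    return withheld

# A $1 trillion NMF portfolio returning 5%, spread over 250 million citizens,
# pays each citizen $200 per year:
dividend = nmf_dividend_per_citizen(1e12, 0.05, 250e6)

# A graduated DRP: nothing below $10k, 5% from $10k to $50k, 10% above $50k.
# On a $70k income this withholds $2,000 + $2,000 = $4,000 into bonds:
bonds = drp_withholding(70_000, [(0, 0.0), (10_000, 0.05), (50_000, 0.10)])
```

The point of the pairing is visible even in the toy version: the NMF injects dividend income while the DRP withdraws a graduated slice of wage income, so total demand can be kept in balance as investment ramps up.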

Within three decades, these proposals would lead to:

  1. A society where every citizen would derive a significant fraction of his or her income from invested capital.
  2. A society where industrial ownership and economic power would be distributed widely enough so that every citizen would be financially independent.
  3. A society where people would work primarily for pleasure or for supplemental monetary benefits. No one would be forced to work out of economic necessity.
  4. A society where a diversity of lifestyles would flourish and rewards for achievement would be high.
  5. A society in which prices would be stable and prosperity could be maintained without planned obsolescence, make-work, waste, or pollution.

Without any significant changes to our constitutional form of government, these proposals would revitalize the free-enterprise system, mobilizing the full creative resources of our scientific and industrial capacity in a national effort to solve our most pressing human problems.

I am not a professional economist. As a scientist, I tend to ask what is possible, not what is customary. I have been trained to ask simple questions and to distrust complicated answers.

The questions to be addressed are:

  • If robots eventually do most of the economically productive work, how will people receive an income?
  • Who will own these machines?
  • Who will control the powerful economic and political forces they will represent?

In this book, I have attempted to go beyond simply asking questions and have proposed some solutions. I do not claim that my solutions are the only possible ones, or even the best. I do believe they are a step in the right direction.



Preface: Epilogue to Scarcity

I: America: The Affluent Society?

  • The Inadequacy of Conventional Economics

II: The Paradox of Poverty Amidst Plenty

III: How We Distribute Wealth

  • Pressures for Full Employment
  • Pressures for Unemployment
  • Handcraftsmanship and Personal Services
  • Women’s Liberation
  • The Importance of Our Cultural Heritage

IV: The Threat of Productivity

  • The Effect of Investment
  • The Threat to Jobs
  • Automation and Power: Economic and Political
  • The Concentration of Ownership

V: The Advent of Super-Automation

  • Computers and Robots

VI: Peoples’ Capitalism: An Alternative

  • The Employee Society
  • The National Mutual Fund
  • The NMF and Free Enterprise
  • Incentives for Diversity

VII: Peoples’ Capitalism and the Individual

  • Financial Security and Personal Freedom
  • The NMF and Individual Incentive
  • The Effect on Political Freedom
  • A Bigger Pie with Bigger Slices

VIII: The Quest for Stable Prices

  • Productivity and Prices
  • A Different Strategy
  • Investment Payback Delay
  • Monetary Policy
  • Tax Policy
  • Budgetary Policy
  • Time for a Change

IX: A Formula for Price Stability

  • Part 1: Dealing with Excess Demand
  • Part 2: Dealing with Insufficient Demand

X: A Department of Science and Technology

  • A Role for the Federal Government
  • Science at the Cabinet Level

XI: Peoples’ Capitalism in a Finite World

  • The NMF and Limits to Growth
  • An Alternative to Urbanization
  • Continued Growth and the Environment

XII: From Throughput to Storehouse Economics

  • The NMF and Storehouse Economics


I: America: The Affluent Society?

We produce fantastic quantities of almost everything imaginable and are clearly capable of producing much more, but we distribute this output so poorly that almost twenty percent of our population lives either near or below the poverty line. Millions of Americans are undernourished and without adequate medical care. Millions more live in dilapidated homes and slum tenements. Our cities are dying from neglect and decay. Public transportation is inadequate or non-existent. Streets are lined with abandoned buildings inhabited only by dope addicts and alcoholics. Urban neighborhoods are terrorized by muggers and racketeers. Garbage fills streets and alleyways.

Very few people feel that they have any significant margin of financial security. The lifestyle of the average middle-class family could most accurately be described as affluent poverty. Most families are heavily in debt. In many households both the husband and wife are forced to work.

The Inadequacy of Conventional Economics

We possess the agricultural capacity to feed our hungry children many times over. We have a construction industry easily capable of rebuilding our cities. We have the technological and intellectual resources to improve medical care, reduce pollution, and make our communities safe, clean, and livable. The wealth-producing potential inherent in modern physics, electronics, chemistry, nuclear engineering, semiconductor technology, and computer-based automation are awesome and totally unprecedented.

Unfortunately, they cannot be fully exploited for the benefit of all until some means other than wages and salaries is found for distributing the additional wealth they could create to the average citizen.

The existing system has no adequate mechanism for organizing or financing a really serious effort at eliminating the wretched conditions under which a large number of American citizens still live. It depends on mass consumption to sustain prosperity. If poverty is to be eliminated, some new system must be devised wherein the emphasis could be placed on conservation rather than consumption.


II: The Paradox of Poverty Amidst Plenty

Ours is an age of cynicism. Utopian dreams went out of style just when science and technology had reached a level where the elimination of physical poverty had become a real possibility. How could we have so seriously mismanaged our resources that tens of millions of Americans are officially classified as poor?

The conventional wisdom is that the poor are different from other members of society and that this difference is the basic cause of their poverty. Most people will admit that, at least to some extent, the poor are victims of their environment. Poor people are often deprived of important advantages and excluded from opportunities, but in the final analysis, most observers have ascribed the blame for poverty to the personal deficiencies of the poor themselves.

The traditional view, exemplified by Michael Harrington (‘The Other America’) and J. K. Galbraith (‘The Affluent Society’) is that cultural deprivation is the cause and the lack of income is the effect. But it is just as reasonable to conclude that lack of income is the cause and cultural deprivation is the effect. F. Scott Fitzgerald is reported to have remarked, “The rich are different,” to which Ernest Hemingway replied, “Yes, they have money.”

Schemes to relieve poverty through cultural enrichment programs have been spectacularly unsuccessful. Bennett Harrison: ‘instead of concentrating government money on so-called ‘defects’ in the poor people, it would be more profitable to focus first on defects in the labor market.’ The only way to deal realistically with poverty is to change the income distribution system so as to narrow the extremes of income inequality.


III: How We Distribute Wealth

Pressures for Full Employment

One of the inevitable effects of distributing income almost exclusively through wages is that it generates overwhelming pressures for full employment.

There are enormous incentives to get and hold a job. The results are that make-work projects of every type and description are created, some of which are not only useless, but positively harmful.  Growth means jobs, and we have written tax laws and zoning ordinances to encourage and foster growth. Marketing and advertising programs are promulgated to create demand for absurd or trivial products. Goods are deliberately designed to quickly become obsolete, either through normal wear or changes in style. America has become a throwaway culture; a society of ‘Waste Makers’ manufacturing disposable products that cannot be repaired or reused.

And much of the memo writing, paper shuffling, and red tape that goes on both in private industry and in government serves no other purpose than to provide work for otherwise unnecessary managers and bureaucrats.

Because virtually the only way to get income is to have a job, a system has been created that is enormously wasteful both in terms of natural resources and human creativity.

Pressures for Unemployment

Paradoxically, the political pressures for full employment create conditions that virtually guarantee serious unemployment. The increasing ratio of capital to labor has brought continuously rising output per man-hour. This increased output might be attributed to increased skill or increased physical effort on the part of workers, but, in the overwhelming majority of cases, it has been wholly due to more sophisticated machines or more efficient process technology. More has been produced and thus more must be distributed.

An employer must minimize labor costs to survive and so strive to hire as few persons as possible, not because he has nothing for additional employees to do, but simply because labor is such a significant cost factor that every effort must be made to keep the payroll at a minimum.

But since wages are virtually the only means available for placing purchasing power in the hands of consumers, wages have had to rise to consume this increased output. This means that many useful tasks are simply too expensive to be done. Streets need cleaning, buildings need repair, community health and recreation facilities need to be maintained, but the cost of labor is too high. The same is true in universities and research laboratories.

Milton Friedman has for years argued against the minimum wage laws, not on the basis that they are ideologically repulsive but because they virtually guarantee a high level of unemployment among low-skilled persons. We have an abundance of useful work that needs doing and a surplus of people willing and able to do it. Yet nothing can be accomplished because employers cannot afford to hire people for jobs that are not absolutely necessary.

Handcraftsmanship and Personal Services

The distribution of most of the nation’s income through wages and salaries distorts the nation’s production priorities by constricting the flow of income to labor in capital-intensive industries. The only way for persons with ordinary skills and talents to obtain a decent income is to work for industries with a high capital-to-labor ratio. The unaided human craftsman or service person simply cannot create wealth as fast as a complex piece of automated machinery.

Handcrafted goods and personal services have virtually disappeared from all technologically advanced societies. We are told that they are victims of progress. But is this really progress?

Where is the gain in forcing people out of nearly self-sufficient lifestyles in rural areas and small towns and crowding them together in urban ghettos where unemployment is epidemic and welfare is the principal source of income? There must be something basically wrong with the system that produces these results.

Women’s Liberation

The distortion of social priorities resulting from constricting the flow of income to the capital-intensive sector discriminates against housewives. A great deal of the wealth that society enjoys is created by the labor of housewives. It has been estimated that housewives’ work amounts to roughly one quarter of the Gross National Product. Yet the economic system does not pay wages for these services, and so housewives share none of the prestige of having “earned” their money.

Yet, their work is critical to the stability of the social order and is certainly more important than much of the paper shuffling and petty office politics that passes for work in offices and executive suites.

The Importance of Our Cultural Heritage

Modern society is complex: almost everything depends on everything else.  It is completely arbitrary to distribute wealth through wages and salaries as if the presently employed labor force were solely responsible for all the wealth created. The entire industrial-technological economic system rests upon a foundation of social stability and most of the output of modern industry is not due to the presently employed labor force at all, but to the capital stock, the scientific, technical, and managerial knowledge, the educational training, and the social and cultural behavior patterns that have accumulated and developed over the past three centuries or more.

The increases in productivity in many sectors of the industrial economy are due almost entirely to increases in the amount of capital equipment and the sophistication of the machinery and techniques used in the manufacturing process. They are hardly ever the result of any specific efforts of the currently employed labor force.

To distribute the wealth that this society produces almost exclusively through wages and salaries unjustly ignores the contribution of millions of persons who work outside of the formally recognized labor force and grossly distorts the system of values that society places on various types of culturally beneficial activity.

The narrow dependence on wages and salaries virtually guarantees high levels of unemployment and makes poverty inevitable. It wastes a large percentage of our available resources and productive capacity on make-work and unnecessary trivia. It leads to the demise of handcraftsmanship and personal services and discriminates against those who work outside the regular labor force.

This economic system, with such fundamental defects, fails to produce up to its potential capacity.

IV: The Threat of Productivity

The most serious cost to society may be the loss of wealth that can never be produced because of the threat to jobs posed by increasing productivity through technological innovation. Productivity is a measure of how much wealth can be produced from a given amount of labor, capital, and raw materials. Increasing productivity means getting more output from less input.

The history of the industrial revolution is a chronology of the development of better and more productive machines for increasing the amount of goods and services that can be produced from a given input of labor, capital, and raw  materials.

During the 18th and 19th centuries, the substitution of machines for hand labor brought a degree of material prosperity to the average citizen. The use of mass production, interchangeable parts, and automatic machines (known as the “American System”) was not so much to make luxury items for the rich as to satisfy the demands of the average worker.

Today, increased productivity has become more of a necessity than a luxury. If it were not for high productivity in agriculture, virtually the entire world’s population would be reduced to malnutrition and starvation. If we ever hope to advance beyond our present quality of life toward any of the costly but socially desirable goals such as better health care, more livable cities, and a cleaner environment, major new increases in productivity will be needed.

The Effect of Investment

In the short term, productivity tends to fluctuate with the business cycle. But the vast majority of long-term productivity increases are due to more capital, economies of scale, and improved technology. Investment, of course, is the source of all of these. Increased productivity is ultimately derived from new technology.

We produce more and better cars, ships, planes, dishwashers, computers, and TVs today than 50 years ago not because we work harder but because we know more and we use that knowledge to build machines and factories that produce more output with less input. It is often said that “they don’t build things like they used to” and that is true. If they did, either most workers would have to take a 90% pay cut, or most goods would cost ten times what they do today.

There are numerous reasons for believing that over the next 20 years new technology in the field of computers and robots will make productivity even more sensitive to the rate of investment than was the case during the 1960-1972 period.

The Threat to Jobs

From the very beginning of the industrial revolution, increased productivity has derived principally from the substitution of machines and mechanical energy for human labor in the production process. Machines are essentially helpers or servants that work for free. But the practice of distributing almost all income through wages and salaries virtually assures that automatic machines will sooner or later change roles from helpers to competitors. Human workers typically own no part of the machines with which they work; they benefit from the wealth-producing capabilities of automation only so long as they remain employed.

It is small consolation to know that productivity has risen a fraction of a percentage point if you have just lost your job.

Automation and Power: Economic and Political

Popular science-fiction literature and movies typically depict future hordes of robots threatening their human masters but they completely miss the point of the real danger: the concentration of economic and political power that will fall into the hands of machine owners.

A highly paid but functionally superfluous work force is vulnerable to pressures from the employer establishment. Such a workforce, even though prosperous, is politically impotent, for its prosperity exists solely at the pleasure of the machine owners.

The Concentration of Ownership

One percent of the families in the United States presently own over 50% by value of all corporate stock. Less than 5% of American families own more than two-thirds of all stock but control almost all corporate assets. This concentration of economic power in the hands of a tiny super-rich elite, accountable to hardly anyone but themselves, shows no significant tendency towards decreasing. The next generation of automation could reduce the entire economic system to complete domination by a few super-rich families.

The average citizen simply does not see himself as the beneficiary of massive capital investments in big business. The multinational corporations and the big conglomerates are perceived more as threats than as benefactors.


V: The Advent of Super-Automation

The development of the electronic computer will be viewed by future historians as one of the great milestones in human history. It is qualitatively different from all other machines in several important respects:

  1. Its mechanisms are electronic rather than mechanical, operating many orders of magnitude faster than other devices.
  2. A computer does not wear out in any normal sense of the word.
  3. Most importantly, a computer can store and manipulate large quantities of information and make decisions.

In theory, if not yet in fact, computers are capable of performing almost all of the decision and control functions currently done by humans in the basic manufacturing industries.

Computers and Robots

Almost surely, if computers and robots are cast in the role of competitors to human labor, then human workers will lose just as surely as John Henry was eventually replaced by the steam drill. However, if the ownership of future automatic factories is shared by a large percentage of the population and if the wealth created by automated industries is distributed so as to increase the income of everyone, then the benefits of automatic manufacturing may completely eliminate poverty.

Unfortunately, the existing income distribution system contains no mechanisms designed to prevent direct competition between robot and human labor, so there exists no public support for a major national effort to accelerate the pace of robot development. Hence we cannot reap the rewards that would arise from the resulting productivity gains. But a second industrial revolution is certainly coming whether the average American wants it or not. The world economic system is structured such that automatic factories are inevitable.

Robot technology, like computer technology, has military as well as economic implications. But even assuming that this technology were never used for military production, the country that possessed such a large surplus of efficient production facilities could easily dominate the world economically simply by selling manufacturing capacity at rates far below what countries using less efficient methods could hope to match.

The development of machines that can create wealth unattended by human workers and, in a sense even reproduce themselves, will have profound effects on human history at least as great as any scientific discovery or political revolution that has ever taken place. Whether this results in unprecedented benefits or economic chaos depends largely on whether we can devise satisfactory answers to the questions: “Who owns these machines? Who controls them, and who gets the wealth they create?”

These are questions that go to the very heart of the income distribution system. As long as we have a system in which only a tiny minority of the people own or control virtually all of the wealth creating capital stock, and the rest of the population must rely on selling their labor for income, we will have a situation where automatic machines and advanced technology will inevitably threaten the security and personal dignity of the average person. Only if we can devise a means by which everyone can share in the control of modern technology, as well as in the wealth that it creates, will the fantastic capacities of the coming generation of super-automation be released to assist mankind in solving the urgent problems of our society.


VI: Peoples’ Capitalism: An Alternative

The great tragedy of the present economic crisis is that it is physically and technologically avoidable. The United States, and indeed the world, has more wealth and power at its disposal today than at any previous time in history. But we have many more jobs that need doing than there are unemployed persons seeking work.

America made more progress against poverty between 1941 and 1945 than ever before or since. If workers prosper during wartime despite the fact that most of what they produce is destroyed, then certainly they should prosper even more if the fruits of their labors were distributed so as to benefit themselves and society. Clearly, if our industrial capacity were mobilized for the benefit of mankind in the way that we know it can be for war, the problems of poverty, pollution, and economic stagnation would cease to exist.

There is something desperately wrong with the fundamental principles of an economic system that allows such overwhelming need to persist while unused capacity sits idle. Establishment economists cannot answer the central economic question of the industrial era: why can’t we use what we have to produce what we need?

The simple fact is that most of the truly fantastic capacities of modern technology and industrial power have never been focused on the really important problems of hunger, pollution, and human suffering. We have wasted our resources on trivia and allowed the talents of millions to languish in underemployment.

What are lacking are the appropriate social and political institutions. We must somehow reorganize our system of rewards, incentives, and methods of wealth distribution so that they encourage individual behavior that is beneficial to society and societal behavior beneficial to the individual.

The Employee Society

The genius of free-market capitalism in its early days was the symbiosis between private and public interests that Adam Smith called an “Invisible Hand”. But this has largely disappeared from the present economic system.

A first step in restoring symbiotic harmony to our economic system would be to make our institutions for capital financing and income distribution correspond more closely with reality. We claim to be a capitalist society; i.e., a society based on the concept that private ownership of wealth-producing capital is a legitimate source of personal income. Yet the overwhelming majority of Americans, even in the middle and upper-middle income brackets are simply employees.

America is not a capitalist society at all; it is an employee society. We are wage earners and, in a very real sense, wage slaves. Our economy is choked with make-work, featherbedding, mass advertising of trivia, and wasteful use of natural resources and human talent. This is the inevitable result of distributing most income through wages and salaries in an economy where most wealth is created by capital.

If we were really capitalists, then the benefits of productivity increases would be distributed primarily through dividends instead of through wages and salaries. Industrial robots, automatic factories, and computerized offices would then be no threat to jobs. Increased efficiency would benefit everyone.

People do not work any harder now than they did a thousand years ago and they are not inherently any more intelligent. The productivity of the existing labor force today is due to modern equipment, improved knowledge, and more efficient process technology. Human labor has long since ceased to be the most important ingredient in the industrial process; indeed, in many industries, human workers are the principal cause of production defects.

If we admit that machines can run industries just as well, if not better than, people, then we could devise an income distribution system based on something other than employment. We would then have a society where machines provide the fundamental economic base and people are free to develop their creative talents to the fullest.

There will always be some necessary work requiring human effort even in the most automated society. Medical care, teaching, counseling, entertainment, and personal services can never be satisfactorily automated in their entirety. Furthermore, there will probably always be large numbers of people who receive great satisfaction from regular employment.

Nevertheless, it is quite possible to have a hybrid economic system where a basic minimum income would accrue to everyone out of the profits from automatic industries while, at the same time, those who wished to work could supplement their basic income with a salary.

How could such a system be practical? What new institutions would be necessary to implement the distribution of income through public dividends?

The National Mutual Fund

A semi-private investment corporation, the National Mutual Fund (NMF), could be formed. The NMF would earn a profit by investing money in stocks, but would differ from an ordinary mutual fund in four important respects:

  1. Every citizen would be a shareholder by virtue of his or her citizenship.
  2. The NMF would borrow the necessary investment capital from the Federal Reserve Bank, rather than obtain its investment funds from its shareholders.
  3. The NMF would concentrate its investments on long-term productivity growth, financing the modernization of technically backward industries and the building of new automated factories.
  4. The NMF would distribute the profits from its investments directly to the public on a biweekly basis.
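Point 4 can be sketched numerically. A toy biweekly payout with invented figures, a single assumed interest rate owed to the Federal Reserve, and everything else (taxes, NMF operating costs) ignored:

```python
# Sketch of the NMF's biweekly payout: annual profit on invested capital,
# minus interest owed to the Federal Reserve on the borrowed funds, split
# equally across all citizens over 26 pay periods. All figures are invented.

def biweekly_dividend(invested, return_rate, borrowed, fed_rate, population):
    annual_profit = invested * return_rate - borrowed * fed_rate
    return max(annual_profit, 0.0) / 26 / population

# $2T invested at 6%, with $1.5T still owed to the Fed at 3%, over 250M
# citizens, yields roughly $11.54 per citizen per biweekly payment:
payout = biweekly_dividend(2e12, 0.06, 1.5e12, 0.03, 250e6)
```

The sketch makes the early-years dynamic visible: while most of the invested capital is still owed to the Federal Reserve, per-citizen payments are small, and they grow as the debt is retired and the portfolio compounds.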

The NMF and Free Enterprise

The National Mutual Fund would not be a branch of government; it would be a profit-making business institution operated for the primary purpose of earning dividends for its stockholders. The distribution of NMF profits to the public would increase incentives for businesses to weed out sloppy management and poor service. This is a situation that is vastly different from that existing in socialist economies where state-owned and operated businesses have few incentives to be efficient.

Incentives for Diversity

The existence of NMF financing would increase diversity and competition within the private sector, providing a counterforce against the concentration of economic power in the hands of a few enormous corporations. By acting as a ready source of investment capital to small firms as well as large, it would reduce the advantage of simply being big.


VII: Peoples’ Capitalism and the Individual

The most important effect of the NMF would be to increase the personal freedom of the individual citizen.

In America, the physical environment is determined to a large and ever increasing degree by the major corporations. A significant percentage of what we eat, what we wear, what we listen to, what we see, what we live in, what we work at, what we use to get from one place to another is manufactured by the top 100 corporations. The power of the individual citizen to influence this process is virtually nil. We generally either go along with it or drop out.

The NMF would help reverse this trend. Businesses owned by the NMF would belong to the people and thus would be sensitive to pressure from public opinion. Corporate management would be ultimately responsible to the public. This would affect public attitudes toward business, making the average citizen much more aware of the importance of efficient business practices. This might create an atmosphere more conducive to cooperation between labor and management or, at least, help reduce the intensity of the adversarial relationship. The immense wealth and power of the major corporations would gradually be brought under democratic control.

There are many to whom the concept of subjecting corporate power to democratic control seems revolutionary. But only two hundred years ago, the concept of subjecting governmental power to democratic control was considered by most people in this country to be revolutionary. Fortunately for us, our forefathers had enough confidence in the average citizen to entrust the enormous power of the national government to the democratic process.

Future generations may regard democratic control of industrial power to be as essential to their freedoms as we believe democratic control of the government to be to ours today.

Financial Security and Personal Freedom

NMF dividends would give every individual a degree of personal independence and freedom from economic constraints. With a financial cushion, people could be more selective in choosing employment that offered them a personal sense of accomplishment and fulfillment. They would have more freedom to seek additional education, to choose where they wish to live, and to structure their own lives according to their own tastes.

Many individuals would go into business for themselves. The skilled artisan did not disappear because people developed a distaste for working with their hands; this is obvious from the fact that many persons today pursue handcrafts as a hobby. Handcraftsmanship as a source of income was destroyed by the advent of machine-made goods that made it impossible to earn an adequate living by hand labor. Yet there is, and always has been, a market for handmade goods. The result would almost surely be a great revival of such personally satisfying occupations as handcraftsmanship.

Family farming is another example of an occupation that very likely would exhibit a strong resurgence. Sociologists for decades have deplored the urban migration that has led to overcrowded city slums, as well as to depopulated and depressed rural communities. If the citizens of remote rural areas had some source of income from the technological/industrial sector, these regions would easily be self-supporting and revived.

An NMF income might stimulate an increased interest in the arts and in science for its own sake. Great art is sometimes born of adversity, but it is more often a product of affluence. The same holds true of scientific endeavors. The NMF could also be expected to cause a great upsurge in volunteer work of all kinds. If a particular endeavor is interesting enough, people will do it for nothing.

The present-day job environment subjects the human body to many stresses (or lack of stresses) for which it is not particularly well adapted. A lifestyle supported by supplemental NMF income would be much more conducive to mental and physical health than are present jobs.

The NMF and Individual Incentive

It might be argued that an NMF payment would cause a significant percentage of the population to quit working and simply atrophy. But there is evidence to the contrary.

Studies of income-maintenance experiments showed that women had a tendency to quit their jobs and return to their homes, elderly men changed to shorter, less demanding jobs requiring fewer hours of work, and persons in poor health stopped working altogether. But these effects were offset by increased work incentives in other groups. The income subsidies evidently gave individuals enough financial security to quit working for a while to search for better jobs. This applied in particular to the young and relatively well educated.

Money is not the only incentive, or even the principal incentive, that causes people to pursue productive lives. What causes people to work appears to spring more from a psychological need to feel useful and to achieve success.

Of course, in our present system, money is closely associated with success. But even where money is an important incentive, the total amount of money received is not nearly so important as the amount of money relative to what other people are being paid. The compulsive workers among us are motivated by something much deeper than a weekly paycheck. The primary incentives for work would remain what they are today; i.e., the need to socialize, to compete, to achieve, and to escape boredom.

No one’s talent would go undeveloped for lack of opportunity.

The Effect on Political Freedom

The increased personal freedoms resulting from NMF income would be indistinguishable from political freedoms. If people cannot live where they wish, cannot travel where they want to go, and are prevented from providing their families with proper food and clothing, they are not free, and to some degree it is academic whether such restrictions are economic or political.

There is a direct relationship between personal economic security and political freedom. Where a population is economically powerless, political freedom is almost meaningless.

When the wealth of a nation is controlled by any small minority of the population, whether that group be made up of feudal barons, a ruling politburo, or the boards of directors of the major corporations, true democratic government is impossible.

The history of the American Revolution is a classic example of the critical link between financial security and political freedom. People who are financially secure, especially through the ownership of the means of production, do not readily submit to political pressure or lightly forfeit their personal liberty.

A Bigger Pie with Bigger Slices

It has sometimes been suggested that the benefits of the NMF could be achieved equally well by simply extending the welfare system or instituting a negative income tax. To the extent that these measures would redistribute the nation’s income and raise benefits to the poor, this may be true.

But redistribution of income through the tax system merely changes the way the pie is sliced; it does not increase its size. Increases in the welfare state or the institution of a negative income tax discourage innovation and retard individual excellence. They tend to homogenize society, to hold back achievers.

In contrast, the NMF would create wealth, increase productivity and encourage innovation. The total pie would get larger and everyone would share in the increase.

An economy based on the NMF would distribute most income from high technology industries equally, but the rest of the economy would be fair game for competition.


VIII: The Quest for Stable Prices

Consumption (the using up or the wearing out of goods and services) is regulated by the amount of money that is available to individuals, businesses, and government for spending. Production, on the other hand, is regulated by the level of investment, by the availability of labor and raw materials, and by the efficiency or productivity of the techniques and methods used in the productive process.

Inflation is nature’s way of maintaining a balance between consumption and production. Modern economists classify the causes for inflation into two categories:

  • ‘Demand-pull’ inflation is the classical form caused by too much money chasing too few goods.
  • ‘Cost-push’ inflation is where increasing costs in the production process itself forces the price of goods and services upward.

Cost-push inflation responds poorly to, or may actually be exacerbated by, the classical remedies of monetary and fiscal restraint.

Productivity and Prices

The difference between wage increases and productivity increases is strongly correlated with the inflation rate over the past quarter century, which strongly suggests that a primary cause of inflation is wage increases exceeding productivity increases.

Contrary to popular political rhetoric, budget deficits seem to have no clear relationship to inflation at all. There appears to be a slight tendency for inflation to precede budget deficits, indicating that deficits may be caused by rising prices, but there is certainly no evidence for the reverse.

Given this lack of correlation between inflation and deficit spending, the fundamental cause of inflation appears to be wage increases that exceed productivity gains. The only hope for a permanent cure to inflation is to close the gap between wages and productivity.

The ‘Phillips curve’ in every modern economics textbook pretends to show the rate at which wages can be expected to rise each year for any given rate of unemployment. The principal result of policies designed to create unemployment has been simply that — unemployment.

A Different Strategy

Why not try raising productivity instead?

The increase in investment required to improve productivity would reduce unemployment and end recession. The construction of new plants, machines, and transportation facilities would create jobs and stimulate business. Through increased investment we could mobilize our nation to overcome shortages, feed the hungry, house the poor, and, in general, make this land a delightful place in which to live.

The low rate of United States productivity growth is a direct result of our low rate of capital investment. NMF investment would stimulate business and reduce unemployment.

Investment Payback Delay

One major problem of fighting inflation through increased investment is the ‘investment payback delay’. With any investment, there is an unavoidable delay between making the investment and seeing its effects. During this interim period, investment spending tends to create short-term demand-pull inflationary pressures.

The time lag between investment-created demand and investment-created supply has historically been responsible for the classical oscillations in economic activity known as business cycles, or alternating periods of boom and bust.
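A toy cobweb-style model (my own illustration, not from the book) shows how a supply response that lags demand by one period produces exactly these alternating over- and under-shoots:

```python
# Toy cobweb model: producers decide output from *last* period's price,
# so supply lags demand and the market price oscillates around its
# equilibrium. The demand and supply curves are invented for illustration.

def cobweb(periods=10, p0=1.5):
    """Price path with demand q = 4 - p and lagged supply q = p_prev."""
    prices = [p0]
    for _ in range(periods - 1):
        supplied = prices[-1]        # quantity committed one period ago
        # market clears where demand absorbs that quantity: 4 - p = supplied
        prices.append(4 - supplied)
    return prices

print(cobweb(6))   # oscillates: [1.5, 2.5, 1.5, 2.5, 1.5, 2.5]
```

With these particular slopes the oscillation neither damps nor explodes; a steeper supply response makes it diverge, which is the boom-and-bust pattern the text describes.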

Unfortunately, all of the techniques that are presently used for price stabilization operate on the basic principle of reducing demand by limiting investment. If the NMF were to embark on a policy of drastically increasing investment spending (through money borrowed from the Federal Reserve Bank), it would be working at complete cross-purposes with all of the existing price-stabilization mechanisms.

Some new mechanisms for limiting short-term demand will be required – a new mechanism will be proposed.

Monetary Policy

Monetary policy is the regulation by the nation’s banks of the amount of money in circulation.

Monetary restraint will produce not only recession and unemployment, but continued or even increased inflation. Even when successful, it exacts a terrible price. Short-term price stability is achieved at the cost of a long-term decline in the production of wealth.

Tight money and slow growth make it difficult to start new businesses. These all work in favor of established wealth. The social costs of high interest rates and high unemployment fall most heavily on the poor.

Tax Policy

In the ‘New Economics’ of Keynes, there is the concept of regulating consumer demand through raising or lowering taxes. Taxes should be lowered to stimulate demand when overall demand is sluggish and should be raised to reduce demand when overall demand is excessive.

Lowering taxes is popular. Raising taxes, on the other hand, is not! It is particularly unpopular when consumers are feeling the pinch of rising prices. Such a policy is therefore almost impossible to administer successfully.

Budgetary Policy

A third method used in attempting to stabilize prices is budgetary policy; i.e., the regulation of government expenditures. Budgetary and tax policy are sometimes lumped together under a single heading entitled fiscal policy.

Unfortunately, one of the few areas of the federal budget that is readily subject to budgetary control is research and development expenditure. Research monies are usually among the first casualties of any serious budget-cutting attempts. Thus, new technology, which is the long-term source of most productivity gains, is typically curtailed at the very beginning of any program of fiscal restraint.

The reduction of government expenditures as a method for combating inflation is often self-defeating. For example, cuts in poverty programs often mean that potential taxpayers are thrown into welfare or, worse, into a life of crime. Hence reductions in government expenditures may actually contribute more to the overall cause of inflation than to its prevention.

Time for a Change

None of the current inflation-control techniques are capable of dealing with cost-push inflation. They all attempt to close the gap between wage increases and productivity increases by holding down wages.

Today Western civilization is in a state of arrested progress. We are being tested. If the Western nations cannot solve the basic problem of stable economic progress, other nations, perhaps in the Far East, the mid-East, or Africa, eventually will.

Inflation will recede whenever we produce as much as, or more than, we consume. That this can be done by increasing productivity, as well as by reducing wages, seems clear. The secret lies in increased investment.

IX: A Formula for Price Stability

One of the reasons that increasing productivity through investment spending has never been seriously considered as a cure for inflation is because of the problem of the investment payback delay.

If NMF investment is ever to be practical on a large scale, it will be necessary to complement that investment with a savings program of sufficient magnitude to prevent increased investment from producing any net increase in demand. Savings is the key to increasing investment without inflation. Savings takes money out of circulation and reduces both demand and consumption. Savings, of course, is only deferred spending.

The Demand Regulation Policy (DRP) is designed to accomplish this purpose. It consists of two parts: one that deals with excess demand and another that deals with insufficient demand.

Part 1: Dealing with Excess Demand

The DRP would effectively balance the money equation by taking out of circulation about as much as the NMF put in through its investment policies. It would reduce consumer purchasing power during periods of inflation by diverting some fraction of consumer income into savings bonds, held in escrow until increased supply resulted from increased productivity.

The savings-bond money would be returned to the same individuals from whom it was withheld. This would be far more palatable to the public than tax increases because income would not actually be lost, but only temporarily converted into savings.
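One way to formalise the withholding rule (my own sketch; Albus does not give a formula, and the rates below are invented) is to escrow a fraction of income that rises with inflation above target:

```python
# Hypothetical formalisation of the DRP withholding rule described above.
# The target, cap, and per-point rate are all invented for illustration.

def drp_withholding(income, inflation_rate, target_rate=0.0,
                    withholding_per_point=0.02):
    """Escrow a fraction of income that grows with excess inflation,
    capped at 25% of income."""
    excess_points = max(0.0, inflation_rate - target_rate) * 100
    fraction = min(0.25, withholding_per_point * excess_points)
    escrowed = income * fraction
    return income - escrowed, escrowed   # (spendable now, bonds in escrow)

spendable, bonds = drp_withholding(1000.0, 0.05)
# 5 points of excess inflation -> 10% withheld: 900 spendable, 100 escrowed
```

The escrowed amount remains the household's own money, returned with indexed interest once supply catches up: deferred spending rather than a tax.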

The technique of indexing interest rates on savings to the inflation rate has been advocated for use in the United States by the conservative economist Milton Friedman (‘There’s No Such Thing As a Free Lunch’) for years.

Part 2: Dealing with Insufficient Demand

Should aggregate prices tend to decline due to excess supply, the DRP would encourage redemption of the special bonds by declaring them mature earlier than normal.

The DRP could also maintain demand in equilibrium with supply by directing the Federal Reserve Bank to create new money and distribute it directly to the public in the form of bonus payments.

To some, the notion of printing money and distributing it directly to the public seems an impossible utopian fantasy. To others, it simply sounds like fiscal irresponsibility. It is neither. Maintaining demand in equilibrium with supply ensures that prices will remain stable which is an eminently responsible economic goal.

There is nothing particularly revolutionary about distributing newly created money to the public. That is exactly what happens whenever the government shows a budget deficit. In order to finance deficit spending, the government borrows money by selling bonds. Printing bonds is not essentially different from printing money.

People tend to spend more than 90% of their disposable income on goods and services. This means that giving money to people to spend would create almost exactly the same number of jobs as giving money to the government to spend.
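That 90% figure is what Keynesian economics calls the marginal propensity to consume, and the equivalence of the two channels follows from the standard spending multiplier (this framing is mine; the book only states the 90% figure):

```python
# Standard Keynesian spending multiplier: with a marginal propensity to
# consume (MPC) of 0.9, each new dollar is re-spent in a geometric series,
# 1 + 0.9 + 0.9**2 + ... = 1 / (1 - 0.9).

def spending_multiplier(mpc):
    return 1 / (1 - mpc)

print(round(spending_multiplier(0.9), 6))   # 10.0
```

Whether the first dollar is spent by a consumer or by the government, roughly the same chain of re-spending, and hence roughly the same number of jobs, follows.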

There are several reasons why distributing new money by direct cash bonuses would be better than the present method of deficit government spending:

  1. The distribution of benefits would be more equitable.
  2. Fluctuations would affect everyone equally and would be clearly and simply related to consumer prices.
  3. Direct cash bonuses could easily be adjusted on a monthly basis so that the monthly variation would be quite small; hence, no severe hardships would be experienced when bonuses were cut.

The administration of the (politically sensitive) National Mutual Fund would be independent from that of the Demand Regulation Policy (which should be isolated from immediate political pressures).

The classical economist may argue that this violates the free market. Classically, the capital market sets interest rates that make savings attractive and that is what provides the capital for investment. But historically, this mechanism has proven itself disastrously inadequate time and time again.

The United States economy is operating nowhere near its full capacity today (capital equipment is typically operated only 40 hours per week), and probably never has except for a few years during World War II.

Making investment independent of the propensity to save (i.e., making it possible for investment to be increased without deferring present consumption) is a revolution in economic thought. It frees the industrial system from the artificial constraints of the classical capital markets and makes it possible for production to be increased up to the maximum rate physically and technologically possible. Working together, the NMF and the DRP would enable society to invest freely in whatever enterprises were deemed to be both profitable and socially beneficial.

Working together, the NMF and the DRP could release modern technology to fulfil its potential for benefiting mankind.


X: A Department of Science and Technology

Although the National Mutual Fund would increase the production of wealth, there are many segments of the economy that require much more than just investment. Areas of the economy most in need of improvement do not typically produce high profits. In general, such things as public transportation, housing, and health services are not sectors of high growth based on automation.

Public transportation and low-cost housing are money-losing businesses today and will remain so only so long as these areas remain technologically backward. If new technology were introduced into these areas, they would develop many profitable opportunities. Technology in the field of housing construction has been stagnant for several centuries. Alvin Toffler, in his book ‘Future Shock’, describes housing as a preindustrial craft. The basic structure of the housing industry is modeled after the 16th century system of craft guilds. Houses are still built by itinerant artisans who migrate from one job to the next. Modern methods of computer-aided design and automated assembly of houses are strictly in the realm of EXPO exhibits and experimental demonstrations.

There is nothing inherently unprofitable in building houses or transporting people. But until these industries find ways to reduce costs and improve their products and services, additional investment will simply produce more overpriced housing of inferior quality and additional trains that no one wants to ride. When businesses are technologically stagnant, increased investment merely enables them to lose money faster.

Profits are to be found in greater productive efficiency.

A Role for the Federal Government

Conducting such a research program is the proper role of the federal government. There are several reasons why this is so.

  • Much of the research that needs to be done is expensive and of a high-risk nature.
  • There is very little incentive for private industry to go into sectors that are most in need of research.
  • The benefits from an invention to society typically exceed the profits received by the innovative company. For example, the benefits to society of the transistor, or penicillin, or even Scotch tape far exceed the profits to the companies that originally developed these products.

Unfortunately, the federal government has never had a consistent policy for developing socially beneficial technology. Little money is spent on technological development in other areas of social need.

Science at the Cabinet Level

The United States Government should establish a Department of Science and Technology to conduct and encourage research into areas of technology beneficial to the society as a whole.

A meeting of the National Academy of Engineering concluded that socially beneficial technology was woefully underfunded in this country and that, as a result, productivity was lagging far below what could be achieved.

A Department of Science and Technology would remedy these shortcomings.

Socially beneficial industries, that presently are technologically stagnant, would become profitable investment opportunities. These could then be exploited by both NMF and private capital. Thus, the Department of Science and Technology would provide technological development, the NMF would provide capital resources, and the DRP economic stability. Working together, these three agencies would produce economic prosperity and human well-being far beyond what is now considered possible.


XI: Peoples’ Capitalism in a Finite World

The NMF and Limits to Growth

Planet Earth is clearly finite. There are limits to growth. Affluence has historically led to increased levels of certain kinds of pollution and wasteful consumption of natural resources.

If the result of the NMF were to simply increase the disposable income of the entire population so that everyone could engage in wasteful consumption then the NMF would quickly lead to worldwide catastrophe.

This is a problem of considerable magnitude since it pits the interests of the ‘have’ nations against the ‘have-nots’. How can persons living in air-conditioned houses and driving gas-guzzling automobiles communicate their concern about the environment to people whose children are starving?

This problem is virtually insoluble within the constraints of classical economics.

The path of classical industrialization is extremely costly both in terms of physical and human resources.

An Alternative to Urbanization

Classical industrialization requires urbanization.

Automatic robot factories could be built in under-developed countries near sources of energy and raw materials. There would be no need to uproot the population from the countryside and concentrate it in cities. This would allow economic development without the social upheaval that ordinarily accompanies industrialization.

Peoples’ Capitalism thus offers a means by which non-industrialized countries might completely leapfrog the first industrial revolution.

Continued Growth and the Environment

Peoples’ Capitalism could also reduce the environmental impact of continued economic growth in industrialized countries.

Distribution of income through public dividends would make income from high technology industries available to rural residents as well as urban. This would reduce incentives for the rural poor to migrate to city slums in search of high-paying employment or, as it often turns out, of more liberal welfare payments.

NMF income could free people from the tyrannies of mechanization and allow them to live more by their own internal rhythms. Lifestyles quite likely would move closer to nature, as people divorced themselves and their families from the congestion and frustrations of the industrialized world.

Through instrumentalities such as the NMF, increased affluence would not be incompatible with the environmental constraints of a finite planet. Peoples’ Capitalism thus offers hope for a resolution of the fundamental conflict between the interests of the ‘have’ and the ‘have-not’ peoples that today represents such a strong potential threat to world stability.


XII: From Throughput to Storehouse Economics

The purpose of an economic system should not be merely to produce clothing, food, and houses, but to clothe, feed, and house people. Human beings are, after all, what the economic system was created to serve, not vice versa.

Modern industrialized economies do not make the satisfaction of human needs a number-one priority. Income is derived from wages and salaries and, as a result, every effort must be made to assure that there is never any shortage of jobs. Products must wear out or be consumed so that they may be replaced. Styles must be changed so that whatever does not wear out is discarded anyway. People must be dissatisfied so that they want more. Resources must be exploited. Growth is essential.

But on spaceship Earth, where resources are limited and pollution is a serious threat, an appropriate economic system would be one that concentrated on the satisfaction of human needs rather than on the rate of production and consumption.

The key to making such a basic shift is the elimination of the virtually exclusive role of wages and salaries in the income distribution system. So long as job employment is a prerequisite to obtaining income, any significant shift from throughput to storehouse economics would create chaos. If products were made more durable, if mass advertising of trivia were eliminated, and if all unnecessary jobs were discontinued, unemployment would soar, recession would occur, and millions would be without income altogether. Throughput economics depends on continuous growth to create enough jobs to keep everyone employed. Storehouse economics would eliminate unnecessary jobs and seek to satisfy human needs with as little effort and expenditure of resources and energy as possible.

The NMF and Storehouse Economics

The NMF would provide the mechanisms to make the shift from throughput to storehouse economics. As dividends increased, many persons would voluntarily leave the labor force, many would transfer to more satisfying occupations. Unnecessary jobs could be eliminated with no hardship.

Whilst not all industries could be automated, robot factories would not require large numbers of employees to be concentrated within commuting distance, reducing commuting, congestion and pollution. People would be freer to live wherever they wished, adopting less resource-consuming lifestyles.

Robot factories would cope with fluctuating production requirements without causing labor dislocations so there would be no need to artificially stimulate additional consumption through mass-media advertising, style changes, or planned obsolescence.

If cars and appliances were made more durable, then production would fall because few people would need to buy new ones. This would reduce NMF profits and, hence, public dividends. But it would at the same time increase DRP payments to prevent a decline in the price index.

Thus, reductions in NMF income would be more than compensated for by the combination of increased DRP payments and more durable products at a constant price, leading to more self-sufficient, less resource-consuming lifestyles.

Once NMF dividends and DRP bonuses became a substantial fraction of the average family’s income, conservation would become as economically beneficial as new development; restoration would increase incomes as much as new construction.

Development for its own sake would no longer completely dominate the economy.

Attitudinal changes towards the environment run counter to the basic goals of growth and exploitation that are so fundamental to the present economic system. The NMF and DRP could provide the institutional framework under which a shift from throughput to storehouse economics could occur without severe economic dislocations, increasing the personal financial security of every individual. By this means the NMF and DRP could reconcile the environmental goals of conservation and preservation with the legitimate desires of human beings everywhere for participation in the good life.


How Susceptible are Jobs to Computerisation?

News articles and reports appear almost daily on the subject of how technological developments in Artificial Intelligence and robotics will cause dramatic changes to employment over the next few decades. (Artificial Intelligence includes techniques such as ‘machine learning’, ‘deep learning’, artificial neural nets and ‘data mining’.) A high proportion of these articles refer back to a 2013 study by Carl Frey and Michael Osborne called ‘The Future of Employment: How Susceptible are Jobs to Computerisation?’ in which they asserted that 47% of total US employment is at risk.

Here, I go back to this original source and provide a summary.

The Method

Starting with a US Department of Labor list of employment categories, Frey and Osborne produced estimates of the probability of computerisation for 702 occupations. (Throughout, reference to ‘computerisation’ means automation by Artificial Intelligence, which is underpinned by computer technology.) This estimate was derived by assessing occupations in terms of the following factors:

  • Dexterity: The ability to make precisely coordinated movements to grasp, manipulate, or assemble objects.
  • Creative Intelligence: The ability to come up with original ideas, develop creative ways to solve a problem or to compose, produce, and perform works of music, dance, visual arts, drama, and sculpture.
  • Social Intelligence: Being perceptive of others’ reactions and understanding why they react as they do. Being able to negotiate to reconcile differences and persuading others to change their minds or behaviour. Providing personal assistance, medical attention, emotional support, or other personal care to others such as co-workers, customers, or patients.

They then examined the relationship between an occupation’s probability of computerisation and the wages and educational attainment associated with it.

Their analysis also includes a history of the 19th and 20th Centuries in terms of the effect of technological revolutions on employment, contrasting this with the expected effect in the 21st Century.

The Results

Whilst the probabilities of automation are listed for all 702 occupations, the results are most succinctly presented in the figure (their ‘Figure III’) below:

Frey and Osborne: The Future of Employment: How Susceptible are Jobs to Computerisation?, Figure III

How Likely is it that your job can be automated?

In the figure, they have organized the 702 occupations into various categories and demarcated them into three bands based on the probability of computerisation:

  • High: probability over 70%.
  • Medium: probability between 30% and 70%.
  • Low: probability under 30%.
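The banding is a trivial function of the probability (a sketch; how the exact boundary values fall is my assumption – Frey and Osborne only give the thresholds):

```python
def risk_band(probability):
    """Map a probability of computerisation (0..1) to
    Frey and Osborne's three bands. Treatment of the exact
    boundary values 0.3 and 0.7 is an assumption."""
    if probability > 0.7:
        return "High"
    if probability >= 0.3:
        return "Medium"
    return "Low"

print(risk_band(0.99))   # Insurance Underwriters -> High
print(risk_band(0.35))   # Plumbers -> Medium
print(risk_band(0.004))  # Occupational Therapists -> Low
```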

In the table below, I have extracted a selection of the 702 probabilities, covering the following categories:

  • Management / financial / legal
  • Engineering and technical
  • Education
  • Healthcare, and
  • Food

…to provide examples that support the above graphs. They clearly show healthcare and education as low-risk categories. Professional engineering jobs are low-risk but technician jobs are spread across the medium-risk and high-risk bands. Food-related jobs are firmly high-risk. There are a few surprises here for me. ‘Cooks, Restaurant’ and ‘Bicycle Repairers’ are going to be almost completely automated and ‘Postsecondary Teachers’ are going to be untouched. Will all restaurant meals be microwave-reheated?! Will robots strip down and reassemble bikes? Will online teaching have no impact on teaching roles?

Rank Prob.% Occupation Type
6 0.4% Occupational Therapists HEALTH
11 0.4% Dietitians and Nutritionists HEALTH
14 0.4% Sales Engineers TECH
15 0.4% Physicians and Surgeons HEALTH
17 0.4% Psychologists, All Other HEALTH
19 0.4% Dentists, General HEALTH
25 0.5% Mental Health Counsellors HEALTH
28 0.6% Human Resources Managers MGMNT
40 0.8% Special Education Teachers, Secondary School EDU
41 0.8% Secondary School Teachers, Except Special and Career/Technical Education EDU
46 0.9% Registered Nurses HEALTH
53 1.1% Mechanical Engineers TECH
54 1.2% Pharmacists HEALTH
63 1.4% Engineers, All Other TECH
70 1.5% Chief Executives MGMNT
77 1.7% Chemical Engineers TECH
79 1.7% Aerospace Engineers TECH
82 1.8% Architects, Except Landscape and Naval TECH
84 1.9% Civil Engineers TECH
98 2.5% Electronics Engineers, Except Computer TECH
104 2.9% Industrial Engineers TECH
112 3.2% Postsecondary Teachers EDU
115 3.5% Lawyers MONEY
120 3.7% Biomedical Engineers TECH
152 6.9% Financial Managers MONEY
153 7% Nuclear Engineers TECH
163 8.4% Childcare Workers EDU
188 14% Optometrists HEALTH
191 15% Kindergarten Teachers, Except Special Education EDU
192 15% Electricians TECH
226 25% Managers, All Other MGMNT
249 35% Plumbers, Pipefitters, and Steamfitters TECH
253 36% Computer Numerically Controlled Machine Tool Programmers, Metal and Plastic TECH
261 38% Electrical and Electronics Repairers, Powerhouse, Substation, and Relay TECH
263 38% Mechanical Engineering Technicians TECH
290 48% Aerospace Engineering and Operations Technicians TECH
317 56% Teacher Assistants EDU
386 70% Avionics Technicians TECH
398 72% Carpenters TECH
422 77% Bartenders FOOD
435 79% Motorcycle Mechanics TECH
441 81% Cooks, Fast Food FOOD
442 81% Word Processors and Typists MONEY
443 81% Electrical and Electronics Drafters TECH
453 82% Sheet Metal Workers TECH
460 83% Cooks, Institution and Cafeteria FOOD
477 84% Lathe and Turning Machine Tool Setters, Operators, and Tenders, Metal and Plastic TECH
489 85% Nuclear Technicians TECH
514 88% Semiconductor Processors TECH
522 89% Bakers FOOD
583 93% Butchers and Meat Cutters FOOD
596 94% Bicycle Repairers TECH
625 95% Postal Service Clerks MONEY
629 96% Office Clerks, General MONEY
641 96% Cooks, Restaurant FOOD
657 97% Cashiers MONEY
671 98% Bookkeeping, Accounting, and Auditing Clerks MONEY
688 98% Brokerage Clerks MONEY
698 99% Insurance Underwriters MONEY

De-Skilling: The First Industrial Revolution

Frey and Osborne provide some historical perspective, looking at the impact of past technological revolutions.

They start with the case of William Lee who invented the stocking frame knitting machine in 1589. But Queen Elizabeth I refused to grant him a patent: “Consider thou what the invention could do to my poor subjects. It would assuredly bring to them ruin by depriving them of employment, thus making them beggars”.

But by 1688, protection of workers in Britain had declined. The property owning classes were politically dominant and the factory system began to displace the artisan shop. The Luddite riots of 1811-1816 were a prominent example of the fear of technological unemployment. It was the inventors, consumers and unskilled factory workers that benefited from mechanisation. Arguably, unskilled workers have been the greatest beneficiaries of the Industrial Revolution.

An important feature of nineteenth century manufacturing technologies is that they were largely “de-skilling”. Eli Whitney, a pioneer of interchangeable parts, described the objective of this technology as “to substitute correct and effective operations of machinery for the skill of the artist which is acquired only by long practice and experience; a species of skill which is not possessed in this country to any considerable extent”.

Up-Skilling: The Second Industrial Revolution

In the late nineteenth century, electricity replaced steam and water-power and manufacturing production shifted over to mechanised assembly lines with continuous-process and batch production methods. This reduced the demand for unskilled manual workers but increased the demand for skills – there was demand for relatively skilled blue-collar production workers to operate the machinery and there was a growing share of white-collar non-production workers.

This shift to more skilled workers continued:

“the idea that technological advances favour more skilled workers is a 20th century phenomenon.”

“the story of the 20th century has been the race between education and technology”

The Computer Revolution

Office machines reduced the cost of information processing tasks and increased the demand for educated office workers. But the supply of better educated workers filling these roles ended up outpacing the demand for their skills and this led to a sharp decline in the wage premium of clerking occupations.

Educational wage differentials and overall wage inequality have increased sharply since the 1980s. The adoption of computers and information technology explains some of the growing wage inequality of the past decades. Computerisation has eroded wages for (middle-income manufacturing) labour performing routine tasks, and so workers have had to switch to relatively low-skill, low-income service occupations, pushing low-skilled workers even further down (and sometimes off) the occupational ladder. The manual tasks of service occupations are less susceptible to computerisation because they require a higher degree of flexibility and physical adaptability. This has increasingly led to a polarised labour market, with growing employment in high-income cognitive jobs and low-income manual occupations (the ‘lovely jobs’ and ‘lousy jobs’, as Goos and Manning have called them), accompanied by a hollowing-out of middle-income routine jobs.

Off-shoring is the other big factor affecting wage inequality. It is having a similar effect on jobs as automation. Alan Blinder (who used the same Department of Labor database that Frey and Osborne subsequently used) examined the likelihood of jobs going offshore, and concluded that 22% to 29% of US jobs are or will be offshorable in the next decade or two.

The Automation of Routine Tasks

Frey and Osborne consider cutting the jobs cake in two ways:

  • Between routine and non-routine jobs, and
  • Between cognitive and non-cognitive jobs.

Previously, the tasks that have been automated have been routine, non-cognitive ones. Routine tasks are ones that follow explicit rules – behaviour that can be codified (and then coded). New Machine Learning technologies open up routine, cognitive tasks to automation and computers will quickly become more productive than human labour in these tasks. Non-routine tasks, whether cognitive or non-cognitive, are more difficult to codify and their automation would have to follow later – gradually, as the technology develops.

But Machine Learning improves the ability of robots to perceive the world around them and so it also helps automate routine, non-cognitive (manual) tasks that have not been possible previously.

Robots are becoming more advanced, and cheaper too (Rethink Robotics’s ‘Baxter’ only costs about $20,000). They can already perform many simple service tasks such as vacuuming, mopping, lawn mowing, and gutter cleaning and will likely continue to take on an increasing set of manual tasks in manufacturing, packing, construction, maintenance, and agriculture. It must be expected that they can gradually replace human labour in a wide range of low-wage service occupations – which is where most US job growth has occurred over the past decades.

The Automation of Non-Routine Tasks

More advanced application of Machine Learning and Big Data will allow non-routine tasks to be automated. Once technology has mastered a task, machines can rapidly exceed human labour in both capability and scale. Machine Learning algorithms running on computers are commonly better able to detect patterns in big data than humans. And they are not subject to human bias. Fraud detection is already almost completely automated. IBM’s Watson is being applied to medical diagnoses. Symantec’s Clearwell acquisition (now Veritas ‘eDiscovery’) can extract general concepts from thousands of legal documents. And this intelligence is made more accessible with improved voice Human-Computer Interfaces such as Apple’s Siri and Google Now.

Education is one sector that will be affected by this. Universities are experimenting with MOOCs (Massive Open Online Courses). From what they are learning about how students react to these online courses, they will be able to create interactive tutors that adjust their teaching to match each individual student needs.

And there are ways of automating non-routine manual tasks not through new technology but just by restructuring the tasks. For example, in the construction industry, on-site tasks typically demand a high degree of adaptability. But prefabrication in a factory before transportation to the site provides a way of largely removing the requirement for adaptability.

Employment in the Twenty-First Century

Over the years, the concern over technological unemployment has proven to be exaggerated because increased productivity has led to increased demand for goods, enabled by the better skills of the workforce. But Frey and Osborne cite Brynjolfsson and McAfee: as computerisation enters more cognitive domains, it will become increasingly difficult for workers to outpace the machines.

Frey and Osborne’s headline is that 47% of total US employment is in the ‘high risk’ category; this will affect most workers in production, transportation and logistics and office administrative support in a first wave of changes.

Wary of the difficulties of making predictions, they have restricted themselves to analysing the likelihood that jobs that currently exist will be automated as a result of near-term technological breakthroughs in Machine Learning and Robotics. Regarding timescales, they only go as far as saying ‘perhaps a decade or two’ for the first wave to take effect. And they do not want to forecast future changes in the occupational composition of the labour market or how many jobs will actually be automated. Many jobs will disappear completely but many roles will be modified because the offloading of automated tasks just frees up time for human labour to perform other tasks. For example, while it is evident that much computer programming can be automated, Frey and Osborne say there are ‘strong complementarities’ in science and engineering between the power of computers and the high degree of creative intelligence of the scientists and engineers.

Beyond this first wave, they say there will be a slowdown in labour substitution, which will then be driven by incremental technological improvements. All told, a ‘substantial share’ of employment, across a wide range of occupations, is at risk in the near future.

There is a strong negative correlation between a job’s risk of automation and wages/educational attainment. For example, paralegals and legal assistants are in the high risk category whereas the highly-paid, highly-educated lawyers are in the low risk category.

This marks a profound change in the balance of jobs. The nineteenth century manufacturing technologies largely substituted for skilled labour through the simplification of tasks, and the Computer Revolution of the twentieth century caused a hollowing-out of middle-income jobs, splitting the jobs market into high-wage, high-skill and low-wage, low-skill occupations. Frey and Osborne predict that, as technology races ahead, the Machine Learning and Robotics revolution will take out the bottom of the market, requiring low-skill workers to acquire creative and social skills and reallocate to tasks that are not susceptible to computerisation!


From Neural ‘Is’ to Moral ‘Ought’

This talk takes its inspiration from Joshua Greene’s ‘From neural ‘is’ to moral ‘ought’: what are the moral implications of neuroscientific moral psychology?’

He says:

“Many moral philosophers regard scientific research as irrelevant to their work because science deals with what is the case, whereas ethics deals with what ought to be.”

but Greene (director of Harvard’s ‘Moral Cognition Lab’) continues:

“I maintain that neuroscience can have profound ethical implications by providing us with information that will prompt us to re-evaluate our moral values and our conceptions of morality.”

So: what are those profound implications?

In this talk I explore various ideas to try to present a neuroscientific perspective on morality.

Is to Moral Ought

We’ll start with some brief background to ethics (the ‘moral ought’ of the title) and then the ‘is to ought’ part. ‘Normative ethics’ is about the right (and wrong) way people should act in contrast to ‘descriptive ethics’ which, not surprisingly, just describes various ethical theories.

There are 3 major moral theories within normative ethics:

  • Deontology which emphasizes duties and the adherence to rules and is frequently associated with Immanuel Kant,
  • Consequentialism which emphasizes the consequences of an action in determining what should be done and is frequently associated with Jeremy Bentham’s and John Stuart Mill’s Utilitarianism that aims for “the greatest happiness of the greatest number”,
  • and the less familiar Virtue Ethics which emphasizes the goodness (good character) of the agent performing the action rather than the act. Virtue Ethics is frequently associated with Aristotle but various other philosophers have produced lists of virtues that define a good person. For example, Plato defined the ‘4 cardinal virtues’ (Prudence, Justice, Courage and Temperance) and Aquinas defined the ‘3 theological virtues’ (Faith, Hope and Charity). Lawrence Kohlberg (who we will hear of later on) criticised Virtue Ethics in that everyone can have their own ‘bag of virtues’ but there is no guidance on how to choose those virtues.

Whilst it is true that:

 “… science deals with what is the case, whereas ethics deals with what ought to be.”

… it is technically possible to get from an ‘is’ to an ‘ought’. We might assert the fact that ‘murder decreases happiness’ (an ‘is’), perhaps established by some neuroscientific way of measuring happiness. But it would not be logically valid to derive the imperative ‘do not murder’ (an ‘ought’) from this alone. However, if predicated on the goal of ‘maximisation of happiness’, the derivation goes through:

if goal then { if fact then imperative }

‘if our goal is to achieve the maximum happiness and murder decreases happiness then do not murder’
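The schema can be made concrete in a few lines (a toy sketch – the goal, fact and imperative are just placeholders):

```python
def derive_imperative(goal_holds, fact_holds, imperative):
    """'if goal then { if fact then imperative }':
    the 'ought' only follows when both the goal and the
    factual 'is' premise are in place."""
    if goal_holds and fact_holds:
        return imperative
    return None

# Goal: maximise happiness. Fact: murder decreases happiness.
print(derive_imperative(True, True, "do not murder"))   # the 'ought' follows
print(derive_imperative(False, True, "do not murder"))  # no goal, no 'ought'
```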

But this just shifts the problem one step back from specifics to wider philosophical questions. The issue is then:

  • What should our goal be?
  • What is the purpose of morality?
  • What is the purpose of life, mankind and the universe?

And there is the issue:

  • Who gets to decide?

The Cognitive Essence of Morality

For me, if I get to decide the purpose of morality, I think it comes down to this – everyone can decide what their own goals are, and the essence of morality is then:

The (deliberative) balancing of the wants (goals) of oneself with those of (sentient) others.

It is about self-regulation.

Immediately, this casts the problem into cognitive terms:

  1. In order to balance goals, we need a faculty of reason.
  2. In order to understand the concepts of ‘self’ and ‘others’ we need a ‘theory of mind’.
  3. We feel that we can choose our wants but they are ultimately physiological i.e. neurological.
  4. (The issue of identifying sentience i.e. consciousness is not considered here.)

To be moral requires intelligence, a ‘theory of mind’ and maybe other things.

Iterated Knowings

What is ‘theory of mind’?

It is an ability to understand that others can know things differently from oneself. We must understand this if we are to balance their wants against ours.

The Sally Anne test

The classic test for a theory of mind is the ‘Sally Anne Test’ which presents a story:

  • Sally has a marble, which she puts into a basket. She then goes out for a walk. While she is out, Anne takes the marble from the basket and puts it into a box. Sally then comes back.

The question is then:

Where will Sally look for her marble?

If we think Sally will look for her marble in the box – where it actually is – then we have no theory of mind. With a theory of mind, we understand that Sally did not see Anne move the marble, so she will look in the basket.
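The false-belief logic can be made explicit in a small sketch (a toy model: Sally’s belief is only updated by events she witnesses):

```python
def sally_anne():
    """Toy false-belief model: return (actual location,
    Sally's belief) after Anne moves the marble."""
    # Actual state and Sally's belief start out in agreement.
    actual = "basket"
    sally_belief = "basket"

    # Sally leaves; Anne moves the marble. Sally does not see this,
    # so only the actual state changes -- not her belief.
    actual = "box"

    return actual, sally_belief

actual, belief = sally_anne()
print(actual)  # box
print(belief)  # basket -- a theory of mind predicts Sally looks here
```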

This theory fits neatly into a scale of ‘Iterated Knowings’ set out originally by James Cargile in 1970 but prominently discussed by Daniel Dennett and Robin Dunbar.

The scale starts at the zeroth level: some information (‘x’). Information relates something to something else. If ‘some input’, then ‘some output’. Information can be encapsulated by rules.

At the first level, we have beliefs (‘I know x’) which we recognise can be different from reality (‘x’).

At the second level, we understand theory of mind: ‘I know you know x’. Knowing it is possible for others to not know things, it is possible to deceive them: ‘I know that Sally will not know the marble is in the box’.

At the third level, there is ‘communicative intent’: ‘I know you know I know x’. I can communicate information to you and know that you have received it. I am able to understand that you can understand that you have been deceived by me – I can understand reputation.

At the fourth level, it is possible to understand roles and narrative: ‘I know you know I know you know x’ where ‘you’ are an author, for example. In the 1996 film production of ‘Hamlet’, Kenneth Branagh’s Hamlet kills Richard Briers’s Polonius. A failure to understand roles would mean that we would think that Branagh has killed Briers.

At the fifth level, there is an awareness of roles and narratives that are distinct from the role or narrative. There is an awareness that others have their own narratives that are different from one’s own, even though the experiences are similar – there can be other cultures, myths, religions and worldviews. Many adults do not attain this level.

At each level, there is an awareness of the phenomenon at the lower level that is distinct from the phenomenon itself. It is possible to understand sentences at seemingly higher levels, for example:

“I know that Shakespeare wants us to believe that Iago wants Othello to believe that Desdemona loves Cassio”

but this is still really only a fourth-level phenomenon – that of understanding roles.

These levels of iterated knowings are also referred to as orders of intentionality.
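The scale itself is easy to mechanise, even though attaining the levels is not. A toy sketch that builds the sentence for a given order of intentionality, alternating the two agents as in the examples above:

```python
def iterated_knowing(level, proposition):
    """Build the sentence for a given order of intentionality:
    level 0 is the bare proposition; each further level wraps it
    in another 'I know' / 'you know' (alternating agents)."""
    agents = ["I know", "you know"]
    parts = [agents[i % 2] for i in range(level)]
    return " ".join(parts + [proposition])

print(iterated_knowing(0, "x"))  # x
print(iterated_knowing(2, "x"))  # I know you know x (theory of mind)
print(iterated_knowing(4, "x"))  # I know you know I know you know x
```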

Cognitive Theories of Moral Development

In order to:

balance the wants of oneself with those of others

we need rational intelligence and a theory of mind, as already stated. But we also need an ability to work out what the ‘other’ wants. Judging this from appearance requires ‘social cognition’ – an ability to read faces and body language, to understand what the other is feeling.

But there is another ingredient required for us to actually act morally – for us to care about the other.

By my definition, a moral agent tries to understand what the other wants – tries to apply the ‘Platinum Rule’:

‘Do unto others as they would want to be done by’

as opposed to the more common baseline of moral behaviour, the ‘Golden Rule’:

‘Do unto others as you would want to be done by.’

Having said that care is required, it is possible to manage without it by upping the order of intentionality.

A third-order agent understands reputation. It may not care about the other but it (sociopathically) balances its wants against the other’s to maintain a reputation which helps itself in the long term.

It is also possible to manage without social cognition through communication. A third-order agent may not be able to understand what you want but it may be able to ask you.

And finally, it is also possible to manage without either social cognition or a caring nature – by relying on communication and reputation.

We have here the basis of a theory of moral development in which there is increasing:

  • intelligence,
  • level of intentionality,
  • social cognition,
  • and care,

and in which we are better with more of each characteristic. We could say that these are the cognitive moral virtues: intelligence, intentionality, social cognition and care!

Note that fifth-order intentionality is a level which many adults do not attain. All too often, moral conflict arises not because the others’ opinion differs from one’s own but because of an inability to understand that the other has a different worldview into which they fit knowledge. As Jacques Rancière has said:

“Disagreement is not the conflict between one who says white and another who says black. Rather, it is the conflict between one who says white and another who also says white but does not understand the same thing by it.”

A rather more famous theory of moral development based upon a theory of cognitive development is that of Lawrence Kohlberg, which builds on Jean Piaget’s. It too has a 6-point scale, with the sixth level being one which many do not attain:

  1. Infantile obedience: ‘How can I avoid punishment?’
  2. Childish self-interest: ‘What’s in it for me?’
  3. Adolescent group conformity (norms)
  4. Adult conformity to law and order
  5. Social contract / human rights
  6. Universal ethical principles / conscience

I will say no more about this other than to point out some similarity between my ‘Iterated Knowings’ theory and Kohlberg’s: the former’s characteristics of rules, deception, reputation and roles map approximately onto Kohlberg’s first 4 levels.

Up Close and Personal

Returning to Joshua Greene’s ‘From neural ‘is’ to moral ‘ought’’ paper, a significant part is devoted to two scenarios considered by Peter Unger:


You receive a letter asking for a donation of $200 from an international aid charity in order to save a number of lives. Should you make this donation?

Joshua Greene: ‘From neural ‘is’ to moral ‘ought’: what are the moral implications of neuroscientific moral psychology?’ – Nature reviews Neuroscience 4(10) pp.846-9 (2003)

The aid agency letter


You are driving in your car when you see a hitchhiker by the roadside bleeding badly. Should you take him to hospital even though this means his blood will ruin the leather upholstery of your car which will cost $200 to repair?

Joshua Greene: ‘From neural ‘is’ to moral ‘ought’: what are the moral implications of neuroscientific moral psychology?’ – Nature reviews Neuroscience 4(10) pp.846-9 (2003)

Should you take the injured hitchhiker to hospital?

The vast majority of us would not look badly upon anyone who did not donate the $200 but would consider the person who left the hitchhiker behind to die to be a moral monster.

But given $200 and a choice between the two scenarios, a Utilitarian should help the distant strangers rather than the hitchhiker.

Greene says that we think there is ‘some good reason’ why our moral intuitions favour action when the choice is ‘up close and personal’ rather than far removed. He points out that the moral philosopher Peter Singer would maintain that there is simply no good reason why we should.

I have proposed social cognition and caring for others as some of the essential characteristics of morality. These suggest our preference for the ‘up close and personal’. We care because we see.

I speculate that our caring stems from our need to distinguish between what is ourselves and what is not. In the rubber hand illusion, our eyes deceive us into thinking a rubber hand is actually our hand; momentarily we feel pain when the hand is hit, before we work out that our sense of touch is not agreeing with our eyes. We unconsciously mimic others – when seeing someone with crossed arms, we may cross our own to reduce the discrepancy between our sense of proprioception and what we see. (This is a weak connection; yawn contagion is much stronger – we cannot help ourselves.) This makes a connection between seeing others in pain and having a deep sense of where it would hurt on ourselves. Again, we wince at the sight of others being hurt but this soon disappears as the recognition that ‘it is not me’ takes over. But at least there is this initial feeling of pain at the sight of others in pain – the origins of empathy. (Some people claim they literally feel the pain of others – that this sense does not quickly dissipate. This condition is called ‘mirror-touch synaesthesia’.)

Oxytocin and Vasopressin

Pair-bonded prairie voles

So I have provided a tentative psychological story of the origins of care. But what does neuroscience tell us about this? In her 2011 book ‘Braintrust’ (sub-titled ‘What neuroscience tells us about morality’), Patricia Smith Churchland highlights some research in behavioral neurobiology into the very different behaviour of two very similar creatures. Prairie voles pair-bond for life whereas Montane voles are solitary. (The most prominent researchers on this topic are Thomas Insel (1992-), Sue Carter (1993-), Zuoxin Wang (1996-) and Larry Young (1999-).)

One physical difference is in two closely-located parts of the brain: the ventral pallidum and the nucleus accumbens.

Compared with montane voles, prairie voles have much higher densities of neuromodulator receptors for Oxytocin and Vasopressin in these areas.

Larry Young

The Prairie vole brain. NAcc: Nucleus Accumbens, VP: Ventral Pallidum, PFC: Pre-Frontal Cortex, OB: Olfactory Bulb

What does this ‘higher density of neuromodulator receptors’ mean? Well, neuromodulators are molecules that bind onto receptors on a neuron and modulate the firing of that neuron. A larger number of receptors for a particular neuromodulator will increase the chance of the neuron firing in the presence of that neuromodulator. But a higher number of neuromodulator molecules will achieve the same result.

The most effective way of getting extra Oxytocin into the brain is via a nasal spray. Conversely, if an antagonistic drug is sprayed instead, these molecules will lock onto the receptors but they are the ‘wrong keys’ – they do not release proteins within the neuron that modulate its firing. This effectively reduces the number of receptors. Put very simply, by increasing or decreasing the effects of these neuromodulators, researchers have found they can make Prairie voles behave more like Montane voles and vice versa.

This is an extremely simplistic view; the qualifying details do not matter here. The point is that we can experimentally control behaviour associated with these neurotransmitters – which is?…

Oxytocin and Vasopressin are primarily associated with reproduction in mammals including arousal, contractions and lactation. The ‘cousins’ of Oxytocin and Vasopressin have performed equivalent functions in other creatures for hundreds of millions of years.

From this reproductive starting point, these neurotransmitters have evolved to control maternal care for offspring, pair-bonding and allo-parenting. Allo-parenting is care for young by individuals other than the parents – typically the ‘aunties’ of orphans. There is not any (magical) genetic mechanism for allo-parenting. It is just a result of seeing young close by needing care – from them being ‘up close and personal’.

And from human tests, it has been shown that they improve social cognition (at the expense of other learning) – the memory of faces, the recognition of fear and the establishment of empathy and trust.

This improved social cognition has led to interest from the autism community. Autism is sometimes thought of as lacking a ‘theory of mind’ but this is extreme. It is better characterized as having impaired social cognition. Tests with Oxytocin on autistic people show an improvement in eye gaze and the interpretation of emotions and a reduction in repetitive behaviour.

Oxytocin has also been connected with generosity. In the ‘Ultimatum game’ psychological test, the subject proposes a split, with another person, of money potentially given to them. The other person decides whether to accept the deal or to punish an unfair offer so that neither party gets anything; deals generally get accepted where the subject offers more than 30% of the stake. Oxytocin nasal sprays increase the proportion offered.
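As a sketch of the rules just described (the 30% figure is the rough empirical threshold quoted above; everything else is invented for illustration):

```python
def responder_accepts(offer, stake, threshold=0.3):
    """Responder accepts if the offered share exceeds the
    fairness threshold (roughly 30% empirically)."""
    return offer > threshold * stake

def payoffs(offer, stake):
    """Return (proposer, responder) payoffs: an accepted deal
    splits the stake; a rejected one punishes both."""
    if responder_accepts(offer, stake):
        return stake - offer, offer
    return 0, 0

print(payoffs(40, 100))  # accepted: proposer keeps 60, responder gets 40
print(payoffs(20, 100))  # rejected as unfair: both get nothing
```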

This all sounds fantastic. We would just need everyone to spray some Oxytocin up their nostrils every morning and we would all become more caring and considerate of others.

Oxytocin molecular structure

Paul Zak, an early researcher into the trust-related effects of Oxytocin, has zealously promoted the idea of the ‘Moral Molecule’ (as his book is called). But it has also been criticized as the ‘Hype Molecule’, particularly as more research was done which revealed some negative aspects of the neurotransmitter and its cousin.

Vasopressin has a conciliatory ‘tend-and-befriend’ effect on females but it will reduce ‘fight or flight’ anxiety in men and make them more aggressive in defence of the mate and of the young.

This may be the origin for behaviour that has been described as ethnocentric (even as ‘xenophobic’). For example, an early experiment based around Dutch, German and Muslim names found that German and Muslim names were less positively received when the Dutch subjects had been given Oxytocin.

Since we are considering morality as a balancing act, Oxytocin could be characterized as tilting the balance from ‘me’ more towards ‘you’ but also from ‘them’ towards ‘us’.

This, and many practical matters besides, means that we won’t be having our daily nasal sprays just yet.


Piff et al: 'Higher social class predicts increased unethical behavior'

Another BMW driver fails to stop for a pedestrian.

So far, I have characterized morality as balancing the wants of oneself with those of others and looked at how Oxytocin tips the balance towards others and can increase generosity.

Paul Piff (Berkeley) has devised various experiments to judge the generosity of the affluent. One test considered car type as an indicator of wealth and monitored which cars stopped at pedestrian crossings. High status cars were less likely to stop than other makes.

Another indicator of generosity is charitable giving. Various studies show that the most generous regions of a country are not the most affluent. In the USA, Utah and the Bible Belt stand out for higher generosity. Research indicates that it is not religious beliefs that are important here but regular attendance at services. These services involve moral sermons, donations and meeting familiar people.

Charitable giving in USA

Other factors that improve charitable giving include

  • being with a partner (‘pair-bonded’),
  • living in a rural community and
  • being less affluent (as suggested by Piff’s research).

There is a common theme here: being ‘up close and personal’ in meaningful relationships with others:

  • There is anonymity in an urban environment.
  • We are insulated from others in a car.

I have characterized morality as balancing the wants of oneself with those of others. Through psychology, we can understand why our preference for the ‘up close and personal’ has evolved. But this tells us nothing about how we should behave; that is not a question neuroscience can answer. Nevertheless, the neuroscience of Oxytocin & Vasopressin is one avenue towards a physical understanding of care: how it constrains us, and how we might be able to control it in the future.

Reason vs Emotional Intuition

So, we emotionally feel a preference for the ‘up close and personal’ but our rational inclination is that this should not be. Just as there is the balance between self and others, there is a balance between emotion and reason – the two halves of psychology’s ‘dual process theory’. As described by Daniel Kahneman in ‘Thinking, fast and slow’, ‘System 1’ is the fast, unconscious, emotional lower level and ‘System 2’ is the slower, conscious, reasoning higher level.

This split between rational and emotional decision-making accords well with Joshua Greene’s experiments, in which his subjects answered trolleyology questions whilst in an fMRI scanner. Making decisions quickly was correlated with activity in the Amygdala and the Ventro-Medial Pre-Frontal Cortex (VM-PFC), whereas questions that caused longer deliberation were correlated with activity in the Dorso-Lateral Pre-Frontal Cortex (DL-PFC). Both the Amygdala and the VM-PFC are associated with social decision-making and the regulation of emotion. In contrast, the DL-PFC is associated with ‘executive functions’, planning and abstract reasoning. We can say that the former regions are associated with ‘now’ and the latter region is associated with ‘later’.

The classic (Benthamite) form of Utilitarianism is ‘Act Utilitarianism’ in which an individual is supposed to determine the act which leads to the ‘the greatest happiness of the greatest number’. Such a determination is of course impossible but even practical deliberation to produce a reasonably good guess can often be too slow.

This has led to the ‘Rule Utilitarian’ approach of ‘pre-calculating’ the best response to typical situations to form rules. Then it is just a case of selecting the most applicable rule in a moral situation and applying that rule. That allows quite fast responses but these are often poor responses in retrospect.

Now, R. M. Hare proposed a ‘Two-Level Utilitarianism’ which is a synthesis of both Act- and Rule- Utilitarianism: apply the ‘intuitive’ rules but in the infrequent cases when there is a reduced confidence in the appropriate rules (such as more than one rule seeming to apply and those rules are in conflict), move on to ‘critical’ deliberation of the best action.

This looks a lot like ‘dual process theory’!

The Predictive Mind

We have a reasonable understanding of what goes on in the brain at the very low level of neurons, and we know what it is like at a very high level in the brain because we experience it from the inside every single day. But how we get from the small scale to the large scale is a rather difficult proposition!

‘Dual process theory’ is a crude but useful model upon which we can build psychological explanations but we now have a very promising theory of the brain that I have frequently mentioned elsewhere. Its most complete formulation is Karl Friston’s strangely-named ‘Variational Free Energy’ theory from as recently as 2005 but its pedigree can be traced back through Richard Gregory, William James to Hermann von Helmholtz in 1866, before the foundation of psychology as a discipline.

For the context here, I will not go over the details of this theory, but the most basic behaviour of the brain is as a ‘hierarchy of predictors’ – my preferred term for the theory that Jacob Hohwy calls ‘the Predictive Mind’, Andy Clark calls ‘predictive processing’ and yet others call ‘the Bayesian Brain’. All levels concurrently try to predict what is happening at the level below, passing prediction errors upwards according to their confidence in those predictions. We then view the brain as a multiple-level hierarchy (more than 2 levels), with lower levels dealing with the fast ‘small scale’, moving upwards to longer-term ‘larger scale’ levels. Psychology’s conceptual Dual Process theory becomes a subset of neuroscience’s physically-based Predictive Mind theory.

Felleman and Van Essen’s famous ‘wiring diagram’, showing the hierarchical organization from low levels (bottom) up to high levels (top)

This can inspire us to imagine a ‘multi-level Utilitarian’ moral theory which is superior to Hare’s ‘2-level Utilitarianism’. Noting that the ‘hierarchy of predictors’ operates:

  • continuously,
  • concurrently, and
  • dynamically

…we can produce a better moral theory…

Moral theories generally consider how to make a single decision based upon a particular moral situation, without revisiting it later.

We deal with the easy moral issues quickly, going back later to the more complex ones that require more deliberation. This better consideration (prediction) of the consequences of possible actions may also be influenced by a change in circumstances since we last considered them. And this change may be a result of our own (lower-level) actions previously taken.

Eventually, the window of possible action upon a moral problem will pass and we can return to the ‘larger-scale’ problems which still linger. (When we have solved the injustices of inequality, poverty and violence in the Middle East, and have no more immediate problems to deliberate over, we can take a holiday.)

It automatically and dynamically determines the appropriate level of consideration for every problem we encounter.

I think this is a sensible moral theory. It is an intelligent theory. This is true almost by definition, because this Predictive Mind mechanism is how evolution has produced intelligence – an embodied general intelligence acting in a changing environment.

Georgia State University


I somewhat provocatively point out an irony that:

  • A moral philosopher sits in his armchair, proudly proposing a moral theory that is detached from the world of ‘is’.
  • Inside his head is a bunch of neurons wired together in a particular way to produce a particular way of thinking.
  • But his moral theory is an inferior description of the way his brain thinks!

So we end up with a cognitive theory in which moral problem solving isn’t really any different from any other type of problem solving! This is an Ethical Naturalist point of view.

From Dualism to Physicalism

For ordinary people of our grandparents’ generation, the dominant philosophical belief was of the separation of mind and matter. We had free will – the mind was free to make choices, unconstrained by the physical world.

In contrast, our grandchildren’s generation will have grown up in an environment where the idea of the brain determining behaviour within what is essentially a deterministic world is commonplace. The concept of ‘free will’ is unlikely to survive this transition of worldviews intact and unmodified.

Now, there is no single fact of neuroscience that makes any Dualist suddenly switch over to being a Physicalist. People don’t change worldviews just like that. But the accumulation of coherent neuroscientific information over many years does cause a shift. As Greene says:

“Neuroscience makes it even harder to be a dualist”

So, though we can always invoke the is/ought distinction to ensure that neuroscience and morality are disconnected, its influence on our metaphysics indirectly affects our concepts of morality.

With a Dualist worldview, we can say that if it is wrong for person A to do something in some precise situation, then it is also wrong for person B to do that in that same precise situation. A and B can be substituted. It is the act that is moral or immoral.

However, with a Physicalist worldview, we have to accept that the physical state of an agent’s brain plays a part.

Psychology Fun!

Trajectory of the tamping iron through Phineas Gage’s head

Consider the two classic case studies of Phineas Gage and Charles Whitman:

  • Whilst working on the railroads in 1848, an explosion blew an iron rod straight through Phineas Gage’s head, up under a cheekbone and out through the top of his head, leaving a gaping hole in his brain. He miraculously survived but his personality changed from that of a responsible foreman beforehand to an irreverent, drunken brawler.
  • Charles Whitman personally fought his “unusual and irrational thoughts” and had sought help from doctors, to no avail. Eventually he could hold them back no more, whereupon he went on a killing spree, killing 16 people. Beforehand, he had written: “After my death I wish that an autopsy would be performed on me to see if there is any physical disorder.” The autopsy revealed a brain tumour.

It is not surprising to us that substantial changes to the physical brain cause it to behave substantially differently.

We can no longer say that it is equally blameworthy for persons A and B to do something in exactly the same situation because their brains are different.

Were I to find myself standing on the observation deck of the University of Texas tower with a rifle in my hand, I would not start shooting people at random as Whitman did. A major reason for this is that I don’t have the brain tumour he had. But if I were to have a brain like Whitman’s, then I would behave as he did! In shifting towards a physicalist position, we must move from thinking of acts being good or bad towards thinking of actors (the brains thereof) being good or bad. We move from Deontology or Consequentialism towards Virtue Ethics.

There is the concept of ‘flourishing’ within Virtue Ethics. We try to ‘grow’ people so that they are habitually good and fulfil their potential. To do this, we must design our environment so that they ‘grow’ well.

And when we talk of ‘bad brains’, we don’t blame Whitman for his behaviour. In fact, we feel sorry for him. We might actively strive to avoid such brains (by providing an environment in which doctors take notice, or take brain scans, when people complain to them about uncontrollable urges, for example). ‘Blame’ and ‘retribution’ no longer make sense. As others have said:

  • ‘with determinism there is no blame and, with no blame, there should be no retribution and punishment’ (Mike Gazzaniga)
  • ‘Blameworthiness should be removed from the legal argot’ (David Eagleman)
  • ‘We foresee, and recommend, a shift away from punishment aimed at retribution in favour of a more progressive, consequentialist approach to the criminal law’ (Joshua Greene and Jonathan Cohen)


I have defined the essence of morality as balancing the wants of oneself with those of others:

  • As well as involving reason, this means getting into someone else’s mind (rather than just getting into their shoes). On a scale of ‘iterated knowings’, we need at least a ‘theory of mind’. I have set out a theory of the moral development of a person in which there is progression up the scale of iterated knowings, up to having a desire and ability to understand another’s entire epistemological framework, which is something relatively few people reach.
  • Whilst we can act morally based on the selfish maintenance of reputation and a rather mechanical ability to communicate, it is better if we also have ‘social cognition’ (an ability to see how another feels and read what they want, more directly than verbal communication) and actually care about the other.
  • The origins of both social cognition and care lie in our basic cognitive need to be able to distinguish between self and non-self. In doing this, we can unconsciously relate the feelings of others back onto ourselves when we see them, allowing us to empathize with them.

We can make a link from the actions of the neurotransmitters Oxytocin & Vasopressin up through social cognition and empathy to the shifting of the balance towards others in being more considerate and generous to others. A common factor in this behaviour is proximity – an unconscious emotional preference for those we know and see around us. This provides us with ‘some good reason’ why biasing towards the ‘up close and personal’ feels intuitively right even though we logically think there should be no bias.

The moral philosopher R. M. Hare proposes a sensible balancing of intuition and logic. But this ‘dual process’ psychology type of moral theory is just an inferior form of the more general neuroscientific theory of the ‘predictive mind’, advocated by Karl Friston, Jacob Hohwy, Andy Clark and others. The latter inspires an improved moral theory that:

  • Generalizes to advocating more detailed slower deliberation for more complex moral dilemmas, rather than just offering a two-stop shop.
  • Relates moral thinking to the general intelligent thinking of an agent embodied within an environment. This is an ethical naturalist position: moral problem solving is not distinct from other types of problem solving.
  • Improves the theory in being dynamic. Moral decisions are not ‘fire and forget’. We should continue to deliberate on our more complex moral problems after we have made a decision and moved on to subsequent moral situations, particularly as circumstances change or we see the results of our actions.

So ‘is’ might inspire ‘ought’ but it still does not imply it. Not directly, anyway.

Neuroscientific knowledge pushes society further away from dualism, towards physicalism in which the moral actor is embedded within its own environment and hence physically determined in the same way. Our moral framework must then shift towards a Virtue Ethics position of trying to cultivate better moral actors rather than the Deontological or Consequentialist focus on correct moral acts.

This forces us to re-evaluate blame and praise, shifting us away from retribution. We must actively cultivate a society in which people can morally ‘flourish’.

Our new-found knowledge in neuroscience forces us to recognize that our neural construction constrains us, but it also increasingly allows us to overcome those constraints – at our peril.




The Fall of Artificial Neural Networks: XOR gates

In the 1969 book ‘Perceptrons: an introduction to computational geometry’, Marvin Minsky and Seymour Papert demonstrated that single-layer Artificial Neural Networks could not even implement an XOR (‘exclusive or’) logical function. This was a big disappointment. In the history of Artificial Neural Networks, this is seen as a significant contributor to the ‘AI winter’ of reduced interest (and hence also of reduced funding) in them.
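Minsky and Papert proved this formally, but the impossibility is also easy to check by brute force. The sketch below (my own illustration, not from their book) searches a grid of weights and biases for a single linear threshold unit computing XOR, and finds none:

```python
import itertools

def computes_xor(w1, w2, bias):
    """True if a single linear threshold unit with these parameters computes XOR."""
    for x1, x2 in itertools.product((0, 1), repeat=2):
        out = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
        if out != (x1 ^ x2):
            return False
    return True

# Search a grid of weights and biases: no combination works,
# because XOR is not linearly separable.
grid = [i / 4 for i in range(-20, 21)]   # -5.0 to 5.0 in steps of 0.25
found = any(computes_xor(w1, w2, b)
            for w1 in grid for w2 in grid for b in grid)
print(found)  # False
```

No finite grid constitutes a proof, of course, but it illustrates why a hidden layer is unavoidable.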

The Rise of Artificial Neural Networks: Back-Propagation

The backpropagation algorithm effectively solved the exclusive-or problem in that:

  • To implement XORs required one or more hidden layers in the network (between the inputs and the output layer).
  • The backpropagation algorithm enabled multi-layer networks to be trained.

This contributed to a resurgence of interest in Artificial Neural Networks. Backpropagation was invented independently a number of times, most notably by Paul Werbos (1974), Rumelhart, Hinton and Williams (1986) and Yann LeCun (1987).

Watch Victor Lavrenko’s YouTube video for more technical details on the XOR problem…

The Backpropagation Code

The purpose of this post is to provide example code for the backpropagation algorithm and to demonstrate that it can solve the XOR problem.

As noted elsewhere:

  • The code here is unapologetically ‘unpythonic’.
  • If you do not have a Python application installed, you can open an online Python interpreter in a new window and use that. All code fragments are combined at the end of this piece into a single listing that can be copy-pasted into the interpreter.

As well as being unpythonic, the code here differs from typical implementations in that it can handle more than 2 layers. The code can be configured for any fully-connected feed-forward network with any number of layers greater than 1 and any number of neurons in each layer.

Some Housekeeping

Firstly, let’s sort out some housekeeping. Here are 2 functions so that:

  • We can pause the run to see things before they disappear off the top of the screen. (We can stop if we type ‘n’)
  • We can control how much information gets printed out by varying a ‘verbosity’ variable value.
def prompted_pause(s):
    import sys
    ok = input(s)
    if ok=="n" or ok=="N":
        print("Stopping here")
        sys.exit() # Actually stop, rather than just saying so

verbosity = 1

def print_info(v, s, end="DEFAULT"):
    if verbosity >= v:
        if end == "DEFAULT":
            print(s) # With newline
        elif end == "":
            print(s, end="") # Without newline
        else:
            print(s, end)

Where there is a call to print_info(3, “Blah”), the 3 means that the message “Blah” will only get printed out if the verbosity level is 3 or more. Across the whole program below, verbosity levels are such that:

  • If verbosity is set to 1, it will only print out the minimal information.
  • If verbosity is set to 2, it will print out more.
  • If verbosity is set to 3, it will print out everything.

The Application

The neural network will be trained to behave like a ‘full adder’ circuit. This is a common building block in digital electronic circuits. It adds up three 1-bit numbers to produce a 2-bit output number (range 0…3). The ‘CI’ and ‘CO’ signals are the carry-in and carry-out respectively. As an example application, by chaining 32 of these circuits together (connecting the CO output of one full adder to the CI input of the next) we get a circuit that adds two 32-bit numbers together.

Full Adder circuit
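The chaining idea can be sketched in a few lines of Python as a plain logical model (the function names here are my own, purely for illustration; the neural network below learns only the single full adder):

```python
def full_adder(a, b, ci):
    """Add three 1-bit numbers; return (sum, carry_out)."""
    s  = a ^ b ^ ci                # SUM output (two chained XORs)
    co = (a & b) | (ci & (a ^ b))  # CARRY output
    return s, co

def ripple_add(x, y, bits=32):
    """Add two unsigned integers by chaining full adders, each CO feeding the next CI."""
    result, carry = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

print(ripple_add(12345, 67890))  # 80235
```

The carry ‘ripples’ from the least significant bit upwards, which is exactly the CO-to-CI chaining described above.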

The full adder has been chosen because:

  • It contains at least one XOR gate (it has 2), to demonstrate that a multilayer network can learn this non-linearly-separable behaviour, and
  • It has more than one output (it has 2), to provide Python code that is more generalised.

This is not a good example of what a neural network could be used for. Here, there are only 8 possible combinations of inputs. Any 3-in 2-out (combinatorial) function can be defined with just 16 bits of information.
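To make the 16-bit claim concrete, here is a sketch (my own; the names are hypothetical) that packs the full adder’s entire behaviour into a single 16-bit integer:

```python
# Pack the full adder's truth table into one integer:
# 8 input combinations x 2 output bits = 16 bits of information.
table = 0
for a in range(2):
    for b in range(2):
        for ci in range(2):
            total = a + b + ci              # 2-bit result, 0..3
            idx = (a << 2) | (b << 1) | ci  # which of the 8 rows
            table |= total << (2 * idx)

def lookup(a, b, ci):
    idx = (a << 2) | (b << 1) | ci
    return (table >> (2 * idx)) & 3         # unpack the 2-bit result

print(lookup(1, 1, 0))  # 1+1+0 = 2, i.e. carry=1, sum=0
```

A lookup table is clearly the sensible implementation here; the neural network is used only as a small, fully-checkable demonstration.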

The training set is the same as the test set and just defines the truth table of a full adder. After training, the network will be tested against all 8 possible input combinations. A more appropriate application is one where the number of possible input combinations is much greater than the number of vectors it can be trained and tested against.

# Full Adder example:
Training_Set = [
    # A  B  CI   CO  S
    [[0, 0, 0], [0, 0]],
    [[0, 0, 1], [0, 1]],
    [[0, 1, 0], [0, 1]],
    [[0, 1, 1], [1, 0]],
    [[1, 0, 0], [0, 1]],
    [[1, 0, 1], [1, 0]],
    [[1, 1, 0], [1, 0]],
    [[1, 1, 1], [1, 1]]
]
# Bit assignments...
CARRY = 0
SUM   = 1

For example, the input [0, 1, 1] has 2 bits set to 1, so the sum is 2, which is binary ‘10’; the output vector is therefore [1, 0].
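The whole training set follows mechanically from that rule. A quick sketch (the `encode` helper is my own, not part of the program in this post):

```python
def encode(inputs):
    total = sum(inputs)              # number of 1s in [A, B, CI]: 0..3
    return [total >> 1, total & 1]   # [carry, sum] = the 2-bit binary total

# Regenerate the full training set from the rule above:
regenerated = [[[a, b, ci], encode([a, b, ci])]
               for a in (0, 1) for b in (0, 1) for ci in (0, 1)]
print(regenerated[3])  # [[0, 1, 1], [1, 0]]
```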

A Neuron

We now define the code for a single neuron. For each neuron, we need to have:

  • A list of weights, one for each neuron input (from the layer below).
  • A bias – this behaves in the same way as a weight except the input is a constant ‘1’.
  • A gradient ∂E/∂z. This is used for training the network.

The FeedForward_Neuron function calculates a neuron’s output y, based on its inputs x, and its bias b. A sum-of-products is formed:

z = Σwi.xi + b

and the output y is derived from that using the Sigmoid function:

y = σ(z)

The Sigmoid function provides the non-linearity that allows the network to learn non-linear relationships such as the ‘XOR’ function (compare this with the simple, linear network in ‘Fish, Chips, Ketchup’).

import math

class Neuron:
    def __init__(self, bias):
        self.B = bias
        self.W = []
        self.dEdz = 0.0

""" The logistic function """
def Sigmoid(z):
    return 1 / (1 + math.exp(-z))

""" Generate neuron output from inputs"""
def FeedForward_Neuron(inputs, bias, weights):
    z = bias
    for i in range(len(inputs)):
        z += inputs[i] * weights[i]
    return Sigmoid(z)

We start with the list of weights being empty; we will fill these in as we build up the network of neurons.
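As a quick sanity check of the feed-forward arithmetic (Sigmoid and FeedForward_Neuron are repeated from above so that this snippet runs on its own), a single neuron with weights [1, 1] and bias -1.5 behaves like an AND gate:

```python
import math

def Sigmoid(z):
    return 1 / (1 + math.exp(-z))

def FeedForward_Neuron(inputs, bias, weights):
    z = bias
    for i in range(len(inputs)):
        z += inputs[i] * weights[i]
    return Sigmoid(z)

# The output rounds to 1 only when both inputs are 1 (an AND gate).
for a in (0, 1):
    for b in (0, 1):
        y = FeedForward_Neuron([a, b], -1.5, [1, 1])
        print(a, b, round(y))
```

No choice of weights and bias lets a single neuron of this kind produce XOR, which is precisely why the hidden layer below is needed.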

As in previous posts, the code is not Pythonic. Here, there are no ‘def’ functions (‘methods’) defined within any class. All functions are outside, which means they require all the information used to be passed as parameters to the function. This is to make it clear what the dependencies are. Python examples of back-propagation available elsewhere on the interweb will use classes properly and use vector operations where I have used for loops.

A Layer of Neurons

A neuronal layer is then just an array of neurons. The biases and weights of the neurons all get initialized to random values before any training is done.

Updating a neuronal layer is just updating each neuron in turn.

import random

class NeuronLayer:
    def __init__(self, num_neurons, num_inputs):
        self.Neuron = [] # Build up a list of neurons
        for n in range(0, num_neurons):
            print_info(3,"  Neuron[%d]" % (n))
            # Add a neuron to the layer, with a random bias
            self.Neuron.append(Neuron(random.random()))
            print_info(3,"    Bias = %.3f" % self.Neuron[n].B)
            # Give it random weights
            for i in range(0, num_inputs):
                self.Neuron[n].W.append(random.random()) # Initialized randomly
                print_info(3,"    Weight[%d] = %.3f" % (i, self.Neuron[n].W[i]))

def FeedForward_Layer(inputs, layer):
    outputs = []
    for neuron in layer.Neuron:
        neuron.X = inputs
        y = FeedForward_Neuron(neuron.X, neuron.B, neuron.W)
        neuron.Y = y
        outputs.append(y)
    return outputs

The Neural Network

A complete multilayer network can then be created, with:

  • a particular number of inputs and outputs,
  • a particular number of layers
  • a particular number of neurons in each layer.

And we can use this network by feeding it input signals which propagate up through the layers to then return the outputs.

The application calls for 3 inputs and 2 outputs. Typically, the number of layers is 2 but you can configure for more than this (for so-called ‘deep’ networks). Here, we configure the network as follows:

num_inputs   = 3
num_outputs  = 2
num_neurons_in_layer = [4, num_outputs] # num. neurons in each layer from inputs up to output
# The num. neurons in the top (output) layer is the same as the num. output ports
output_layer = len(num_neurons_in_layer)-1 # Layer number

The num_neurons_in_layer variable defines the number of layers as well as the number of neurons in each layer. You can experiment with the number of neurons.

To actually create the network, we use:

Net = []
for L in range(len(num_neurons_in_layer)):
    if L==0: # Input layer
        i = num_inputs
    else:
        i = num_neurons_in_layer[L-1] # (Fully connected to lower layer)
    print_info(1, "Create layer %d with %d neurons and %d inputs" % (L, num_neurons_in_layer[L], i))
    Net.append(NeuronLayer(num_neurons = num_neurons_in_layer[L], num_inputs = i))


For actual usage, we just apply the inputs then update each layer in turn from the input layer forward to the output layer.

def FeedForward_Net(inputs, Net):
    for L in range(len(Net)): # Up through all layers
        print_info(3, "  Feed-Forward layer Net[%d]" % L)
        if L==0:
            y = FeedForward_Layer(inputs, Net[L])
        else:
            y = FeedForward_Layer(y, Net[L])
    return y


In the trivial example here, we test the network by applying all input combinations, as defined in the truth table training set.

def Test_Network(Net, Training_Set):
    print("Test Network:")
    for i in range(8):
        Training_Input, Training_Output = Training_Set[i]
        print("  %d+%d+%d" % (Training_Input[0], Training_Input[1], Training_Input[2]), end="")
        result = FeedForward_Net(Training_Input, Net)
        rounded_result = [round(result[0]), round(result[1])]
        print(" = %d%d"       % (rounded_result[CARRY], rounded_result[SUM]), end="")
        print(" (%.3f, %.3f)" % (        result[CARRY],         result[SUM]), end="")
        if rounded_result == Training_Output:
            print(" correct")
        else:
            print(" bad")

Not surprisingly, the network does not behave as desired before it is trained. Each output bit has just a 50:50 chance of being correct.

Test_Network(Net, Training_Set)

Training with Back-Propagation

Now we want to train the network to behave according to the application (in this case, to behave like a full adder circuit). We train using the ‘back-propagation’ algorithm. This involves:

  1. Applying the inputs for a particular training set and propagate these forward to produce the outputs.
  2. Seeing how the outputs differ from what you want them to be (what the training set outputs say they should be). The mismatch is called the ‘error’ E.
  3. For each neuron in the output layer, working out what change to the signal z (remember: z = Σwi.xi + b and y = σ(z)) would be needed to make the output correct (i.e. make E=0). This is ∂E/∂z, the ‘partial derivative’ of the error with respect to z.
  4. For each layer working from that output layer back to the input layer, repeat the above operation for each neuron. Setting ∂E/∂z will require using the weights and ∂E/∂z values of the neurons of the higher layers. (We are propagating the error derivatives back through the layers.)
  5. We update the weights of each neuron by deriving a partial derivative of the error with respect to the weight, ∂E/∂w (derived from the ∂E/∂z values already calculated). We adjust each weight by a small fraction of this, determined by the ‘learning rate’ ε, so that a weight w is changed to become w − ε.∂E/∂w. We do the same with the biases, using ∂E/∂b.

We perform the above operations for each item in the training set in turn. Over many iterations, the weights converge on values that produce the desired behaviour (hopefully); this is called ‘gradient descent’.

As an example, consider how to modify the weight w12 that connects neuron n1 in layer 1 to a neuron n2 in layer 2 where this is in a 3-layer network. The error derivatives ∂E/∂z3 in the higher layer, 3, have already been calculated.

The error derivative for n2 is calculated using the ‘chain rule’, multiplying the derivatives of everything along the signal path:

∂E/∂z2 = ∂E/∂z3 . ∂z3/∂y2 . ∂y2/∂z2.

This result is used for both:

  1. Continuing to propagate back to lower layers (in this case, just layer 1), and
  2. Calculating the derivative for the weight adjustment:

∂E/∂w12 = ∂E/∂z2 . ∂z2/∂w12.
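Plugging illustrative numbers in may help (the values here are my own; the derivative formulas match the functions defined next, with error E = ½(t − y)² for target t):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Illustrative values: one output neuron with a single input
x, w, b, t = 1.0, 0.6, 0.0, 1.0   # input, weight, bias, target

y    = sigmoid(w * x + b)   # forward pass
dEdy = -(t - y)             # dE/dy, for error E = 0.5*(t - y)^2
dydz = y * (1 - y)          # derivative of the sigmoid
dEdz = dEdy * dydz          # chain rule: dE/dz = dE/dy . dy/dz
dEdw = dEdz * x             # chain rule: dE/dw = dE/dz . dz/dw, where dz/dw = x

epsilon = 0.5               # learning rate
w_new = w - epsilon * dEdw  # gradient-descent step
print(w_new > w)  # True: y was below the target, so the weight grows
```

One such step nudges the output towards the target; repeating it over many training vectors is the gradient descent described above.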

For more details, I recommend Matt Mazur’s excellent worked-out example and Pythonic Python code.

def calc_dEdy(target, output):
    return -(target - output)

# Derivative of the sigmoid function:
def dydz(y):
    return y * (1 - y)

def calc_dEdz(target, output):
    # These are scalars: one value per output neuron
    return calc_dEdy(target, output) * dydz(output)

def dzdw(x):
    return x # z=sum(w[i].x[i]) therefore dz/dw[i]=x[i]

LEARNING_RATE = 0.5 # Often denoted by some Greek letter - often epsilon

# Uses 'online' learning, ie updating the weights after each training case
def Train_Net(Net, training_inputs, training_outputs):
    # 0. Feed-forward to generate outputs
    print_info(2,"  Feed-forward")
    FeedForward_Net(training_inputs, Net)

    for L in reversed(range(len(Net))): # Back through all layers
        print_info(2,"  Back-prop layer Net[%d]" % (L))
        if L == output_layer: # Output layer
            # 1. Back-propagation: Set Output layer neuron dEdz
            for o in range(len(Net[L].Neuron)): # For each output layer neuron
                print_info(3,"    Back-prop Net[%d].Neuron[%d]" % (L, o))
                print_info(3,"    %d" % (training_outputs[o]))
                print_info(3,"    calc_dEdz(%.3f, %.3f)" % (training_outputs[o], Net[L].Neuron[o].Y))
                Net[L].Neuron[o].dEdz = calc_dEdz(training_outputs[o], Net[L].Neuron[o].Y)
        else: # Hidden layers
            # 2. Back-propagation: Set Hidden layer neuron dE/dz = Sum dE/dz * dz/dy = Sum dE/dz * wih
            for h in range(len(Net[L].Neuron)):
                print_info(3,"    Back-prop Net[%d].Neuron[%d]" % (L, h))
                dEdy = 0
                for output_neuron in range(len(Net[L+1].Neuron)):
                    dEdy += Net[L+1].Neuron[output_neuron].dEdz * Net[L+1].Neuron[output_neuron].W[h]
                Net[L].Neuron[h].dEdz = dEdy * dydz(Net[L].Neuron[h].Y)
    # 3. Update all neuron biases and weights: dE/dw = dE/dz * dz/dw
    for L in range(len(Net)): # Up through all layers
        print_info(2,"  Update weights in layer Net[%d]" % (L))
        for n in range(len(Net[L].Neuron)):
            dEdb = Net[L].Neuron[n].dEdz * 1.0
            # dE/db = dE/dz * dz/db; dz/db=1 (the bias is like a weight with a constant input of 1)
            Net[L].Neuron[n].B -= LEARNING_RATE * dEdb # db = epsilon * dE/db
            for w in range(len(Net[L].Neuron[n].W)):
                dEdw = Net[L].Neuron[n].dEdz * dzdw(Net[L].Neuron[n].X[w])
                Net[L].Neuron[n].W[w] -= LEARNING_RATE * dEdw # dw = epsilon * dE/dw

We train the network until it is ‘good enough’. For that, we need a measure of how good (or how bad) the network is performing whilst we are training. That measure is derived by the Total_Error function. In this simple example, there are only 8 possible combinations of inputs so, at each training round, a training vector is randomly selected from the 8.

""" For reporting progress (to see if it working, or how well it is learning)"""
def Total_Error(Net, training_sets):
    Etotal = 0
    """ Use the first 8 training vectors as the validation set """
    num_validation_vectors = 8 # There are only 8 vectors in the Full-Adder example
    for t in range(num_validation_vectors):
        training_inputs, training_outputs = training_sets[t]
        FeedForward_Net(training_inputs, Net)
        Etotal += 0.5*(training_outputs[0] - Net[output_layer].Neuron[0].Y)**2
        Etotal += 0.5*(training_outputs[1] - Net[output_layer].Neuron[1].Y)**2
    return Etotal

Etotal = 0.0
for i in range(0, 100000):
    Training_Input, Training_Output = random.choice(Training_Set)
    if i%100==99:
        print_info(1,"Training iteration %d" % i, end="")
        print_info(3,"  %d+%d+%d" % (Training_Input[0], Training_Input[1], Training_Input[2]), end="")
        print_info(3,"  =  %d%d" % (Training_Output[CARRY], Training_Output[SUM]), end="")
    Train_Net(Net, Training_Input, Training_Output)
    if i%100==99:
        Etotal = Total_Error(Net, Training_Set)
        print_info(1,"  Validation E = %.3f" % Etotal)
        if Etotal < 0.02:
            break # Good enough - stop training

Testing the Trained Network

Then we test the network again to see how well it has been trained.

Test_Network(Net, Training_Set)

With the error threshold to stop training fixed at 0.02, you can experiment with changing the size and depth of the network and seeing how many training iterations it takes to get to that error threshold.

An example output is given below – the beginning and end at least…

Create Neural Net
Create layer 0 with 4 neurons and 3 inputs
Create layer 1 with 2 neurons and 4 inputs
Test Network:
  0+0+0 = 11 (0.834, 0.829) bad
  0+0+1 = 11 (0.864, 0.851) bad
  0+1+0 = 11 (0.877, 0.874) bad
  0+1+1 = 11 (0.893, 0.886) bad
  1+0+0 = 11 (0.857, 0.847) bad
  1+0+1 = 11 (0.880, 0.864) bad
  1+1+0 = 11 (0.890, 0.884) bad
  1+1+1 = 11 (0.901, 0.893) correct
Training iteration 99
  Validation E = 1.972
Training iteration 199
  Validation E = 1.881
Training iteration 299
  Validation E = 2.123
Training iteration 10099
  Validation E = 0.020
Training iteration 10199
  Validation E = 0.020
Test Network:
  0+0+0 = 00 (0.014, 0.080) correct
  0+0+1 = 01 (0.029, 0.942) correct
  0+1+0 = 01 (0.028, 0.946) correct
  0+1+1 = 10 (0.972, 0.065) correct
  1+0+0 = 01 (0.029, 0.936) correct
  1+0+1 = 10 (0.980, 0.062) correct
  1+1+0 = 10 (0.976, 0.061) correct
  1+1+1 = 11 (0.997, 0.922) correct

All together

Piecing all this code together so we have a single file to run…


def prompted_pause(s):
    import sys
    ok = input(s)
    if ok=="n" or ok=="N":
        print("Stopping here")
        sys.exit()

verbosity = 1

def print_info(v, s, end="DEFAULT"):
    if verbosity >= v:
        if end == "DEFAULT":
            print(s) # With newline
        else:
            print(s, end=end) # e.g. end="" for no newline

Application: Full Adder

# Full Adder example:
Training_Set = [
    # A  B  Cin  Cout Sum
    [[0, 0, 0], [0, 0]],
    [[0, 0, 1], [0, 1]],
    [[0, 1, 0], [0, 1]],
    [[0, 1, 1], [1, 0]],
    [[1, 0, 0], [0, 1]],
    [[1, 0, 1], [1, 0]],
    [[1, 1, 0], [1, 0]],
    [[1, 1, 1], [1, 1]]
]
# Bit assignments...
CARRY = 0
SUM   = 1

print("Create Neural Net")

import math

class Neuron:
    def __init__(self, bias):
        self.B = bias
        self.W = []
        self.dEdz = 0.0

""" The logistic function """
def Sigmoid(z):
    return 1 / (1 + math.exp(-z))

""" Generate neuron output from inputs"""
def FeedForward_Neuron(inputs, bias, weights):
    z = bias
    for i in range(len(inputs)):
        z += inputs[i] * weights[i]
    return Sigmoid(z)
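As a quick sanity check of the arithmetic (a standalone sketch, separate from the main listing): a neuron with zero bias, all weights 1 and all three inputs 1 computes z = 3 and outputs Sigmoid(3) ≈ 0.953.

```python
import math

def Sigmoid(z):
    return 1 / (1 + math.exp(-z))

def FeedForward_Neuron(inputs, bias, weights):
    z = bias # Start with the bias...
    for i in range(len(inputs)):
        z += inputs[i] * weights[i] # ...add the weighted inputs...
    return Sigmoid(z) # ...and squash into the range (0, 1)

y = FeedForward_Neuron([1, 1, 1], 0.0, [1.0, 1.0, 1.0])
print("%.3f" % y) # Sigmoid(3) is approximately 0.953
```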

import random

class NeuronLayer:
    def __init__(self, num_neurons, num_inputs):
        self.Neuron = [] # Build up a list of neurons
        for n in range(0, num_neurons):
            print_info(3,"  Neuron[%d]" % (n))
            # Add a neuron to the layer, with a random bias
            self.Neuron.append(Neuron(random.random()))
            print_info(3,"    Bias = %.3f" % self.Neuron[n].B)
            # Give it random weights
            for i in range(0, num_inputs):
                self.Neuron[n].W.append(random.random()) # Initialized randomly
                print_info(3,"    Weight[%d] = %.3f" % (i, self.Neuron[n].W[i]))

def FeedForward_Layer(inputs, layer):
    outputs = []
    for neuron in layer.Neuron:
        neuron.X = inputs
        y = FeedForward_Neuron(neuron.X, neuron.B, neuron.W)
        neuron.Y = y
        outputs.append(y)
    return outputs

A complete multilayer network can then be created,

# Configuration...
num_inputs   = 3
num_outputs  = 2
num_neurons_in_layer = [4, num_outputs] # num. neurons in each layer from inputs up to output
# The num. neurons in the top (output) layer is the same as the num. output ports
output_layer = len(num_neurons_in_layer)-1 # Layer number

Net = []
for L in range(len(num_neurons_in_layer)):
    if L==0: # Input layer
        i = num_inputs
    else:
        i = num_neurons_in_layer[L-1] # (Fully connected to lower layer)
    print_info(1, "Create layer %d with %d neurons and %d inputs" % (L, num_neurons_in_layer[L], i))
    Net.append(NeuronLayer(num_neurons = num_neurons_in_layer[L], num_inputs = i))

def FeedForward_Net(inputs, Net):
    for L in range(len(Net)): # Up through all layers
        print_info(3, "  Feed-Forward layer Net[%d]" % L)
        if L==0:
            y = FeedForward_Layer(inputs, Net[L])
        else:
            y = FeedForward_Layer(y, Net[L])
    return y


def Test_Network(Net, Training_Set):
    print("Test Network:")
    for i in range(8):
        Training_Input, Training_Output = Training_Set[i]
        print("  %d+%d+%d" % (Training_Input[0], Training_Input[1], Training_Input[2]), end="")
        result = FeedForward_Net(Training_Input, Net)
        rounded_result = [round(result[0]), round(result[1])]
        print(" = %d%d"       % (rounded_result[CARRY], rounded_result[SUM]), end="")
        print(" (%.3f, %.3f)" % (        result[CARRY],         result[SUM]), end="")
        if rounded_result == Training_Output:
            print(" correct")
        else:
            print(" bad")

Test_Network(Net, Training_Set)



def calc_dEdy(target, output):
    return -(target - output)

# Derivative of the sigmoid function:
def dydz(y):
    return y * (1 - y)

def calc_dEdz(target, output):
    # (These are scalars, computed per output neuron)
    return calc_dEdy(target, output) * dydz(output)

def dzdw(x):
    return x # z=sum(w[i].x[i]) therefore dz/dw[i]=x[i]
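The dydz expression can be checked numerically against a central-difference estimate of the sigmoid's slope (a standalone check, independent of the network code):

```python
import math

def Sigmoid(z):
    return 1 / (1 + math.exp(-z))

def dydz(y):
    return y * (1 - y) # Sigmoid derivative, expressed in terms of the output y

z = 0.7
h = 1e-6
numeric  = (Sigmoid(z + h) - Sigmoid(z - h)) / (2 * h) # central-difference estimate
analytic = dydz(Sigmoid(z))
print(abs(numeric - analytic) < 1e-9) # the two estimates agree
```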

LEARNING_RATE = 0.5 # Often denoted by some Greek letter - often epsilon

# Uses 'online' learning, ie updating the weights after each training case
def Train_Net(Net, training_inputs, training_outputs):
    # 0. Feed-forward to generate outputs
    print_info(2,"  Feed-forward")
    FeedForward_Net(training_inputs, Net)

    for L in reversed(range(len(Net))): # Back through all layers
        print_info(2,"  Back-prop layer Net[%d]" % (L))
        if L == output_layer: # Output layer
            # 1. Back-propagation: Set Output layer neuron dEdz
            for o in range(len(Net[L].Neuron)): # For each output layer neuron
                print_info(3,"    Back-prop Net[%d].Neuron[%d]" % (L, o))
                print_info(3,"    %d" % (training_outputs[o]))
                print_info(3,"    calc_dEdz(%.3f, %.3f)" % (training_outputs[o], Net[L].Neuron[o].Y))
                Net[L].Neuron[o].dEdz = calc_dEdz(training_outputs[o], Net[L].Neuron[o].Y)
        else: # Hidden layer
            # 2. Back-propagation: Set Hidden layer neuron dE/dz = Sum dE/dz * dz/dy = Sum dE/dz * wih
            for h in range(len(Net[L].Neuron)):
                print_info(3,"    Back-prop Net[%d].Neuron[%d]" % (L, h))
                dEdy = 0
                for output_neuron in range(len(Net[L+1].Neuron)):
                    dEdy += Net[L+1].Neuron[output_neuron].dEdz * Net[L+1].Neuron[output_neuron].W[h]
                Net[L].Neuron[h].dEdz = dEdy * dydz(Net[L].Neuron[h].Y)
    # 3. Update output layer neuron biases and weights: dE/dw = dE/dz * dz/dw
    for L in range(len(Net)): # Up through all layers
        print_info(2,"  Update weights in layer Net[%d]" % (L))
        for n in range(len(Net[L].Neuron)):
            dEdb = Net[L].Neuron[n].dEdz * 1.0
            # dE/db = dE/dz * dz/db; dz/db=1 (the bias is like a weight with a constant input of 1)
            Net[L].Neuron[n].B -= LEARNING_RATE * dEdb # db = epsilon * dE/db
            for w in range(len(Net[L].Neuron[n].W)):
                dEdw = Net[L].Neuron[n].dEdz * dzdw(Net[L].Neuron[n].X[w])
                Net[L].Neuron[n].W[w] -= LEARNING_RATE * dEdw # dw = epsilon * dE/dw

""" For reporting progress (to see if it working, or how well it is learning)"""
def Total_Error(Net, training_sets):
    Etotal = 0
    """ Use the first 8 training vectors as the validation set """
    num_validation_vectors = 8 # There are only 8 vectors in the Full-Adder example
    for t in range(num_validation_vectors):
        training_inputs, training_outputs = training_sets[t]
        FeedForward_Net(training_inputs, Net)
        Etotal += 0.5*(training_outputs[0] - Net[output_layer].Neuron[0].Y)**2
        Etotal += 0.5*(training_outputs[1] - Net[output_layer].Neuron[1].Y)**2
    return Etotal

Etotal = 0.0
for i in range(0, 100000):
    Training_Input, Training_Output = random.choice(Training_Set)
    if i%100==99:
        print_info(1,"Training iteration %d" % i, end="")
        print_info(3,"  %d+%d+%d" % (Training_Input[0], Training_Input[1], Training_Input[2]), end="")
        print_info(3,"  =  %d%d" % (Training_Output[CARRY], Training_Output[SUM]), end="")
    Train_Net(Net, Training_Input, Training_Output)
    if i%100==99:
        Etotal = Total_Error(Net, Training_Set)
        print_info(1,"  Validation E = %.3f" % Etotal)
        if Etotal < 0.02:
            break # Good enough - stop training

See how it behaves now, after training.

Test_Network(Net, Training_Set)

Firing and Wiring

Brains are essentially ‘just a bunch of neurons’ connected to one another by synapses. A neuron ‘fires’ when there is enough activity (firing) on its synapses. The network learns by modifying the strengths of those synapses: when both sides of a synapse are active at around the same time, the synapse is strengthened; when they are out of sync, it weakens.

This is summarized by Donald Hebb’s famous slogan:

‘neurons that fire together, wire together’

often continued as

‘and out of sync, fail to link.’

Artificial Neural Nets are inspired by the real Neural Nets that are our brains. Hopfield Networks were an early form of artificial neural network – one in which

‘neurons that fire together, wire together’

is the central concept.

Here I provide some Python code to demonstrate Hopfield Networks.

Unapologetically Unpythonic

As noted elsewhere, the code here is very ‘unpythonic’. It does not use library functions and vectorizing to make the code efficient and compact. It is written as a C programmer learning Python might write it, which highlights the underlying arithmetic operations and complexity within the nested for loops. Conversion to efficient Python code is ‘left as an exercise for the reader’.

Alternatively, you could just look at ‘code-affectionate’s posting that I gratefully acknowledge, which similarly introduces Hopfield Networks but with pythonic code.

An Online Python Interpreter

Another beginner’s approach to Python is to use an online interpreter rather than downloading and installing one.


The white region on the left hand side of the page is the ‘editor’ region where code can be written then run (click on ‘run’) with the output appearing in the ‘console’ region (black background) on the right hand side. Alternatively, code can be written directly into the console.

Running the ‘editor’ program resets everything in the console; any objects previously defined will be forgotten. So, where I introduce code below, it is easiest if you just copy and paste it at the end of the ‘editor’ code and then re-run the whole lot.

This interpreter is then a sandbox for you to play around in. You can make changes to the code or enter different commands into the console and see what happens.

MICR Application

We are going to train a tiny Hopfield network to recognize the digits 0…9 from an array of pixels where there is some noise affecting some of the pixels. This is like MICR (magnetic ink character recognition) where human-readable digits printed in magnetic ink on cheques (bank checks) were stylized such that they were also machine-readable.

E13B MICR font digits

The E13B MICR font digits for MICR (Magnetic Ink Character Recognition)

But here, to keep things simple, the character set is just built on a tiny 4 x 5 pixel array…

MICR-like characters in a tiny (4 x 5) array

MICR-like characters in a tiny (4 x 5) array

… and the resulting 20-neuron network will have a paltry learning ability which will demonstrate the limitations of Hopfield networks.

Here goes…

The digits are defined in Python as…

Num = {} # There's going to be an array of 10 digits

# (Representative 4 x 5 'X'-art patterns - any reasonably distinct patterns will do)
Num[0] = """
XXXX
X  X
X  X
X  X
XXXX"""

Num[1] = """
  X 
 XX 
  X 
  X 
 XXX"""

Num[2] = """
XXXX
   X
XXXX
X   
XXXX"""

Num[3] = """
XXXX
   X
 XXX
   X
XXXX"""

Num[4] = """
X  X
X  X
XXXX
   X
   X"""

Num[5] = """
XXXX
X   
XXXX
   X
XXXX"""

Num[6] = """
XXXX
X   
XXXX
X  X
XXXX"""

Num[7] = """
XXXX
   X
  X 
 X  
X   """

Num[8] = """
XXXX
X  X
XXXX
X  X
XXXX"""

Num[9] = """
XXXX
X  X
XXXX
   X
XXXX"""

A function is used to convert those (easily human-discernible) 4 x 5 arrays into a 20-element list of plus and minus ones for the internal processing of the Hopfield network algorithm. (This pythonic code has been copied from ‘code-affectionate’)

import numpy
def Input_Pattern(pattern):
    return numpy.array([+1 if c=='X' else -1 for c in pattern.replace('\n','')])

digit = {}
for i in range(0, 10):
    digit[i]     = Input_Pattern(Num[i])

Typing ‘digit[1]’ into the console will show you how a ‘1’ is represented internally.
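To see the encoding on something smaller than a digit, here is a standalone sketch of the same idea using a plain list instead of numpy, applied to a made-up 2 x 2 pattern:

```python
def input_pattern(pattern):
    # Same idea as Input_Pattern above: 'X' maps to +1, anything else to -1
    return [+1 if c == 'X' else -1 for c in pattern.replace('\n', '')]

tiny = """
X.
XX"""
print(input_pattern(tiny)) # [1, -1, 1, 1]
```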

Another function converts that internal representation into a 20-bit number just for reporting purposes…

def State_Num(pattern):
    state_num = 0
    for x in range(0,20):
        if pattern[x]==1:
            state_num += (1 << x)
            #print("x = %d; bit = %d; s = %d" % (x, pattern[x], state_num))
    return state_num

state_num = {}
for i in range(0, 10):
    state_num[i] = State_Num(digit[i])
    print("Digit %2d state number 0x%x" % (i, state_num[i]))

We are going to add random errors to the digits and see how well the network corrects them. That is, whether the network recognizes them as being one of the 10 particular digits upon which it has been trained.

import copy
import random

def Add_Noise(pattern, num_errors):
    # (We need to explicitly 'copy' because Python arrays are 'mutable'...)
    noisy = copy.deepcopy(pattern)
    if num_errors > 0:
        for i in range(0, num_errors):
            pixel = random.randint(0, 19) # Choose a pixel to twiddle
            noisy[pixel] = -noisy[pixel] # Change a -1 to +1 or vice versa
    # Note: It can choose the same pixel to twiddle more than once
    #       so the number of pixels changed may actually be less
    return noisy

And to help see what is going on, we are going to have a function to display patterns…

""" Display a 4x5 digit array """
def Output_Pattern(pattern):
    for x in range(0,20):
        if pattern[x]==1:
            print("●", end="")
        else:
            print(" ", end="")
        if x % 4 == 3 :
            print() # Newline after every 4 pixels

Putting these components together, we can see noisy patterns that we will use to test our Hopfield network…

for i in range(0, 10):
    print("n = %d; s = 0x%5x" % (i, state_num[i]))

print("A noisy digit 1 with 3 errors...")
Output_Pattern(Add_Noise(digit[1], 3))

Now onto the main event.

We have a 20-neuron network (just one neuron per pixel) and we train it with some digits. Each neuron is (‘synaptically’) connected to every other neuron with a weight.

At the presentation of each number, we just apply the Hebbian rule: we strengthen the weights between neurons that are simultaneously ‘on’ or simultaneously ‘off’ and weaken the weights when this is not true.

def Train_Net(training_size=10):
    weights = numpy.zeros((20,20)) # declare array. 20 pixels in a digit
    for i in range(training_size):
        for x in range(20): # Source neuron
            for y in range(20): # Destination neuron
                if x==y:
                    # Ignore the case where neuron x is going back to itself
                    weights[x,y] = 0
                else:
                    # Hebb's slogan: 'neurons that fire together wire together'.
                    weights[x,y] += (digit[i][x]*digit[i][y])/training_size
                    # Where 2 different neurons are the same (sign), increase the weight.
                    # Where 2 different neurons are different (sign), decrease the weight.
                    # The weight adjustment is averaged over all the training cases.
    return weights

training_size = 3 # just train on the digits 0, 1 and 2 initially
weights = Train_Net(training_size)

Whereas training was trivially simple, to ‘recall’ a stored ‘memory’ requires more effort. We inject an input pattern into the network and let it rattle around inside the network (updating due to the synchronous firing on neurons and dependent on the weights of the synapses between those neurons) until it has settled down…

def Recall_Net(weights, state, verbosity=0):
    for step in range(25): # 25 iterations before giving up
        prev_state_num = State_Num(state) # record to detect if changed later

        new_state = numpy.zeros(20) # temporary container for the updated neuron states
        for neuron in range(0,20): # For each neuron
            # Add up the weighted inputs from all the other neurons
            for synapse in range(0, 20):
                # (When i=j the weight is zero, so this doesn't affect the result)
                new_state[neuron] += weights[neuron,synapse] * state[synapse]
        # Limit neuron states to either +1 or -1
        for neuron in range(0,20):
            if new_state[neuron] < 0:
                state[neuron] = -1
            else:
                state[neuron] = 1
        if verbosity >= 1:
            print("Recall_Net: step %d; state number 0x%5x" % (step, State_Num(state)))
        if verbosity >= 2:
            Output_Pattern(state)
        if State_Num(state) == prev_state_num: # no longer changing
            return state # finish early
    if verbosity >= 1:
        print("Recall_Net: non-convergence")
    return state
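The settling behaviour is easiest to see on a tiny standalone example (a hypothetical 4-neuron network storing one made-up pattern, nothing to do with the digit data): train with the Hebbian rule, corrupt one bit, and recall recovers the stored pattern.

```python
stored = [1, -1, 1, -1] # The single pattern to memorize

# Hebbian training: w[i][j] = s[i]*s[j], with no self-connections
n = len(stored)
weights = [[0 if i == j else stored[i] * stored[j] for j in range(n)]
           for i in range(n)]

state = [1, 1, 1, -1] # The stored pattern with bit 1 flipped
for _ in range(5): # Synchronous updates until it settles
    state = [1 if sum(weights[i][j] * state[j] for j in range(n)) >= 0 else -1
             for i in range(n)]
print(state) # settles back to the stored pattern [1, -1, 1, -1]
```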

We now test this recall operation …

print("Recalling an error-free '1'...")
Recall_Net(weights, digit[1], verbosity=2)

And we now test this recalling when there is some added noise. In this example, the noise is added deterministically rather than randomly so that you can get the same results as me.

I use a ‘1’ digit but set all the pixels on the top row to +1…


…and this does the recall of this character…

print("Recalling a '1' with errors...")
noisy_digit = Add_Noise(digit[1], 0) # A copy of digit 1 with no random errors...
for pixel in range(0, 4): # ...but with the whole top row deterministically set
    noisy_digit[pixel] = 1
Recall_Net(weights, noisy_digit, verbosity=2)

This shows the state of the network over successive iterations, until it has settled into a stable state.

Recall_Net: step 0; state number 0xfbba3
●●
 ● ●
●● ●
●● ●
●●●●

Recall_Net: step 1; state number 0xfbbaf
●●●●
 ● ●
●● ●
●● ●
●●●●

Recall_Net: step 2; state number 0xfbbbf
●●●●
●● ●
●● ●
●● ●
●●●●

Recall_Net: step 3; state number 0xfbbbf
●●●●
●● ●
●● ●
●● ●
●●●●

Unfortunately, it is the wrong stable state!

As an example, here is how this recall function can be expressed more pythonically…

def Recall_Net_Pythonically(weights, patterns, steps=5):
    from numpy import vectorize, dot
    sgn = vectorize(lambda x: -1 if x<0 else +1)
    for _ in range(steps):
        patterns = sgn(dot(patterns,weights))
    return patterns

(This is not quite a fair comparison as it cannot output any debug information, controlled by the ‘verbosity’ flag.)

Wrapping training and recall into an ‘evaluation’ function allows us to test the network more easily…

def Evaluate_Net(training_size, errors, verbosity=0):
    # Training...
    weights = Train_Net(training_size)
    # Usage...
    successes = 0
    print("Tsize = %2d   "  % training_size, end="")
    print("   Error pixels = %2d    " % errors, end="")
    for i in range(training_size):
        noisy_digit = Add_Noise(digit[i], errors)
        recalled_digit = Recall_Net(weights, noisy_digit, verbosity)
        if State_Num(digit[i]) == State_Num(recalled_digit):
            successes += 1
            if verbosity == 0: print("Y", end="")
            else: print(" Correct recall")
        else:
            if verbosity == 0: print("N", end="")
            else: print(" Bad recall")
    print("   Success = %.1f%%" % (100.0*successes/training_size))

Training the network with 3 digits, then testing with 1 bad pixel or without any bad pixels, works OK…

print("Training 3 digits with no pixel errors")
Evaluate_Net(3, 0, verbosity=0)
print("Training 3 digits with just 1 pixel in error")
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)
Evaluate_Net(3, 1, verbosity=0)

… whereas trying with 2, 3 or 4 errors only works some of the time…

print("Training 3 digits with 2 pixels in error")
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
Evaluate_Net(3, 2, verbosity=0)
print("Training 3 digits with 3 pixels in error")
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
Evaluate_Net(3, 3, verbosity=0)
print("Training 3 digits with 4 pixels in error")
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)
Evaluate_Net(3, 4, verbosity=0)

But the big problem here is trying to train the network with more digits.

It doesn’t work even with error-free input for just one more digit…

print("Training more digits but with no pixel errors")
Evaluate_Net(training_size=4,  errors=0, verbosity=0)
Evaluate_Net(training_size=5,  errors=0, verbosity=0)
Evaluate_Net(training_size=6,  errors=0, verbosity=0)
Evaluate_Net(training_size=7,  errors=0, verbosity=0)
Evaluate_Net(training_size=8,  errors=0, verbosity=0)
Evaluate_Net(training_size=9,  errors=0, verbosity=0)
Evaluate_Net(training_size=10, errors=0, verbosity=0)

The network just doesn’t have the capacity to learn more digits: learning new digits results in old ones being forgotten. This is the problem with Hopfield networks. They need around 7 or more neurons per training item, and the 20-neuron network here hits a limit consistent with that.
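A commonly quoted estimate of Hopfield capacity is about 0.138N patterns for N neurons (equivalently, roughly 7 neurons per stored pattern), so for the 20-neuron network here:

```python
N = 20 # neurons (one per pixel)
capacity = 0.138 * N # commonly quoted storage limit of ~0.138N patterns
print("%.1f" % capacity) # about 2.8 patterns: 3 digits just about fit, 4 do not
```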

More typical neural nets are ‘non-recurrent’ and employ back-propagation:

  • There are no loops in the network. Paths through the network run from inputs through one or more neurons to outputs but never back on themselves.
  • Usage (‘recall’) is easy and literally straight-forward: the calculations are performed from inputs, forward, through to the outputs.
  • Training is more complex, using the back-propagation algorithm to determine synaptic weights (more on that later).

In contrast, learning in Hopfield networks is easy and recall requires more effort.

Hopfield networks are more obviously in keeping with the biological brain:

  • They are recurrent.
  • Recall is performed by presenting a stimulus to which the network responds, eventually settling down on a particular state.
  • There is a process that is obviously analogous to Hebbian learning, in which ‘neurons that fire together – wire together’.

Fish, Chips and Ketchup

Fish, chips and the International Herald Tribune

Fish, chips and the International Herald Tribune

During his PhD years in Edinburgh, Geoffrey and his experimental psychology chums would often stop by the chippy after a night on the town. Geoffrey would queue up with his order of x1 pieces of fish, x2 lots of chips and x3 sachets of ketchup (yes, they charge for ketchup in Edinburgh!). Unable to focus his blurry eyes on the price list, he would estimate what the total would come to in order to ensure he had enough cash.

If he had been able to remember all the previous ordering history (‘first occasion: 3 pieces of fish, 4 lots of chips and 2 sachets of ketchup cost £1.10’), he would have been able to solve the problem exactly after a few visits. But he didn’t – he just remembered his best guesses from the previous visit to the chippy.

But no worries. He treated the problem as a linear neural network and knew how to modify his best guesses a little after each visit. He was also lucky in choosing a learning rate, ε, of 0.05, and so it took only 18 visits to the chippy before he was within tuppence of the right amount, which he thought was good enough.

This almost certainly doesn’t bear any resemblance to the reality of why Prof Hinton (the ‘Godfather of Deep Learning’) chose to teach linear neural networks with an introductory example of fish, chips and ketchup.

But a mathematical explanation of how it works – ‘the delta rule’ for fast ‘gradient descent’ –

∆wi = ε xi (t − y)

…is beyond most people, whereas a large number of schoolchildren now learn to program in Python. I think playing around with some Python code would be a demystifying introduction to neural networks for many. So here is some code to help with this…

# A very simple example of the learning of a linear neural network
# This is coded explicitly for fish, chips and ketchup
# for teaching clarity rather than being generalized.

from numpy  import exp      # For setting the learning rate
from random import randint  # For generating random chippy orders
MAX_ITERATIONS = 2000 # Number of visits to the chippy before giving up.
START_PRINTS   = 10   # Number of iterations reported on at the start.
STOP_ERROR     = 0.03 # Error margin - good enough to stop
cost = {'fish': 0.20, 'chips': 0.10, 'ketchup': 0.05} # This is the menu

def print_status_line(iteration, price, error): # Reporting of results at each iteration
    print ("%4d  Fish £%.2f, Chips £%.2f, Ketchup £%.2f, error £%.2f"
           % (iteration, price['fish'], price['chips'], price['ketchup'], error))

for e in range(1,7):
   # Set the learning rate 'epsilon' to an exponentially smaller value for each case
   epsilon = exp(-e)
   print ("Case %d: learning rate = %.3f" % (e, epsilon))

   weight = {'fish': 0.30, 'chips': 0.05, 'ketchup': 0.02} # Initial guesses
   error = (abs(weight['fish']-cost['fish'])
          + abs(weight['chips']-cost['chips'])
          + abs(weight['ketchup']-cost['ketchup']))
   print_status_line(0, weight, error)

   for n in range(1, MAX_ITERATIONS+1):
      # Just randomly set what this particular menu order is...
      portions = {'fish': randint(1, 5), 'chips': randint(1, 5), 'ketchup': randint(1, 5)}
      estimated_price = (weight['fish']*portions['fish']
                       + weight['chips']*portions['chips']
                       + weight['ketchup']*portions['ketchup'])
      actual_price = (portions['fish']*cost['fish']
                    + portions['chips']*cost['chips']
                    + portions['ketchup']*cost['ketchup'])
      # Difference in output: the estimate y minus the target t...
      residual_error = estimated_price - actual_price
      # Condition for halting loop...
      prev_error = error
      error = (abs(weight['fish']-cost['fish'])
             + abs(weight['chips']-cost['chips'])
             + abs(weight['ketchup']-cost['ketchup']))
      # Adjust the weights
      for i in ['fish', 'chips', 'ketchup']:
         delta_weight = epsilon * portions[i] * residual_error
         weight[i] -= delta_weight

      # Output display and automatic halting on divergence or convergence...
      if abs(error) > 4.0*abs(prev_error):
          print_status_line(n, weight, error)
          print ("      Halting because diverging")
          break
      if (error <= STOP_ERROR) :
          print_status_line(n, weight, error)
          print ("      Halting because converged")
          break
      if (n <= START_PRINTS):
          print_status_line(n, weight, error)
      if (n == MAX_ITERATIONS) :
          print_status_line(n, weight, error)
          print ("      Halting but not yet converged")

Note: this Python code is written for clarity – for understanding by people not intimately familiar with the Python language – rather than for conciseness and efficiency. It is unapologetically ‘unpythonic’.

Running it produces the output…

Case 1: learning rate = 0.368
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.29, Chips £0.03, Ketchup £0.01, error £0.18
   2  Fish £0.32, Chips £0.06, Ketchup £0.05, error £0.19
   3  Fish £-0.71, Chips £-0.14, Ketchup £-0.78, error £0.16
   4  Fish £12.15, Chips £12.72, Ketchup £15.30, error £1.98
      Halting because diverging
Case 2: learning rate = 0.135
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.32, Chips £0.08, Ketchup £0.04, error £0.18
   2  Fish £0.28, Chips £0.05, Ketchup £-0.04, error £0.15
   3  Fish £0.24, Chips £0.03, Ketchup £-0.06, error £0.23
   4  Fish £0.36, Chips £0.60, Ketchup £0.51, error £0.22
   5  Fish £-1.41, Chips £-2.35, Ketchup £-1.26, error £1.12
      Halting because diverging
Case 3: learning rate = 0.050
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.22, Chips £0.00, Ketchup £0.00, error £0.18
   2  Fish £0.29, Chips £0.17, Ketchup £0.17, error £0.16
   3  Fish £0.12, Chips £-0.04, Ketchup £0.13, error £0.28
   4  Fish £0.33, Chips £0.13, Ketchup £0.17, error £0.29
   5  Fish £0.22, Chips £0.02, Ketchup £0.10, error £0.29
   6  Fish £0.22, Chips £0.02, Ketchup £0.10, error £0.15
   7  Fish £0.20, Chips £0.01, Ketchup £0.07, error £0.15
   8  Fish £0.21, Chips £0.07, Ketchup £0.12, error £0.12
   9  Fish £0.18, Chips £0.05, Ketchup £0.04, error £0.11
  10  Fish £0.19, Chips £0.06, Ketchup £0.06, error £0.08
  18  Fish £0.21, Chips £0.11, Ketchup £0.04, error £0.02
      Halting because converged
Case 4: learning rate = 0.018
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   2  Fish £0.29, Chips £0.04, Ketchup £0.01, error £0.18
   3  Fish £0.30, Chips £0.07, Ketchup £0.04, error £0.18
   4  Fish £0.26, Chips £0.06, Ketchup £0.03, error £0.14
   5  Fish £0.25, Chips £0.06, Ketchup £0.03, error £0.11
   6  Fish £0.25, Chips £0.06, Ketchup £0.03, error £0.11
   7  Fish £0.26, Chips £0.07, Ketchup £0.04, error £0.11
   8  Fish £0.26, Chips £0.08, Ketchup £0.04, error £0.10
   9  Fish £0.26, Chips £0.08, Ketchup £0.04, error £0.09
  10  Fish £0.26, Chips £0.08, Ketchup £0.04, error £0.09
  44  Fish £0.22, Chips £0.09, Ketchup £0.05, error £0.03
      Halting because converged
Case 5: learning rate = 0.007
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   2  Fish £0.30, Chips £0.06, Ketchup £0.02, error £0.18
   3  Fish £0.30, Chips £0.06, Ketchup £0.03, error £0.17
   4  Fish £0.30, Chips £0.06, Ketchup £0.02, error £0.17
   5  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.17
   6  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.17
   7  Fish £0.29, Chips £0.05, Ketchup £0.02, error £0.18
   8  Fish £0.29, Chips £0.05, Ketchup £0.02, error £0.18
   9  Fish £0.29, Chips £0.04, Ketchup £0.02, error £0.18
  10  Fish £0.29, Chips £0.04, Ketchup £0.01, error £0.18
 152  Fish £0.21, Chips £0.09, Ketchup £0.04, error £0.03
      Halting because converged
Case 6: learning rate = 0.002
   0  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   1  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   2  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   3  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   4  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   5  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   6  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   7  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   8  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
   9  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
  10  Fish £0.30, Chips £0.05, Ketchup £0.02, error £0.18
 389  Fish £0.21, Chips £0.09, Ketchup £0.04, error £0.03
      Halting because converged

Hinton and python

…which shows:

  1. How the error (the total mismatch between ‘Geoffrey’s’ best guesses and the actual costs) generally (but not always) decreases, leading towards the correct answer, and
  2. Fewer iterations are required for faster learning rates (higher values of ε), but the guesses actually diverge when ε increases beyond some particular point.
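The divergence point can be illustrated with a stripped-down one-item ‘menu’ (a standalone sketch with made-up numbers): with a single true price c and a fixed portion count x, the delta rule multiplies the error by (1 − εx²) each visit, so it converges only while that factor stays between −1 and +1.

```python
def final_weight(epsilon, steps=50):
    w, x, c = 0.3, 5, 0.2 # initial guess, portion count, true price
    for _ in range(steps):
        residual = w * x - c * x # estimated bill minus actual bill
        w -= epsilon * x * residual # the delta rule
    return w

print(abs(final_weight(0.01) - 0.2) < 1e-3) # 1 - eps*x*x = 0.75: converges
print(abs(final_weight(0.10) - 0.2) > 1.0)  # 1 - eps*x*x = -1.5: diverges
```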

Incidentally, Prof Hinton was also introduced to python at an early age…


Consciousness and Zombies


Common Sense Consciousness

There are common-sense notions of what consciousness is about which tell us:

  • We are conscious when we are awake,
  • We are not conscious when we are asleep, except when we are dreaming,
  • People under anaesthetic are not conscious,
  • People in a coma are not conscious, but those suffering from ‘locked-in’ syndrome are,
  • People have a single consciousness; it is not that there are multiple consciousnesses within them,
  • There is no higher consciousness – groups of people are not conscious, and
  • Machines are not conscious.

But these can be wrong. For example, to take the last point, there is the danger of us being ‘biochauvinist’, failing to recognize that non-biological stuff can be conscious in any way.

We Need a Theory

Much has been said about the nature of consciousness by philosophers but, as with much of philosophy, it is pre-scientific. We are still grappling with how to make it scientific – how to progress beyond speculation by testing hypotheses, making predictions and quantifying them. We are at the same stage as the ancient Ionian philosophers were when speculating about the physical nature of the universe. For example:

  • Thales speculated that ‘everything is water’ and provided reasons for his argument,
  • Anaximenes speculated that ‘everything is air’ and provided reasons for his argument, and
  • Heraclitus speculated that ‘everything is change’ and provided reasons for his argument.

No amount of speculation on its own could have ever led anyone to our current understanding of the physical world, involving quantum theory and relativity. Our understanding has developed through a long series of theories that have all been refuted as being ‘wrong’ but were necessary steps to make progress.

We have been lacking theories which would provide the first step towards a scientific understanding of the fundamentals of consciousness. This is ‘proto-science’ – the start of the scientific process. We need a theory that is scientific in that it describes consciousness in wholly physical terms and that, given a specific physical state, can predict whether there is consciousness. As progress is made, theories and methods become established into what we normally understand as ‘science’, which can then provide useful applications. For example, a good theory would give us a 100% success rate in avoiding ‘anaesthesia awareness’. Such a theory must agree with our common-sense understanding of consciousness to some degree but it may surprise us. For example, it might tell us:

  • We are conscious throughout the time we are asleep – the difference is that our experiences are not laid down in memory.
  • In some specific circumstances, machines and/or groups of people can be conscious.

Integrated Information Theory

Giulio Tononi’s 2004 ‘Integrated information theory’ (IIT) of consciousness has been described by Christof Koch as

“the only really promising fundamental theory of consciousness”


In it, Tononi proposes a measure named after the Greek letter φ (‘phi’) which is the amount of ‘integrated information’ of a system. Consciousness is a fundamental property of the universe which arises wherever φ > 0. It is therefore a form of ‘panpsychism’ – consciousness can arise anywhere. The higher the value of φ, the larger the amount of consciousness. Consciousness is a matter of degree. Humans have large brains and very large φ and are highly conscious. Small rodents have smaller φ and are therefore less conscious. But sleeping humans must have a lower φ than wakeful rodents.

I have previously posted about Tononi’s theory, providing an overview of his book ‘Phi: A Voyage from the Brain to the Soul’. The book is a curious fusion of popular science and fiction and so, disappointingly, avoids all the technicalities involved in the theory and the calculation (quantification) of φ.

In one form of the ‘Integrated Information Theory’, φ is calculated as:





In short, φ is a measure of the information flow within a system. It is essentially formulated back from wanting (!) the following:

  • The information flow between humans is much much less than the information flow within a human brain.
  • The distinguishing indicator between wakefulness and REM sleep versus non-REM sleep is that there is a large drop in ‘long-range’ communication in the latter – information flow is much more localised.

And this (necessarily) leads to the conclusions we ‘want’:

  • We are not conscious in non-REM sleep or in a coma but are at other times, including if suffering from locked-in syndrome.
  • There is not a consciousness associated with a group of people.

A positive φ requires the mutual flow of information within the system – between parts of the system, there is flow in both directions. In short, there are loops and ‘internal states’ i.e. memory. Tononi provides a metaphor of a digital camera. A 10-megapixel camera sensor provides 10 megabits of information but there is no integration of that information and no memory. In contrast:

  • The human visual system combines information from neighbouring rod and cone photo-receptors in the retina before the information gets to the cortex of the brain, and
  • There are more connections in the brain going from the ‘higher’ levels down towards the retina than there are going in the opposite direction.

A camera sensor has zero φ, so there is no consciousness. But a thermostat has memory (precisely 1 bit of capacity) and a loop, because of its hysteresis. It has some small positive value of φ. Hence it has some (absolutely minimal) degree of consciousness!

This all sounds like a crack-pot theory but it is being taken seriously by many. Tononi’s academic specialization is sleep, but he has worked at Gerald Edelman’s Neurosciences Institute, La Jolla, with Edelman on metrics for brain complexity, and this work has evolved into his metric for consciousness. (Incidentally, he has also worked with Karl Friston, who was at the Neurosciences Institute at the same time.) Christof Koch is now collaborating with Tononi on the theory. My point: he is not someone on the fringes of this academic field.

Cynically, we might say that the theory has credibility because there is so very little else of substance to go on. We need to recognize that this is all still just ‘proto-science’.

IIT 3.0

The ‘Integrated Information Theory’ has gone through two major revisions. The original ‘IIT 1.0’ from 2004 was superseded by ‘IIT 2.0’ in 2008 and ‘IIT 3.0’ in 2014.

‘IIT 1.0’ and ‘IIT 2.0’ based measures of ‘effective information’ (ei) on entropy – the effective information was an average ‘Kullback–Leibler divergence’ (alternatively termed ‘relative entropy’). This may sound familiar: entropy and the Kullback–Leibler divergence also feature in Karl Friston’s ‘Variational Free Energy’ theory of generalized brain function.

But ‘IIT 3.0’ uses a different metric for ‘effective information’. The basis of this is known:

  • in mathematical circles by the formal term of the ‘Wasserstein distance’, and
  • in computer science circles by the (literally) more down-to-earth term of the ‘Earth Mover’s Distance’ (EMD)

Imagine the amount of earth that a digger would have to move to make a pile of earth of a particular shape (‘distribution’) into the shape of another (these piles of earth represent probability distributions). When applied to simple binary distributions, this just reduces to the ‘Hamming distance’ used in Information Theory for communication systems.
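For discrete distributions along a line, the Earth Mover’s Distance has a particularly simple form: it is the accumulated absolute difference between the two running totals. A minimal sketch (my own illustration, not code from the IIT papers):

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two histograms over the same 1-D bins.

    The earth 'carried' past each bin boundary is the running difference of
    the two distributions; the total work is the sum of what is carried.
    """
    total, carried = 0.0, 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi
        total += abs(carried)
    return total

# Moving all the mass one bin to the right costs one bin-width of work;
# moving it two bins costs two.
print(emd_1d([1, 0, 0], [0, 1, 0]))  # 1.0
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```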

Two Circuits

Unlike previous editions, ‘IIT 3.0’ explicitly provided an example that I find rather incredible.

Figure 21 of ‘IIT 3.0’ shows 2 circuits, A and B (see below). The circuits consist of circles connected together with red and black arrows. The circles are ‘nodes’. The arrows are signals which are inputs to and outputs from the nodes. My interpretation of these diagrams is as follows:

  • Black arrows mark ‘excitatory’ connections.
  • Red lines with a dot at one end mark ‘inhibitory’ connections (going to the end with the dot).
  • At each node, the input values are added (for excitatory connections, effectively scaled by +1) or subtracted (for inhibitory connections, effectively scaled by -1). If the total meets the criterion marked at the node (e.g. ‘>=2’) then each output will take the value 1; otherwise it will be 0.
  • Time advances in fixed steps (let us say 1 millisecond, for convenience) and all nodes are updated at the same time.
  • The diagrams colour some nodes yellow to indicate that the initial value of a node output is 1 rather than 0 (for a white node).
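This interpretation can be sketched directly in code. The wiring and thresholds below are a made-up three-node example of mine (not the actual network of Figure 21), just to show the synchronous threshold-update rule:

```python
# Each entry lists a node's inputs as (source, weight):
# +1 for an excitatory (black) arrow, -1 for an inhibitory (red) one.
wiring = {
    'A': [('B', +1), ('C', +1)],
    'B': [('A', +1), ('C', -1)],
    'C': [('A', +1)],
}
thresholds = {'A': 2, 'B': 1, 'C': 1}  # e.g. A needs '>=2' to fire

def step(state):
    """One time step: every node reads the previous state simultaneously."""
    return {
        node: 1 if sum(w * state[src] for src, w in inputs) >= thresholds[node] else 0
        for node, inputs in wiring.items()
    }

state = {'A': 1, 'B': 0, 'C': 0}  # the 'yellow' node A starts at 1
for _ in range(4):
    state = step(state)
```

With this particular toy wiring the network settles into a period-2 oscillation between the states {A:1, B:0, C:0} and {A:0, B:1, C:1}.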


Figure 21. Functionally equivalent conscious and unconscious systems.

 The caption for the figure reads:

(A) A strongly integrated system gives rise to a complex in every network state. In the depicted state (yellow: 1, white: 0), elements ABDHIJ form a complex with ΦMax = 0.76 and 17 concepts. (B) Given many more elements and connections, it is possible to construct a feed-forward network implementing the same input-output function as the strongly integrated system in (A) for a certain number of time steps (here at least 4). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain. The transition from the first layer to the second hidden layer in the feed-forward system is assumed to be faster than in the integrated system (τ << Δt) to compensate for the additional layers (A1, A2, B1, B2)

The caption concludes with a seemingly outrageous statement on zombies and consciousness which I will come back to later on.

Unfortunately, in the figure:

  • With the ‘integrated system’, I cannot reproduce the output sequence indicated in the figure!
  • With the ‘feed-forward system’, it is difficult to determine the actual directed graph from the diagram but, from my reasonable guess, I cannot reproduce the output sequence indicated in this figure either!

But there are strong similarities between Tononi’s ‘integrated system’ versus ‘feed-forward system’ and ‘IIR filters’ versus ‘FIR filters’ in Digital Signal Processing – similarities that are more than coincidental. It looks like Tononi’s two ‘complexes’, as he calls them, are derived from IIR and FIR representations. So I am going to consider digital filters instead.

IIR Filters

An input signal changes over time, but only at discrete time intervals. For the purposes of this example, assume there is a new sample every millisecond. There is an input stream of samples around time t:

X(t), X(t+1), X(t+2), X(t+3), X(t+4) and so on.

And there is an output stream of samples:

Y(t), Y(t+1), Y(t+2), Y(t+3), Y(t+4) and so on.

A simple filter that smoothes out changes in input ‘samples’ can be formed by averaging the input with the previous output value:

Y(t) = ½·X(t) + ½·Y(t-1)

This is a filter of a type called an ‘infinite impulse response’ (IIR) filter. A diagram for an IIR filter is shown below:


A ‘z⁻¹’ box indicates a delay of 1 ms. The b, a0 and a1 boxes are multipliers (b, a0 and a1 are the constant values by which the signals are multiplied) and the ‘Σ’ circle sums (adds). The diagram shows a ‘second order’ filter (two delays) but I will only consider a first order one:

b = 1/2

a1 = 1/2

a0 = 0

A single non-zero value within a series of zero values is called an ‘impulse’:

X = … 0, 0, 0, 0, 1, 0, 0, 0, 0, …

If this impulse is fed into a filter, the resulting output from that impulse is called the ‘impulse response’. For the IIR filter it will be as follows:

Y = … 0, 0, 0, 0, 0.5, 0.25, 0.125, 0.0625, …

that is:

Y(1) = 1/2

Y(2) = 1/4

Y(3) = 1/8

Y(4) = 1/16

and in general form:

Y(t) = 2⁻ᵗ

so there is some non-zero (but vanishingly small) output at arbitrarily high t – the response carries on for ever, and this is why the filter is called an ‘infinite impulse response’ filter.

If we put a ‘step’ into the IIR filter…

X = … 0, 0, 0, 0, 1, 1, 1, 1, 1 …

we get a ‘step response’ out, which shows the smoothing of the transition:

Y = … 0, 0, 0, 0, 0.500, 0.750, 0.875, 0.938, 0.969, 0.984, 0.992, …

This IIR filter is the equivalent to Tononi’s ‘integrated system complex’.
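The filter is easy to check numerically. A minimal sketch of the first-order recursion Y(t) = ½·X(t) + ½·Y(t-1), fed the impulse and step from above:

```python
def iir(x, b=0.5, a1=0.5):
    """First-order IIR filter: y[t] = b*x[t] + a1*y[t-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = b * sample + a1 * prev
        y.append(prev)
    return y

print(iir([0, 0, 1, 0, 0, 0]))   # impulse response: keeps halving for ever
print(iir([0, 0, 1, 1, 1, 1]))   # step response: climbs towards 1
```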

FIR Filters

The DSP equivalent to Tononi’s ‘feed-forward system complex’ is a ‘finite impulse response’ (FIR) filter:

Y(t) = b0·X(t) + b1·X(t-1) + b2·X(t-2) + b3·X(t-3) + … + bN-1·X(t-N+1)

A diagram corresponding to this FIR filter (of ‘order N-1’) is shown below:


Here, the triangles are multipliers and the ‘+’ circles obviously add.

Now, we can try to get a FIR filter to behave very similarly to an IIR filter by setting its coefficients

b0 , b1 , b2 , b3 … bN-1

to be the same as the first N terms of the IIR’s impulse response. The values after t=5 are quite small so let’s set N=6:

b0 = 1/2

b1 = 1/4

b2 = 1/8

b3 = 1/16

b4 = 1/32

b5 = 1/64

so the transfer equation is:

Y(t) = (1/2)·X(t) + (1/4)·X(t-1) + (1/8)·X(t-2) + (1/16)·X(t-3) + (1/32)·X(t-4) + (1/64)·X(t-5)

and the step response is then:

Y = … 0, 0, 0, 0, 0.500, 0.750, 0.875, 0.9375, 0.96875, 0.984375, 0.984375, …

The FIR’s ‘impulse response’ only lasts for 6 samples – it is finite, hence why the filter is called a ‘finite impulse response filter’. The output does not depend on any input value from more than 6 samples prior, but the first 6 output samples following an impulse will be the same as the IIR’s, so the two filters behave in a very similar way.

(Note: The output never gets any higher than 0.984375 – the sum of all the coefficients)
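The truncation can be checked numerically too. A minimal FIR sketch with the six coefficients above, to compare against the IIR step response:

```python
def fir(x, coeffs):
    """FIR filter: y[t] = sum over k of coeffs[k] * x[t-k]."""
    return [
        sum(c * x[t - k] for k, c in enumerate(coeffs) if t - k >= 0)
        for t in range(len(x))
    ]

coeffs = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64]  # truncated IIR impulse response
step = [0, 0, 1, 1, 1, 1, 1, 1, 1]
print(fir(step, coeffs))
# Matches the IIR's step response for six samples, then sticks at 0.984375.
```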

IIR and FIR Filters are alike but not the same

This is exactly the same situation as described by Tononi. Reiterating his figure caption:

Given many more elements and connections, it is possible to construct a ‘feed-forward’ network implementing the same input-output function as the ‘integrated system’ in (A) for a certain number of time steps (here at least 4). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain. …

And then there is the punchline that I omitted previously…

… Despite the functional equivalence, the ‘feed-forward system’ is unconscious, a “zombie” without phenomenological experience.

So it is true with the digital filters:

Given more elements and connections, it is possible to construct a FIR filter implementing the same input-output function as the IIR filter for a certain number of time steps (here 6). This is done by unfolding the elements over time, keeping the memory of their past state in a feed-forward chain.

and hence

… Despite the functional equivalence, the FIR filter is unconscious, a “zombie” without phenomenological experience (unlike the IIR filter)!

For the FIR filter, there are no loops in the network – the arrows all point south/east and the value is then φ=0, in contrast with the non-zero φ for the IIR filter which does have loops.
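That structural condition – feedback loops or none – is mechanically checkable with a depth-first search. A sketch; the two graphs below are my own toy stand-ins for the IIR and FIR signal-flow diagrams, not Tononi’s networks:

```python
def has_loop(graph):
    """Return True if the directed graph contains a cycle (a feedback loop)."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / in progress / done
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for succ in graph[node]:
            if colour[succ] == GREY:      # back-edge: we are inside a loop
                return True
            if colour[succ] == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)

# Toy signal-flow graphs: the IIR has its delay fed back into the summer,
# while the FIR only ever feeds forward.
iir_like = {'in': ['sum'], 'sum': ['out', 'delay'], 'delay': ['sum'], 'out': []}
fir_like = {'in': ['d1', 'out'], 'd1': ['d2', 'out'], 'd2': ['out'], 'out': []}
print(has_loop(iir_like), has_loop(fir_like))  # True False
```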

To anyone that understands digital signal processing, the idea that an IIR filter has some consciousness (albeit tiny) whereas an equivalent FIR filter does not is absurd. This is an additional absurdity beyond that of the panpsychist idea that any filter could have consciousness in the first place.

Could Androids Dream of Electric Sheep?

In a previous talk (‘Could Androids Dream of Electric Sheep?’) I considered whether something that behaved the same way as a conscious human would also be conscious.

If something behaves the same way as a conscious human, we can still deny that it is conscious on the grounds that it is just an imitation. We would not credit a computer running Joseph Weizenbaum’s famous ELIZA program as being genuinely conscious in the same way as we are (although the Integrated Information Theory would grant it some lower value of φ, one that is still greater than zero).

A narrower philosophical question is whether a computer running a simulation (‘emulation’) of a human brain would be conscious. (Yes – ‘whole brain simulation’ is not possible – yet.) A simulation at a sufficiently low level can show the same phenomenon as the real object (such as ‘getting wet’ in a rainstorm in a weather simulation). In this case, the ‘same thing’ is going on, just implemented in a different physical substrate (electronic transistors rather than gooey biological stuff); a functionalist would say that the simulation is conscious by virtue of it being functionally the same.

A yet narrower question is whether consciousness is duplicated if the physical construction of the ‘simulation’ were the same. It would no longer be a simulation but a direct (atom-by-atom) copy. Anyone insisting on this much can be accused of being ‘bio-chauvinist’ in denying that computer simulations are conscious. But it is still possible that consciousness is not duplicated. For example, if whatever it is that causes consciousness operates at a sub-atomic level, an atom-for-atom copy might miss it out. How would we know?

I took a functionalist position.

However, the example above shows that, according to the ‘Integrated Information Theory’, it is possible for two systems to be functionally the same (caveat: almost) but for one to be conscious whilst the other is not. In short – that (philosophical) zombies can exist.

But any ‘system’ is just a component in a larger system. It is not clear to me whether, if one component with φ>0 is substituted with a functionally identical one with φ=0, the φ of the larger system is reduced. In a larger system, the loop-less φ=0 implementation ends up with loops around it.

To be continued (eventually, hopefully).

Posted in Uncategorized | Tagged , , , , , , , , , | 2 Comments

Brexit and the Brain


On this blogsite up to now, I have touched on many of the sub-fields of philosophy – the philosophy of mind, consciousness, epistemology, philosophy of science and, most recently, ethics. The biggest sub-field not covered is politics.

But then came ‘Brexit’.

Thinking about Brexit has reminded me of many of the ideas within past posts. So here, in a bit of a departure from the normal, I try to relate Brexit to these ideas. It is not really a foray into political philosophy. It is about the cognitive processes behind the political event. It might provide you with some food for thought about Brexit. And the Trump phenomenon too, for that matter.

I’ll start by summarizing apposite ideas from past posts:


Intelligence and Knowledge

Intelligence is about adapting and responding appropriately to circumstances, particularly when they are complex and changing. An important aspect is the ability to make predictions. A central topic of this blogsite is the idea of the brain as a hierarchy of predictors (Hohwy’s ‘predictive brain’ thesis and Friston’s ‘variational free energy’ theory) that is continuously trying to minimize surprise, through action and perception. These brain theories are closely related to ideas around bio-inspired ‘artificial neural networks’ that are now making significant strides in various artificial intelligence applications (threatening to take away many white-collar jobs in the near future).

Our ability to predict events in the world outside improves over our lifetime. Knowledge grows. In the early stages of life, the forest of neurons is very plastic hence highly adaptable but very ‘impressionable’ to stimulus. When mature, the brain has become wise – good at anticipating events in the environment that it has grown up in. But it can get ‘stuck in its ways’ if that environment has now changed. Keynes is famously supposed to have said:

“When the facts change, I change my mind. What do you do, sir?”

But the difficulty is in accepting that the new facts are valid, because they do not cohere with everything else you know.

I have related this mechanistic learning process to Susan Haack’s epistemological ‘foundherentist’ theory which is a synthesis of the competing correspondence and coherence theories of truth. New information modifies one’s knowledge if it both (i) corresponds to how things seem to behave in the outside world and (ii) if it coheres with the other knowledge within one’s head.



Embedded within the totality of our knowledge is our worldview – the big picture of how the world appears to us. It is cultural. We grow up within the culture of our parents’ environment and it evolves within us. Our worldview is a bit different from that of our parents. Our children’s will be a bit different too. But only a bit. If it changes too much, the culture is broken.

Traditional Western philosophy has been one of a non-material Cartesian mind acting within an absolutist, objective world of facts; we should be perfectly rational. But our modern understanding is of an evolved, physical mind. Our understanding of how knowledge works has been influenced by the reactions of Kuhn, Feyerabend, Polanyi and Lakatos to the horrors of totalitarianism in central Europe.

People are separately building models within their brains of the same (shared) environment – but those models are not the same. People do not believe in things that are objectively right or wrong. They do not believe in just anything. They believe in things because they work – they correspond and cohere. Their knowledge, embodied within the connectome, is neither objective/absolutist nor subjective/relativist. It is a middle course. But still, some brains make better predictions in particular circumstances than others.


Cognitive Biases

So it seems that our thinking falls short of the simple, pure, logical rationality required for decision-making in the 21st-century world. We have cognitive biases that seem to distort our thinking. For example, there is ‘anchoring’ (already hinted at), in which early information (received when ‘impressionable’) has a disproportionate influence on our thinking compared with later information (received when ‘mature’).

From the work of Tversky, Kahneman, Gigerenzer and Tetlock (focussed on politics and economics decision-making but generally applicable), we understand that these biases are the result of evolution and have endowed us with a cognitive toolbox of tricks that can make decisions in a timely manner that are ‘good-enough’. Much of this is intuitive. Our thinking is more complex, more efficient but less rational.

In our search for meaning, we tend to want to pull our ideas together to create some greater ‘truth’. Experts are liable to focus on a learnt ideology of grand overarching principles – of too much coherence than is warranted. Computers can deal with the mass of data to maintain correspondence between events in the outside world and their predictions and hence can outperform the experts. But straightforward heuristic tricks (such as the ‘recognition heuristic’ – that things we haven’t heard of will tend to be less important than those we have) mean that amateurs can often outperform the theories of experts!



So, much of our thinking is irrational and intuitive. But our thinking is also affected by emotion.

A most basic emotion is fear. The basic animal state of nature is continuous anxiety – to be constantly alert, fearfully anticipating potential life-threatening events.  But we need to balance risk. We cannot be completely risk-averse (hiding in a dark room). We must explore the world around us when the risk is low in order to have learnt what to do for when the risk is high.


Social Cohesion

And well-being is improved by cooperation with others around us. Biological mechanisms of motherhood (such as the neurotransmitter oxytocin) give rise to caring for those immediately around us. Knowing our place within the hierarchy of society reduces physical violence within our community (but the potential for violence means that we do not have an improved feeling of well-being). The flip-side of the empathy that we feel towards those within our ‘in-group’ community who are like ourselves is that it emboldens us against the ‘out-group’ beyond.

Over time, we learn how those around us behave. Through familiarity, we can predict how others will behave in particular circumstances and can imagine how they see us. We have a ‘theory of mind’ – an ability to recognise that others may think differently from us. We formulate how reputable others are and understand that others do the same to us. We have a reputation. With established reputations, we can cooperate, able to trust one another. However, we have no knowledge of how reputable strangers from outside our community are, so we treat them with suspicion. But that suspicion reduces with more frequent contact. Strangers become less strange, particularly if they are associated with reputable institutions. This allows societies to grow beyond the size where everyone knows everyone else. To act morally is to balance our wants with those of others – to get inside the mind of others to understand what they want and to take that into consideration.



Classic case examples such as Phineas Gage and Charles Whitman show that physical effects on the brain cause different behaviour. This challenges our traditional notions of free will and responsibility. We are a product of our environment. In a classic legal case, the murderer Richard Loeb was spared the death penalty because it was successfully argued that he did not choose the (privileged) environment in which he grew up.

But if transgressors cannot be blamed for their deeds, then equally the successful cannot be praised for their achievements. They feel proud of achievements that are a result of their personal abilities; little is credited to the fortunate circumstances in which they were born and grew up.

(Note: a lack of traditional responsibility does not mean that a transgressor is not sanctioned in some way and it does not mean we do not promote positive examples.)


Affluent Societies

Various research indicates that (i) moral behaviour and reasoning of those at the top of the social tree differs from that of the rest of us, and (ii) individuals in affluent societies behave differently from those in less affluent ones.

In short, the affluent are less empathetic. They are more likely to prioritize their own self-interests above the interests of others (simple example: they are less likely to stop for pedestrians at crossings). Piff calls this ‘the asshole effect’! In contrast with traditional intuitive, emotional responses, they favour more ‘rational’ utilitarian choices, such as being more prepared to take resources from one person to benefit several others. They have a higher sense of entitlement.

Charitable donations are one indicator of the consideration given to others. Being rich does not generally confer greater generosity. But being married, older, living in rural rather than urban areas or living in a mixed rather than segregated social neighbourhood all correlate with high donations. So does regular attendance at religious services, which can simply be attributed to being reminded of the needs of others on a regular basis.

A general picture emerges of how affluent ‘Western’ societies differ from those with lower GDPs. There is less empathy for those immediately around us. People are more individualistic and self-indulgent. Relationships have less commitment. People live in an urban environment in which social interaction is anonymous and transactional rather than proximate (‘up close and personal’). There is higher monetization.  (Regardless of status, just thinking about money decreases empathy, shifting the balance from others to oneself.) We are less dependent on other specific people and their goodwill. If we want something, we can just buy it with the minimum of personal interaction, from an anonymous provider. There is a high degree of social connectedness but this is not with those outside our own social spheres and there is less interaction with those living in our immediate vicinity. It is a case of ‘out of sight; out of mind’.

But the flip-side of this is that the affluent are more likely to interact with members of the out-group – to be less xenophobic.



Now, applying all these ideas to Brexit…


Confirmation Bias

It is generally agreed that the quality of the political debate during the referendum campaign was dire. Leave campaigners appealed to those with a Leave worldview. Remain campaigners appealed to those anchored with a Remain worldview. These worldviews were formed long before the referendum; they were as good as instinctive. Remain arguments did not fit into the Leave worldview and Leave arguments did not fit into the Remain worldview. Confirmation bias reigned. Arguments became increasingly coherent, but this was because of reduced correspondence to reality! There would be no £350 million a week and there would be no World War III. There may have been undecideds to be swayed from an unconscious worldview to a conscious voting intention but I suspect that it actually changed the minds of very few.


The Failure of Empathy

Recasting what was said above in terms of Brexit: Remainers were more affluent, more self-sufficient and less empathetic than Leavers. They were more likely to prioritize their own self-interests above the interests of others. In contrast to the traditional intuitive, emotional responses of poorer Leavers, they favoured more ‘rational’ choices. The Remain argument was about the financial impact of Brexit: it was in terms of money, and monetization decreases feelings of empathy. Being older and living in rural rather than urban areas correlate with empathy – and correlated with voting Leave. But this empathy was for those within the in-group. The flip-side of this empathy effect (such as the effect of oxytocin) is that Leavers were less trusting of those in the out-group.


The Failure of Trust

From within a Leave worldview, a vote to Remain was a self-interested vote to maintain the status quo. Remain voted as ‘Homo economicus’ – as rational self-interested agents, without caring about the opinions of others. Leavers heard the Remain campaigners’ claims about the bad economic consequences but rejected them because of a failure of trust. The bad reputation of individuals campaigning for Remain was inherited from the institutions with which they were associated – the institutions of the elite. These were the politicians and ‘greedy banksters’ of the Establishment whose reputations had been destroyed in the eyes of the public as self-interested in the extreme.


The Failure of Experts

Part of this Establishment were the ‘experts’, whose reputation was now tarnished by their inability to predict. Among these failures were the inability to predict the failure of the banking system and the inability to predict election outcomes. It may be that their expertise was based on a world which has since changed. Some scepticism about expert opinion was justified.


The Failure to Think

Too many Leavers did not think. They accepted things to be true because they wanted them to be true. They did not question them. It was a failure to think for themselves. The stereotypical view from within the Remain worldview was that a vote to Leave was a vote based on ignorance and stupidity; there is some truth in this.

But too many Remainers did not think either – or did not think much. A large proportion of the Remain vote will not have given much thought to the vote because the correct way to vote seemed obvious and no further thought was deemed necessary. They did not question whether there might be any merits to Brexit.


The Failure of Morality

I have defined morality as being about balancing our wants against those of others – to get inside the mind of others to understand what they want and to take that into consideration. To want to do the balancing requires intellect and for us to care about the other.

Leavers tended to see the issue in terms of the others – as an issue of inequality. The ‘elite’ others did not seem to care about them. They could see that it would be in the interest of the others to vote Remain. They balanced their wants against those of the other and came down firmly on the side of their own faction’s wants. (When might they have another opportunity for this cri de cœur?)

It was noted previously that there are no issues that are purely moral. A moral aspect is just one of many aspects of a problem. Brexit had moral aspects and well as economic and other aspects. In short:

  • Leavers saw the moral aspect, but
  • Remainers (skewed towards higher intellect) saw only the economic aspect.

Remainers may well find this assertion to be outrageous!


Mindlessness and Heartlessness

So, Leavers were mindless and Remainers were heartless. Remainers did not empathize, or did not think that they should be empathizing. Leavers engaged in apparently mindless political vandalism. But it was not necessarily mindless. One telling comment on a blog after 23 June asked ‘what if voting Leave was the rational thing to do?’ To answer that, Remainers would be forced to think of what the other was thinking. And they might conclude it was not mindless political vandalism after all; it was just political vandalism.


The environment

We are all products of our environment. If we were brought up in a Remain environment (e.g. Cambridge) or Leave environment (e.g. Sunderland), would we have voted differently? Probably. If we recognize this, we will not demonize the other.



I have tried to fit one story into another – to fit a story about the epistemological and ethical aspects of a philosophical worldview into the political story of Brexit! It is far from a perfect match. I have not talked about economics or immigration or identity or globalization or other issues central to Brexit because they do not fit into the story of the brain here. But it is hopefully interesting and food for thought.

Returning to my favourite piece of graffiti:

“So many heads, so few brains.
So many brains, so little understanding.”

The first line is about a failure to think. The second line is about a failure to think about others. The first can be levelled against many Leavers. The second can be levelled against many Remainers.

We must look more to the future than the past. We must look backwards not to blame but to understand why people voted the way they did so that we might understand what might satisfy them. We need to get inside their minds (and the easiest way of doing that is to ask them!).

We can then look forwards – to how we can create a solution that is acceptable for a large majority of us (much more than 52%) – both Leavers and Remainers. Then we will heal the rift. We will see.


Mrs Varoufakis (allegedly) trying but failing to see one standpoint from the position of another.


Some Good Reason


This is the 19th part of the ‘Neural Is to Moral Ought’ series of posts. The series’s title comes from Joshua Greene’s opinion-piece paper

‘From Neural Is To Moral Ought: What are the moral implications of neuroscientific moral psychology?’

Here, I pick through Greene’s paper, providing responses to extensive quotes of his which refer back to a considerable number of previous parts of the series. His paper divides into 3 sections which I will examine in turn:

  1. The ‘is’/‘ought’ distinction
  2. moral intuition
  3. moral realism vs relativism


The ‘Is’/‘Ought’ Distinction

The paper’s abstract is:

Many moral philosophers regard scientific research as irrelevant to their work because science deals with what is the case, whereas ethics deals with what ought to be. Some ethicists question this is/ought distinction, arguing that science and normative ethics are continuous and that ethics might someday be regarded as a natural social science. I agree with traditional ethicists that there is a sharp and crucial distinction between the ‘is’ of science and the ‘ought’ of ethics, but maintain nonetheless that science, and neuroscience in particular, can have profound ethical implications by providing us with information that will prompt us to re-evaluate our moral values and our conceptions of morality.

and the body of the paper then starts:

Many moral philosophers boast a well cultivated indifference to research in moral psychology. This is regrettable, but not entirely groundless. Philosophers have long recognized that facts concerning how people actually think or act do not imply facts about how people ought to think or act, at least not in any straightforward way. This principle is summarized by the Humean dictum that one can’t derive an ‘ought’ from an ‘is’. In a similar vein, moral philosophers since Moore have taken pains to avoid the ‘naturalistic fallacy’, the mistake of identifying that which is natural with that which is right or good (or, more broadly, the mistake of identifying moral properties with natural properties).

This naturalistic fallacy mistake was committed by the now-discredited ‘Social Darwinists’ that aimed to ground moral philosophy in evolutionary principles. But:

.. the idea that principles of natural science might provide a foundation for normative ethics has won renewed favour in recent years. Some friends of ‘naturalized ethics’ argue, contra Hume and Moore, that the doctrine of the naturalistic fallacy is itself a fallacy, and that facts about right and wrong are, in principle at least, as amenable to scientific discovery as any others.

Only to a certain extent, I would say. It is true that the ‘ought’ is not logically bound to the ‘is’. We are free to claim that anything ought to be done. But ‘ought’ is substantially restricted by ‘is’. Moral theories cannot require us to do things which are outside of our physical control. ‘This is how we ought to think’ is constrained by ‘This is how we think’. For Greene,

… I am sceptical of naturalized ethics for the usual Humean and Moorean reasons.

Continuing, with reference to William Casebeer’s opinion piece in the same journal issue:

in my opinion their theories do not adequately meet them. Casebeer, for example, examines recent work in neuroscientific moral psychology and finds that actual moral decision-making looks more like what Aristotle recommends and less like what Kant and Mill recommend. From this he concludes that the available neuroscientific evidence counts against the moral theories of Kant and Mill, and in favour of Aristotle’s. This strikes me as a non sequitur. How do we go from ‘This is how we think’ to ‘This is how we ought to think’? Kant argued that our actions should exhibit a kind of universalizability that is grounded in respect for other people as autonomous rational agents. Mill argued that we should act so as to produce the greatest sum of happiness. So long as people are capable of taking Kant’s or Mill’s advice, how does it follow from neuroscientific data — indeed, how could it follow from such data — that people ought to ignore Kant’s and Mill’s recommendations in favour of Aristotle’s? In other words, how does it follow from the proposition that Aristotelian moral thought is more natural than Kant’s or Mill’s that Aristotle’s is better?

The ‘Neural Is to Moral Ought’ series started with an examination of (Mill’s) Utilitarianism, (Kant’s) Deontological ethics and (Aristotelian) Virtue Ethics in turn. All three approaches have their merits and deficiencies. Of the three, I am disinclined towards the dogmatism of Deontological ethics and particularly inclined towards Virtue Ethics because of its accounting for moral growth. The latter is more ‘natural’ because it is in keeping with how our brains physically learn, as opposed to treating us as idealized reasoners or rule-followers.

Whereas I am sceptical of attempts to derive moral principles from scientific facts, I agree with the proponents of naturalized ethics that scientific facts can have profound moral implications, and that moral philosophers have paid too little attention to relevant work in the natural sciences. My understanding of the relationship between science and normative ethics is, however, different from that of naturalized ethicists. Casebeer and others view science and normative ethics as continuous and are therefore interested in normative moral theories that resemble or are ‘consilient’ with theories of moral psychology. Their aim is to find theories of right and wrong that in some sense match natural human practice. By contrast, I view science as offering a ‘behind the scenes’ look at human morality. Just as a well-researched biography can, depending on what it reveals, boost or deflate one’s esteem for its subject, the scientific investigation of human morality can help us to understand human moral nature, and in so doing change our opinion of it.

But this is too vague. It says virtually nothing. Greene suggests that something might be profound but provides no idea of how things might actually look ‘behind the scenes’.

Let’s take a step back and ask: what is the purpose of morality? Ethics is about determining how we ought to behave, but to answer that requires us to decide upon the purpose of human existence. Such metaphysical meaning has proved elusive except for religious communities. Without any divine purpose, we are left with deciding meaning for ourselves, and the issue then is that our neighbour may find a different meaning which will then determine different behaviour. The conclusion is that the purpose of morality is the balancing of the wants of others against those of ourselves. But this requires us to consider:

  1. What do we want?
  2. How can we understand the wants of others?
  3. How can we cognitively decide?

All three considerations are ultimately grounded in the physics of our brains:

  1. We are free to want whatever we want, but we are all physically very similar so it should come as no surprise that we will have similar wants (food, water, shelter, companionship…).
  2. We need a ‘theory of mind’ (second-order intentionality) in order to understand that others may have wants of their own. We need an understanding of ‘reputation’ (third-order intentionality) to want to moderate our behaviour.
  3. We need a cognitive ability to deliberate in order to make moral choices (in short, to be able to make rational decisions).

(Even the religion opt-out eventually leads us back to the physical brain – how people learn, know and believe is rooted in the physical brain.)

In principle there is no connection between ‘is’ and ‘ought’ and a philosopher can propose any moral theory. But when they do, others provide counter-examples which show the theory prescribing absurd responses. All too often, the difficulty lies not in what should be done in practice but in trying to codify the moral theory – and the philosophers end up modifying their theory rather than their actions!

What if we try to combine the best elements of the three (Utilitarianism, Deontological ethics and Virtue Ethics) main moral theories in order to provide practical moral guidance? Such a synthesis was presented. Ignoring the details here, an extremely brief summary is:

  • We imagine the consequences of potential actions in terms of their effect on the collective well-being of all.
  • In the early stages of growth, we respond with the application of (learnt) simple rules.
  • The less clear-cut those rules are to the particular situation, the less confidence we have in them and we apply more conscious effort into assessing consequences.
  • This provides us with an ability to respond both to the ‘simple’ moral problems quickly and efficiently and to complex problems with considerable attention.
  • We gradually develop more subtle sub-rules that sit upon the basic rules and we learn to identify moral situations and then apply the rules and sub-rules with greater accuracy and speed. This is moral growth.
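The rule-then-deliberation escalation summarized above can be sketched as a confidence-gated dispatcher. This is a toy illustration only: the situations, rules, utilities and threshold are all invented for the example, not taken from any moral theory discussed here.

```python
# Toy sketch of confidence-gated moral arbitration: fast learnt rules
# handle familiar cases; low-confidence cases escalate to slower,
# effortful consequence-weighing. All values here are illustrative.

def rule_response(situation):
    """Fast path: learnt rules, each with a confidence score."""
    rules = {
        "injured person nearby": ("help", 0.95),
        "stranger asks for money": ("decline", 0.6),
    }
    return rules.get(situation, (None, 0.0))

def deliberate(situation, options):
    """Slow path: imagine consequences for collective well-being."""
    utility = {"help": 10, "decline": -2, "donate": 8}  # illustrative
    return max(options, key=lambda o: utility.get(o, 0))

def decide(situation, options, threshold=0.8):
    action, confidence = rule_response(situation)
    if confidence >= threshold:
        return action                      # familiar: quick, rule-based
    return deliberate(situation, options)  # unfamiliar: effortful

print(decide("injured person nearby", ["help", "decline"]))
print(decide("charity letter arrives", ["donate", "decline"]))
```

The design point is the threshold: a clear-cut case never reaches conscious deliberation, while a novel case falls through to it – mirroring the ‘simple problems handled quickly, complex problems given attention’ behaviour described above.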

The resulting ‘mechanistic’ account of moral reasoning is remarkably similar to the ‘hierarchy of predictors’ (‘predictive brain’, ‘variational free energy’) theory of what the brain is doing generally. So, what the brain is doing when there is moral deliberation is basically the same as when there is non-moral deliberation. There is nothing particularly special about moral thinking.
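The ‘hierarchy of predictors’ idea admits a minimal numerical sketch: each level predicts the activity of the level below and nudges its belief to reduce prediction error. All numbers below (the signal, the noise level, the learning rate) are invented purely for illustration.

```python
import numpy as np

# Toy two-level 'hierarchy of predictors': each level predicts the
# level below and descends the summed squared prediction error.
rng = np.random.default_rng(0)
signal = 0.8                      # true cause of the sensory input

mu1, mu2 = 0.0, 0.0               # beliefs at levels 1 and 2
lr = 0.1
for _ in range(300):
    sense = signal + 0.05 * rng.standard_normal()  # noisy input
    e1 = sense - mu1              # level-1 (bottom-up) error
    e2 = mu1 - mu2                # level-2 (top-down) error
    mu1 += lr * (e1 - e2)         # balance bottom-up vs top-down error
    mu2 += lr * e2
print(round(mu1, 2), round(mu2, 2))  # both settle near the true signal
```

At the fixed point the top-down prediction matches the level below and the level-1 belief matches the (average) input – the same error-minimization whether the content of the deliberation is moral or not.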


Moral Intuition

Greene acknowledges the role of methods of determining judgements other than just ‘Pure Reason’:

There is a growing consensus that moral judgements are based largely on intuition — ‘gut feelings’ about what is right or wrong in particular cases. Sometimes these intuitions conflict, both within and between individuals. Are all moral intuitions equally worthy of our allegiance, or are some more reliable than others? Our answers to this question will probably be affected by an improved understanding of where our intuitions come from, both in terms of their proximate psychological/neural bases and their evolutionary histories.

He contrasts two moral dilemmas (both due to Peter Unger): Firstly, Case 1:

You are driving along a country road when you hear a plea for help coming from some roadside bushes. You pull over and encounter a man whose legs are covered with blood. The man explains that he has had an accident while hiking and asks you to take him to a nearby hospital. Your initial inclination is to help this man, who will probably lose his leg if he does not get to the hospital soon. However, if you give this man a lift, his blood will ruin the leather upholstery of your car. Is it appropriate for you to leave this man by the side of the road in order to preserve your leather upholstery? Most people say that it would be seriously wrong to abandon this man out of concern for one’s car seats.

And then Case 2:

You are at home one day when the mail arrives. You receive a letter from a reputable international aid organization. The letter asks you to make a donation of two hundred dollars to their organization. The letter explains that a two-hundred-dollar donation will allow this organization to provide needed medical attention to some poor people in another part of the world. Is it appropriate for you to not make a donation to this organization in order to save money? Most people say that it would not be wrong to refrain from making a donation in this case.

Now, most people think there is a difference between these scenarios:

  • the driver must give the injured hiker a lift, but
  • it would not be wrong to ignore the request for a donation.

In fact, we can imagine doing a Utilitarian calculation, trading off the benefits between the two situations, and concluding that it is more Utilitarian to donate the money it would cost to repair the leather upholstery to charity instead of helping the hiker. But we are then more likely to actually help the hiker anyway and refine the Utilitarian calculus somehow. We override our codified system because it feels like there is ‘some good reason’ why the decision is right. But Greene, like Peter Singer before him, thinks that, whatever that reason is, it is not a moral reason.

And yet this case and the previous one are similar. In both cases, one has the option to give someone much needed medical attention at a relatively modest financial cost. And yet, the person who fails to help in the first case is a moral monster, whereas the person who fails to help in the second case is morally unexceptional. Why is there this difference? About thirty years ago, the utilitarian philosopher Singer argued that there is no real moral difference between cases such as these two, and that we in the affluent world ought to be giving far more than we do to help the world’s most unfortunate people. (Singer currently gives about 20% of his annual income to charity.) Many people, when confronted with this issue, assume or insist that there must be ‘some good reason’ for why it is alright to ignore the severe needs of unfortunate people in far off countries, but deeply wrong to ignore the needs of someone like the unfortunate hiker in the first story. (Indeed, you might be coming up with reasons of your own right now.) Maybe there is ‘some good reason’ for why it is okay to spend money on sushi and power windows while millions who could be saved die of hunger and treatable illnesses. But maybe this pair of moral intuitions has nothing to do with ‘some good reason’ and everything to do with the way our brains happen to be built.

Greene identifies the difference as being between ‘personal’ and ‘impersonal’ situations:

The dilemma with the bleeding hiker is a ‘personal’ moral dilemma, in which the  moral violation in question occurs in an ‘up-close-and-personal’ manner. The donation dilemma is an ‘impersonal’ moral dilemma, in which the moral violation in question does not have this feature. To make a long story short, we found that judgements in response to ‘personal’ moral dilemmas, compared with ‘impersonal’ ones, involved greater activity in brain areas that are associated with emotion and social cognition. Why should this be? An evolutionary perspective is useful here. Over the last four decades, it has become clear that natural selection can favour altruistic instincts under the right conditions, and many believe that this is how human altruism came to be. If that is right, then our altruistic instincts will reflect the environment in which they evolved rather than our present environment. With this in mind, consider that our ancestors did not evolve in an environment in which total strangers on opposite sides of the world could save each others’ lives by making relatively modest material sacrifices. Consider also that our ancestors did evolve in an environment in which individuals standing face-to-face could save each others’ lives, sometimes only through considerable personal sacrifice. Given all of this, it makes sense that we would have evolved altruistic instincts that direct us to help others in dire need, but mostly when the ones in need are presented in an ‘up-close-and-personal’ way. What does this mean for ethics? Again, we are tempted to assume that there must be ‘some good reason’ why it is monstrous to ignore the needs of someone like the bleeding hiker, but perfectly acceptable to spend our money on unnecessary luxuries while millions starve and die of preventable diseases. 
Maybe there is ‘some good reason’ for this pair of attitudes, but the evolutionary account given above suggests otherwise: we ignore the plight of the world’s poorest people not because we implicitly appreciate the nuanced structure of moral obligation, but because, the way our brains are wired up, needy people who are ‘up close and personal’ push our emotional buttons, whereas those who are out of sight languish out of mind.

This is just a hypothesis. I do not wish to pretend that this case is closed or, more generally, that science has all the moral answers. Nor do I believe that normative ethics is on its way to becoming a branch of the natural sciences, with the ‘is’ of science and the ‘ought’ of morality gradually melding together. Instead, I think that we can respect the distinction between how things are and how things ought to be while acknowledging, as the preceding discussion illustrates, that scientific facts have the potential to influence our moral thinking in a deep way.

But again, this is all rather vague.

Relating this to what I have previously discussed…

  • The ‘hierarchy of predictors’ model describes the way in which many levels compete with one another to influence behaviour (spreading from reflex to rational, via sensorimotor, emotional, subconscious and conscious levels). Lower levels will dominate action in familiar moral situations. But in unfamiliar circumstances, or when the problem consists of two familiar reactions with contradictory actions, lower levels will be less confident about their response and control will effectively be passed upwards for (slower) rational judgement. In a decision between helping the bleeding hiker and donating to charity, rational deliberation gets shut out by the lower-level emotional and intuitive response.
  • Patricia Churchland shows that our caring originates in our brain: the greater density of oxytocin receptors in the nucleus accumbens and of vasopressin receptors in the ventral pallidum (both nuclei are part of the basal ganglia at the base of the forebrain) makes the significant difference in behaviour between the otherwise-similar Prairie Voles (monogamous) and Montane Voles (promiscuous). The ‘up-close-and-personal’ proximity effect of alloparenting expands this beyond the family to the ‘In-Group’. But oxytocin is not a magic bullet: it improves empathy with the In-Group but actually works against Out-Group members.

The physical construction of the brain seems to provide one ‘some good reason’ why immediate ‘up close and personal’ situations elicit a moral response in the way that slowly-rationalized situations do not. (Worldwide charities frequently appeal to us not by presenting facts about the suffering of many, many thousands but by presenting an image of a single suffering individual, furnished with a name and a story of misfortune – making the problem ‘up-close-and-personal’.)

If we truly do want to have a morality that does not prioritize those ‘up close’, then we need to provide some compensation mechanisms to our decision making – consciously equalizing out our emotions. But our emotions can play an important positive role. Empathy is a very significant factor in creating habits that underpin the balancing of the wants of others against the wants of oneself. Yes, we must learn the virtue of balancing others against ourselves, but we must also learn the virtue of balancing reason against our emotions.


Moral Realism

Greene then shifts attention to Moral Realism:

According to ‘moral realism’ there are genuine moral facts, whereas moral anti-realists or moral subjectivists maintain that there are no such facts. Although this debate is unlikely to be resolved any time soon, I believe that neuroscience and related disciplines have the potential to shed light on these matters by helping us to understand our common-sense conceptions of morality. I begin with the assumption (lamentably, not well tested) that many people, probably most people, are moral realists. That is, they believe that some things really are right or wrong, independent of what any particular person or group thinks about it. For example, if you were to turn the corner and find a group of wayward youths torturing a stray cat, you might say to yourself something like, “That’s wrong!”, and in saying this you would mean not merely that you are opposed to such behaviour, or that some group to which you belong is opposed to it, but rather that such behaviour is wrong in and of itself, regardless of what anyone happens to think about it. In other words, you take it that there is a wrongness inherent in such acts that you can perceive, but that exists independently of your moral beliefs and values or those of any particular culture.

I think torturing cats is not just wrong but universally wrong. Universally wrong means that it is wrong in all societies. Across societies, we understand sufficiently the same about what ‘wrongness’ and ‘morality’ actually mean that, when presented with a clear (black and white) moral case, we can all agree on whether that case is right or wrong. It is not that there is some absolute truth of the matter, just that similar agents’ understanding of common concepts leads to common knowledge. Universally wrong is not the same as absolutely (‘real-ly’) wrong.

Surveying cultures around the world across all civilisations, we find that they have surprisingly similar moralities. It is not that one society accepts stealing but not murder and another accepts murder but not stealing! The differences are predominantly down to how liberal or conservative a society is. Liberal societies have a shorter list of vices than conservative ones. For example, the way an individual dresses is seen as a matter of aesthetics or custom in liberal (e.g. US) societies but a matter of morality in conservative (e.g. Muslim) societies.

There are clear cases of what is right and wrong that apply across most if not all human civilizations. It is in the less clear-cut cases that they differ and hence moral problems arise.

This realist conception of morality contrasts with familiar anti-realist conceptions of beauty and other experiential qualities. When gazing upon a dazzling sunset, we might feel as if we are experiencing a beauty that is inherent in the evening sky, but many people acknowledge that such beauty, rather than being in the sky, is ultimately ‘in the eye of the beholder’. Likewise for matters of sexual attraction. You find your favourite movie star sexy, but take no such interest in baboons. Baboons, on the other hand, probably find each other very sexy and take very little interest in the likes of Tom Cruise and Nicole Kidman. Who is right, us or the baboons? Many of us would plausibly insist that there is simply no fact of the matter. Although sexiness might seem to be a mind-independent property of certain individuals, it is ultimately in the eye (that is, the mind) of the beholder.

I have previously looked at how aesthetics and moral knowledge are just particular forms of knowledge. Moral knowledge is neither uniquely nor totally separate from the physical world of what ‘is’. Aesthetics is the same; it is dependent on things like our (neural) ability to perceive and on our emotions (such as disgust).

The big meta-ethical question, then, might be posed as follows: are the moral truths to which we subscribe really full-blown truths, mind-independent facts about the nature of moral reality, or are they, like sexiness, in the mind of the beholder?

Elsewhere, I have examined how truth is ‘in the mind of the beholder’ – that knowledge (crudely ‘facts’) grows within our brains, building upon earlier ‘facts’ such that it both corresponds with our personal experience and coheres with what else we know. The apparent universality of ‘facts’ (including moral knowledge) arises because we grow up:

  • in the same (or very similar) environment as others, and
  • in a shared culture, meaning that we (more explicitly) learn the same as others.

For our ‘rational’ upper levels, our lower levels (including our emotional urges) are just part of the environment in which we grow up (a very immediate part, mind you).

One way to try to answer this question is to examine what is in the minds of the relevant beholders. Understanding how we make moral judgements might help us to determine whether our judgements are perceptions of external truths or projections of internal attitudes. More specifically, we might ask whether the appearance of moral truth can be explained in a way that does not require the reality of moral truth. As noted above, recent evidence from neuroscience and neighbouring disciplines indicates that moral judgement is often an intuitive, emotional matter. Although many moral judgements are difficult, much moral judgement is accomplished in an intuitive, effortless way.

In my worldview, the appearance of moral truth does not require the reality of moral truth!

With the ‘hierarchy of predictors’ model of the brain, it should be expected that moral judgements, like judgements of other forms of knowledge, are typically accomplished in an intuitive, effortless way – by the lower levels of the hierarchy. It is what we do with the exceptional, difficult decisions that is interesting – those decisions that are propagated up to the higher levels that have our conscious attention.

We are limited by the specifics of our physiology and of the neurology associated with the instruments that are our senses (although we can now build external instruments to extend our senses). We cannot like or dislike what we cannot sense.

An interesting feature of many intuitive, effortless cognitive processes is that they are accompanied by a perceptual phenomenology. For example, humans can effortlessly determine whether a given face is male or female without any knowledge of how such judgements are made. When you look at someone, you have no experience of working out whether that person is male or female. You just see that person’s maleness or femaleness. By contrast, you do not look at a star in the sky and see that it is receding. One can imagine creatures that automatically process spectroscopic redshifts, but as humans we do not.

All of this makes sense from an evolutionary point of view. We have evolved mechanisms for making quick, emotion-based social judgements, for ‘seeing’ rightness and wrongness, because our intensely social lives favour such capacities, but there was little selective pressure on our ancestors to know about the movements of distant stars. We have here the beginnings of a debunking explanation of moral realism: we believe in moral realism because moral experience has a perceptual phenomenology, and moral experience has a perceptual phenomenology because natural selection has outfitted us with mechanisms for making intuitive, emotion-based moral judgements, much as it has outfitted us with mechanisms for making intuitive, emotion-based judgements about who among us are the most suitable mates.

Or much as natural selection has outfitted us with mechanisms for making intuitive, emotion-based judgements about anything.

Therefore, we can understand our inclination towards moral realism not as an insight into the nature of moral truth, but as a by-product of the efficient cognitive processes we use to make moral decisions. According to this view, moral realism is akin to naive realism about sexiness, like making the understandable mistake of thinking that Tom Cruise is objectively sexier than his baboon counterparts.

Both intuition and emotion play an important part in moral deliberation, just as they do in other forms of deliberation.

Greene has just been making vague comments so far. But then he makes a comment that is acute:

Others might wonder how one can speak on behalf of moral anti-realism after sketching an argument in favour of increasing aid to the poor

to which his reply is

giving up on moral realism does not mean giving up on moral values. It is one thing to care about the plight of the poor, and another to think that one’s caring is objectively correct.

I have emphasized the importance of caring in creating a moral society and looked at its biological foundations. It is largely true that we act morally because we care.

… Understanding where our moral instincts come from and how they work can, I argue, lead us to doubt that our moral convictions stem from perceptions of moral truth rather than projections of moral attitudes.

A case has been presented of how our neurology promotes caring to extend, via oxytocin, alloparenting, group behaviour and institutional trust, to very large societies in which we care for complete strangers. This is how our moral convictions arise. Our morals are contingent on culture and environment and not on absolute moral truths. Our moral instincts that make us help the injured hiker (emotionally, quickly) and ignore the appeal through the letterbox (deliberatively, slowly, consciously) are built upon the ‘up close and personal’ origins of our caring. It could not be otherwise. Our logical/rational/deliberative higher levels of cognition are built (evolved) upon lower, quicker instinctive levels.

Some might worry that this conclusion, if true, would be very unfortunate.

First, it is important to bear in mind that a conclusion’s being unfortunate does not make it false.

This is true for moral determinism as well as moral instincts (our instincts are that we are free, but the scientific evidence points towards determinism). The unfortunate conclusion all too often drawn from determinism is that there is no free will and that we therefore cannot punish transgressors for actions they could not have avoided – and hence that the moral order dissolves.

Second, this conclusion might not be unfortunate at all.

I have argued elsewhere that we might not have ‘free will’ as conventionally understood but that we still have freedom and can still be held responsible. The moral order can be maintained. Furthermore, recognizing that some individuals do not have the control they are traditionally purported to have, we will be less retributive and more prepared to intervene in order to design a society that further improves well-being (yes, in a scientific way).



Getting Started on Deep Learning with Python


An Introduction to Deep Learning

In Karl Friston’s wonderfully entitled paper ‘The history of the future of the Bayesian brain’, he recalls his working with Geoffrey Hinton, how Hinton emphasized Bayesian formulations and generative models, and how Friston developed his biological minimization of ‘Variational Free Energy’ theory from Hinton’s ideas, adopting Hinton’s references to Free Energy, Kullback–Leibler divergence and Helmholtz and Boltzmann Machines within the field of artificial neural networks.

Hinton (co-)invented ‘Boltzmann Machines’, which are recurrent artificial neural networks with randomized weights or neuron functions (i.e. ‘stochastic’), and he also invented fast learning algorithms for ‘Restricted Boltzmann Machines’ (where neurons have connections to neurons in other layers but not to those in the same layer).
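The ‘restricted’ structure can be made concrete in a few lines of NumPy. This is an illustrative sketch (not Hinton’s fast learning algorithm, and the layer sizes are my own choices): a single up-and-down pass of Gibbs sampling, in which the units are stochastic and the only weights run between the visible and hidden layers – there are no visible–visible or hidden–hidden connections:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 3
# Cross-layer weights only: the 'restriction' in a Restricted Boltzmann Machine
W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))

# A binary visible vector (e.g. pixels of a tiny image)
v = rng.integers(0, 2, size=n_visible).astype(float)

# Stochastic hidden activations: sample each hidden unit from P(h_j = 1 | v)
p_h = sigmoid(v @ W)
h = (rng.random(n_hidden) < p_h).astype(float)

# Reconstruct the visible layer by sampling from P(v_i = 1 | h)
p_v = sigmoid(h @ W.T)
v_recon = (rng.random(n_visible) < p_v).astype(float)

print(p_h, v_recon)
```

Training (e.g. by contrastive divergence) adjusts `W` so that reconstructions resemble the data; the sketch above only shows the bipartite sampling structure.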

He modestly claims that his efforts over the decades led to a 10-fold increase in performance but that, during this time, Moore’s Law increased computing power by a factor of 100,000! Added to that was the new availability of large data sets with which to train networks.

But the result of all this was that ‘deep’ neural networks (those with more than 1 hidden layer, i.e. those with more than 3 layers in total) were able to perform very good feature extraction in a reasonable time. Lower layers in the hierarchy extract simple features upon which the higher layers can extract more and more elaborate features. This then resulted in a rapid commercialization of such algorithms for applications like speech recognition, as used in Google Voice Search and Apple’s Siri.
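To make the ‘deep’ terminology concrete, here is a minimal NumPy sketch (the layer sizes are my own illustrative choices) of a forward pass through a network with two hidden layers: each layer computes its features from the output of the layer below it, exactly the hierarchy described above:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(1)

# 784 inputs (e.g. a 28x28 character bitmap), two hidden layers, 10 outputs.
# Two hidden layers makes this 'deep' by the definition above.
sizes = [784, 128, 64, 10]
weights = [rng.normal(0.0, 0.01, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    # Each hidden layer extracts features from the layer below it
    for W in weights[:-1]:
        x = relu(x @ W)
    scores = x @ weights[-1]             # raw scores for the 10 digit classes
    e = np.exp(scores - scores.max())    # numerically stable softmax
    return e / e.sum()

probs = forward(rng.random(784))         # class probabilities summing to 1
```

With random untrained weights the output is near-uniform; training adjusts the weights so that the upper layers come to respond to progressively more elaborate features.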

So now the emeritus Professor Hinton is a founding father of ‘Deep Learning’ and works part-time at Google.

A new strand of posts here will look at Deep Learning and how it works. These will be based around the Python computer language. This ‘Introduction to Deep Learning with Python’ video by Alec Radford at indico talks through some Python code for optical character recognition. Below, I cover installing all the code and applications to be able to run the code shown in the video, to get us started.


Overview of Installing Python

To get this code running on a Windows PC, we need:

  1. The Python source code.
  2. Python itself.
  3. The NumPy maths package, required by the source code.
  4. The Theano numerical methods Python package, required by the source code.
  5. ‘Pip’ (‘Pip Installs Packages’) – for installing Python packages!
  6. The ‘MinGW’ gcc compiler, for compiling the Theano package for much faster execution times.
  7. The MNIST data set of training and test character bitmaps.
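Once the installation steps below are complete, a quick way to confirm that Python can actually see the required packages is to probe for them programmatically. This is a sketch using Python 3’s `importlib` (under the tutorial’s Python 2.7, `pkgutil.find_loader` plays the same role):

```python
import importlib.util

def check_packages(names):
    """Report which of the required packages Python can actually find."""
    return {name: importlib.util.find_spec(name) is not None
            for name in names}

# The packages the tutorial code depends on; 'theano' will only show up
# as OK after the installation steps below have been completed.
status = check_packages(['numpy', 'theano'])
for name, found in status.items():
    print('%-8s %s' % (name, 'OK' if found else 'MISSING'))
```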


Installing Anaconda

Anaconda2 provides 3 of the above:

  • Python 2.7
  • NumPy
  • Pip

Go to:

and find the ‘zipped Windows installers’ (these work whether behind a firewall or not).

Download the latest 32-bit version for Python 2:

Double-clicking on the downloaded ZIP file automatically pushes through to the Anaconda2-2.5.0-Windows-x86 application (Windows understands the ZIP compression format). Double-click on this Anaconda2-2.5.0-Windows-x86 application to install Anaconda. Selecting to install ‘just for me’ will probably be easier, hence install to the user area – C:\Users\User\Anaconda2_32. (Add the ‘_32’ suffix in case we need to install a 64-bit installation later on.)

Have ‘yes’ ticked for adding Anaconda to PATH. Have ‘yes’ ticked for Anaconda to have the default Python 2.7. Installation then takes a while.


Installing the Main Python Packages

Locate the ‘Anaconda Prompt’ – easiest through the Windows search. This opens a command shell.

Go to the Anaconda2_32\Scripts directory:

cd Anaconda2_32\Scripts

‘Pip’ (pip.exe) and ‘Conda’ (conda.exe) will be in here.

Installation will generally use Conda rather than Pip. Ensure you have the latest packages to install, but first ensure you have the latest Conda with which to install them:

conda update conda

Select ‘y’ if not up to date. Continue:

conda update --all

Finally, install the desired packages:

conda install scipy

conda install numpy


Installing GCC for Compiling the Theano Package

The Theano numerical methods package can be interpreted but this will be very slow. Instead, the package should be compiled. For this, the MinGW (‘Minimalist Gnu for Windows’) compiler should be installed. Follow the link from:

to SourceForge to automatically download the setup executable:


into the Downloads directory.

Double-click this to install it. Select


as the install directory (for consistency with the Anaconda2-32 installation).


Setting the Path to point to GCC

To ensure that Conda will ‘see’ the compiler when doing the Theano installation, confirm that the PATH environment variable points to it. Select:

Start -> Control Panel -> System -> Advanced -> Environment Variables

(Alternatively, in the Search window, type Environment and select ‘Edit the Environment Variables’.)

Double-click on ‘PATH’ and add MinGW to the start/top of the list. It should point to:





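The same check can be scripted. The stdlib-only helper below walks the PATH entries by hand (the `gcc.exe` filename assumes a standard MinGW install; the helper name is my own):

```python
import os

def find_on_path(executable):
    """Return the first PATH directory containing `executable`, else None."""
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(directory, executable)
        if os.path.isfile(candidate):
            return candidate
    return None

# On Windows the MinGW compiler is gcc.exe; this prints None
# if PATH has not been set up as described above.
print(find_on_path('gcc.exe'))
```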

Installing the Theano Package

Then install the GNU C++ (g++) compiler support to speed-optimize the Theano library. In the ‘Anaconda Prompt’ shell, ensure that you are in the correct directory:

cd \Users\User\Anaconda2_32\Scripts

and type:

conda install mingw libpython

And finally install the numerical methods python library ‘Theano’:

pip install theano


Download the Example Python Code

The text with the YouTube video points to the code at:

and click ‘Download ZIP’ there. Double-click on the downloaded ZIP and copy the Theano-Tutorials directory to C:\Users\User\Anaconda2_32.


Downloading the MNIST Character Dataset

The MNIST character dataset is available through Yann LeCun’s personal website:

Windows cannot unzip ‘gzip’ (*.gz) files directly. If you don’t have an application to do this, download and run ‘7zip’:

Gzip (*.gz) files need to be associated with ‘7zip’. Then double-click on each gzip file in turn and ‘extract’ the uncompressed file from it. These should all be installed under:


There is a mismatch between the filenames in the MNIST dataset and the file references in the Python code. Using the Windows Explorer, change the ‘.’ in all the filenames to a ‘-’, e.g. rename train-images.idx3-ubyte to train-images-idx3-ubyte.
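If renaming by hand is tedious, the same fix can be scripted. A sketch (the function name is my own, and the directory argument is left as a placeholder since it depends on where you extracted the files):

```python
import os

def fix_mnist_names(directory):
    """Rename e.g. train-images.idx3-ubyte to train-images-idx3-ubyte."""
    renamed = []
    for name in os.listdir(directory):
        if '.idx' in name:
            new_name = name.replace('.idx', '-idx')
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
            renamed.append(new_name)
    return renamed

# Point this at wherever the extracted MNIST files live, e.g.:
# fix_mnist_names(r'C:\path\to\mnist')   # path is a placeholder
```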


Running the Code

The Anaconda installation includes the ‘Spyder’ IDE for Python. Search for ‘Spyder Desktop App’ and run.

Browse to set the working directory (top right) to:


And open the first Python script (File -> Open):

This shows the source code.

Select Run -> Run (F5) to execute this code.

Selecting other programs is likely to result in either a ‘memory error’ or a ‘No module named foxhound.utils.vis’ error.

The memory error issue can be overcome by running the code from the Anaconda Prompt:

cd C:\Users\User\Anaconda2_32\Theano-Tutorials-master


This still means that some of the programs cannot be run, and what the other programs are actually doing hasn’t been discussed. That is left for another time.
