1. The Bayesian Brain
Many other posts on this blogsite refer to Karl Friston’s ‘Variational Free Energy’ theory of the brain, in which the brain is a hierarchy of predictors that adapt internal models of the environment based on experience, essentially using Bayesian inference. This is the so-called ‘Bayesian Brain’.
Here, I look at the well-known ‘Monty Hall problem’ to explain, using Bayesian inference, why the problem’s answer is correct and why people so often choose the wrong answer.
2. Bayes Theorem
Bayes Theorem is
P(H|D).P(D) = P(D|H).P(H)
(Notation: P(H|D) denotes the probability of H, given that D is true.)
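To get there, we start with the obvious relationship that the probability of both events A and B occurring is commutative:
P(A ^ B) = P(B ^ A)
The probability of both occurring is the probability of A multiplied by the probability of B given A:
P(A ^ B) = P(A).P(B|A)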
This can be graphically represented with a Karnaugh map as shown below:
- The 4 boxes represent the 4 possible combinations of A and B being either true or false.
- The 2 red boxes on the right represent the 2 combinations where A is true: P(A).
- The bottom right box in the picture below represents the combination where B is also true: P(B|A).P(A).
But similarly, by commutativity:
P(B ^ A) = P(B).P(A|B)
and hence we get to Bayes theorem:
P(A).P(B|A) = P(B).P(A|B)
which is frequently rearranged to:
P(B|A) = P(B).P(A|B)/P(A)
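As a quick numerical check of the derivation above, here is a minimal sketch. The four joint probabilities are made-up values of my own (they stand in for the four boxes of the Karnaugh map); the names `joint`, `marginal_a` and `marginal_b` are just for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution over (A, B). The four values play the
# role of the four boxes in the Karnaugh map and must sum to 1.
joint = {
    (True, True): Fraction(3, 20),
    (True, False): Fraction(5, 20),
    (False, True): Fraction(4, 20),
    (False, False): Fraction(8, 20),
}
assert sum(joint.values()) == 1


def marginal_a(a):
    """P(A = a): add up the boxes in which A has the value a."""
    return sum(p for (x, _), p in joint.items() if x == a)


def marginal_b(b):
    """P(B = b): add up the boxes in which B has the value b."""
    return sum(p for (_, y), p in joint.items() if y == b)


p_a = marginal_a(True)            # P(A)
p_b = marginal_b(True)            # P(B)
p_a_and_b = joint[(True, True)]   # P(A ^ B)
p_b_given_a = p_a_and_b / p_a     # P(B|A)
p_a_given_b = p_a_and_b / p_b     # P(A|B)

# Both products recover the same joint probability ...
assert p_a * p_b_given_a == p_b * p_a_given_b == p_a_and_b
# ... and the rearranged form of Bayes theorem holds.
assert p_b_given_a == p_b * p_a_given_b / p_a

print(p_b_given_a)  # 3/8
```

Using `Fraction` rather than floats keeps the equalities exact, so the assertions test the identity itself rather than rounding behaviour.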
3. What Does it Mean When a Girl Smiles at You Every Time She Sees You?
What does it mean when a girl smiles at you every time she sees you?
His answer is as follows:
It’s simple. Just use Bayes’ theorem.
The probability she likes you is
P(like|smile) = P(smile|like).P(like)/P(smile)
P(like|smile) is what you want to know – the probability she likes you given the fact that she smiles at you.
P(smile|like) is the probability that she will smile given that she sees someone she likes.
P(like) is the probability that she likes a random person.
P(smile) is the probability that she will smile at a random person.
For example, suppose she just smiles at everyone. Then intuition says that the fact that she smiles at you doesn’t mean anything one way or another. Indeed, P(smile|like) = 1 and P(smile)=1, and we have
P(like|smile) = P(like)
meaning that knowing that she smiles at you doesn’t change anything.
At the other extreme, suppose she smiles at everyone she likes, and only those she likes. Then P(smile) = P(like) and P(smile|like) = 1. Then we have
P(like|smile) = 1
and she is certain to like you.
In the intermediate case, what you need to do is find the ratio of odds of smiling at people she likes to smiles in general, multiply by the percentage of people she likes, and there is your answer.
The more she smiles in general, the lower the chance she likes you. The more she smiles at people she likes, the better the chance. And of course the more people she likes, the better your chances are.
Of course, how to actually determine these values is a mystery I have never solved.
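The two extreme cases and an intermediate one can be sketched in a few lines. All of the numbers here are invented for illustration (as the answer says, the real values are a mystery), and the function name `p_like_given_smile` is my own.

```python
from fractions import Fraction


def p_like_given_smile(p_smile_given_like, p_like, p_smile):
    """Bayes theorem: P(like|smile) = P(smile|like).P(like)/P(smile)."""
    return p_smile_given_like * p_like / p_smile


# Assumed prior: she likes 1 person in 10 (a made-up figure).
p_like = Fraction(1, 10)

# Extreme 1: she smiles at everyone, so P(smile|like) = 1 and P(smile) = 1.
everyone = p_like_given_smile(Fraction(1), p_like, Fraction(1))

# Extreme 2: she smiles at exactly the people she likes, so P(smile) = P(like).
only_liked = p_like_given_smile(Fraction(1), p_like, p_like)

# Intermediate (made-up figures): she smiles at 9 in 10 of the people she
# likes and at 3 in 10 of people overall.
middle = p_like_given_smile(Fraction(9, 10), p_like, Fraction(3, 10))

print(everyone)    # 1/10, the smile tells you nothing new
print(only_liked)  # 1, she is certain to like you
print(middle)      # 3/10
```

In the intermediate case the smile triples the probability from 1/10 to 3/10, which is the ratio P(smile|like)/P(smile) = 3 applied to the prior.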
4. Bayesian Inference
In the above example, we want to see how justified we are in inferring a particular hypothesis (‘she likes me’) based on some particular evidence (‘she smiled at me’), and ‘Bayesian inference’ was used for this.
Generalizing this inference, Bayes theorem can be rearranged to:
P(H|E) ∝ P(E|H).P(H)
which is interpreted by the Bayesian (for whom probability represents knowledge) as:
posterior ← likelihood . prior
- We start with a ‘prior probability’ degree of belief in a particular hypothesis, P(H).
- New evidence, E, is presented.
- We then calculate the new ‘posterior probability’, P(H|E), which is the degree of belief for the hypothesis H after taking into account the evidence E for and against that hypothesis H. This new degree of belief can be more or less than it was before, depending on the evidence.
- The ‘conditional probability’ or ‘likelihood’, P(E|H), is the degree of belief in the evidence E, given that the hypothesis H is true.
- P(E) is irrespective of the hypothesis and so can be ignored (the ‘=’ changes to a proportional-to ‘∝’).
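The steps in the list above can be sketched as a small update function. It multiplies likelihood by prior for each hypothesis and then rescales so that the posteriors sum to 1, which is how P(E) can be ignored until the end. The function name `posterior` and the example figures (a prior of 1/10, likelihoods of 9/10 and 7/30) are assumptions of mine for illustration only.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(H|E) for every hypothesis H.

    priors[h] is P(H) and likelihoods[h] is P(E|H).
    """
    # posterior is proportional to likelihood . prior
    unnormalised = {h: likelihoods[h] * priors[h] for h in priors}
    # The sum of the unnormalised terms is P(E); dividing by it turns the
    # proportional-to sign back into an equals sign.
    evidence = sum(unnormalised.values())
    return {h: u / evidence for h, u in unnormalised.items()}


# Two competing hypotheses about the evidence 'she smiled at me'
# (all figures made up for this sketch).
priors = {"likes": Fraction(1, 10), "does not like": Fraction(9, 10)}
likelihoods = {"likes": Fraction(9, 10), "does not like": Fraction(7, 30)}

result = posterior(priors, likelihoods)
for hypothesis, belief in result.items():
    print(hypothesis, belief)
# likes 3/10
# does not like 7/10
```

Normalising over an exhaustive set of hypotheses is only one way of handling P(E); it works here because ‘likes’ and ‘does not like’ cover every possibility.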
5. The Curious Incident of the Dog in the Night-Time
In the novel ‘The Curious Incident of the Dog in the Night-Time’, Christopher, the 15-year-old narrator, tells of the events following his discovery of a neighbour’s dog having been killed. Christopher has Asperger’s syndrome and is very mathematically-minded, which is apparent in his account. One mathematical excursion is into the ‘Monty Hall problem’ which he describes as follows:
You are on a game show on television. On this game show the idea is to win a car as a prize. The game show host shows you three doors. He says that there is a car behind one of the doors and there are goats behind the other two doors. He asks you to pick a door. You pick a door but the door is not opened. Then the game show host opens one of the doors you didn’t pick to show a goat (because he knows what is behind the doors). Then he says you have one final chance to change your mind before the doors are opened and you can get a car or goat. So he asks you if you want to change your mind and pick the other unopened door instead. What should you do?
(Note: Monty Hall was a game show host).
So, what do you think?
6. The ‘Non-Mathematical’ Solution
Most people think it does not matter whether they stick or switch. But they would be wrong – it is actually better to switch.
In ‘The Curious Incident…’, Christopher provides two explanations of why. The first is Bayesian and I’ll come back to that later. But…
The second way you can work it out is by making a picture of all the possible outcomes like this
So, if you change, 2 times out of 3 you get a car. And if you stick, you only get a car 1 time out of 3.
And this shows that intuition can sometimes get things wrong. And intuition is what people use in life to make decisions. But logic can help you work out the right answer.
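The picture of all possible outcomes can also be written out as a short enumeration. This is only a sketch of the decision tree, assuming (as in the puzzle) that the car is equally likely to be behind each door and that the host always opens a goat door you did not pick. The function names are my own.

```python
from fractions import Fraction

DOORS = ("X", "Y", "Z")


def outcome(car, pick, switch):
    """Return True if the contestant ends up with the car."""
    # The host opens a door that is neither the contestant's pick nor the car.
    # When the pick is the car door there are two such doors; which one he
    # opens does not affect win or lose, so the first is taken.
    opened = next(d for d in DOORS if d != pick and d != car)
    if switch:
        # Switch to the one door that is neither picked nor opened.
        final = next(d for d in DOORS if d != pick and d != opened)
    else:
        final = pick
    return final == car


def win_probability(switch, pick="X"):
    """Count winning branches over the three equally likely car positions."""
    wins = sum(outcome(car, pick, switch) for car in DOORS)
    return Fraction(wins, len(DOORS))


print(win_probability(switch=False))  # 1/3
print(win_probability(switch=True))   # 2/3
```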
I think this decision tree approach (rather than the Bayesian way) is how most people consciously reconcile themselves to the solution. But the Monty Hall problem is not a problem because the solution is difficult to prove or anything like that. It is that people’s intuition doesn’t work here. Why is that?
7. Nothing Changes if You Stick
For the Bayesian solution, I will adopt Christopher’s notation:
Firstly you can do it by maths like this
Let the doors be called X, Y and Z
Let CX be the event that the car is behind door X and so on.
Let HX be the event that the host opens door X and so on.
The mapping of names to doors is arbitrary. Consistent with Christopher, let us just say that you choose door X and then Monty will open either door Y or door Z.
The normal approach from here on is to look at the probability of winning if you switch. I’ll come to that later on but first I want to look at things the other way around – to look at the probability of winning if you stick with door X. That’s because, as well as trying to provide an explanation that is as clear as possible, I also want it to help explain why people’s intuition is wrong.
So, consider the probability of winning the car if you stick with the door you first chose. Firstly in mathematical parlance:
P(win the car if you stick)
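= P(HY ^ CX) + P(HZ ^ CX)
= P(CX).P(HY | CX) + P(CX).P(HZ | CX)
= (1/3 . p) + (1/3 . (1-p)) = 1/3
(where p = P(HY | CX) is the probability that the host opens door Y when the car is behind door X)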
And in a more wordy form:
P(winning the car if you stick)
= P(car is behind door X after the host opened door Y)
+ P(car is behind door X after the host opened door Z)
= P(car is behind door X) . P(host opens door Y if the car is behind door X)
+ P(car is behind door X) . P(host opens door Z if the car is behind door X)
= P(car is behind door X) . PK
where:
PK = P(host opens door Y if the car is behind door X)
+ P(host opens door Z if the car is behind door X)
Now, Monty has to choose between opening door Y or door Z. He is likely to choose with equal probability but his actual choice is irrelevant. In any case:
PK = 1
P(winning the car if you stick) = P(car is behind door X) = 1/3
The posterior probability is the same as the prior probability.
This is a mathematical reason why people intuitively feel that Monty opening one of the other doors doesn’t make any difference:
The new information hasn’t changed the probability of winning with the first-chosen door.
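To check that Monty's way of choosing between doors Y and Z really does drop out of the sticking calculation, here is a small sketch. The symbol p stands for P(host opens door Y if the car is behind door X); the values tried for p are arbitrary.

```python
from fractions import Fraction

P_CX = Fraction(1, 3)  # prior probability that the car is behind door X


def p_win_if_stick(p):
    """P(HY ^ CX) + P(HZ ^ CX), with p = P(HY | CX) and 1 - p = P(HZ | CX)."""
    return P_CX * p + P_CX * (1 - p)


# Whatever Monty's preference between doors Y and Z, the answer is 1/3.
for p in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    print(p, p_win_if_stick(p))
    assert p_win_if_stick(p) == P_CX
```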
8. Everything Changes if You Switch
But things have changed. With one door opened to reveal a goat, there are now two doors to choose from, and we want the one with the higher probability of winning. The mistake is to think that the comparison between P(CZ) and P(CX) (if door Y was opened) cannot have changed because one side of it, P(CX), has not changed. To see that it has, we need to look at the other doors.
Prior to opening any doors:
P(CY) = 1/3; P(CZ) = 1/3.
And after Monty has opened a door (let us suppose that it is door Y):
P(CY) = 0; P(CZ) = 2/3.
Where previously P(CZ) = P(CX), the new relationship is P(CZ) > P(CX) – it is worth switching.
And if we suppose that Monty opens door Z:
P(CY) = 2/3; P(CZ) = 0
and the new relationship is P(CY) > P(CX) so that it is worth switching.
In all cases, it is worth switching.
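The figures after door Y is opened can be reproduced with Bayes theorem directly. This is a sketch under two assumptions: you picked door X, and Monty chooses between doors Y and Z with equal probability when the car is behind door X. The variable names are mine.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door.
prior = {"X": Fraction(1, 3), "Y": Fraction(1, 3), "Z": Fraction(1, 3)}

# Likelihood of the evidence 'host opens door Y', given where the car is.
likelihood_hy = {
    "X": Fraction(1, 2),  # assumed: free choice between Y and Z, equal odds
    "Y": Fraction(0),     # he never reveals the car
    "Z": Fraction(1),     # Y is the only goat door he is allowed to open
}

# posterior is proportional to likelihood . prior, then normalise by P(HY).
unnormalised = {d: likelihood_hy[d] * prior[d] for d in prior}
p_hy = sum(unnormalised.values())
posterior_hy = {d: u / p_hy for d, u in unnormalised.items()}

for door in ("X", "Y", "Z"):
    print(door, posterior_hy[door])
# X 1/3
# Y 0
# Z 2/3
```

Door X keeps its 1/3, and the 1/3 that belonged to door Y has moved over to door Z.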
More thoroughly, Christopher’s solution in mathematical parlance is:
Supposing that you choose door X, the possibility that you win a car if you then switch your choice is given by the following formula:
P(HZ ^ CY) + P(HY ^ CZ)
= P(CY).P(HZ | CY) + P(CZ).P(HY | CZ)
= (1/3.1) + (1/3.1) = 2/3
And in more wordy form:
P(winning the car if you switch)
= P(car is behind door Y after the host opened door Z)
+ P(car is behind door Z after the host opened door Y)
= P(car is behind door Z) . P(host opens door Y if the car is behind door Z)
+ P(car is behind door Y) . P(host opens door Z if the car is behind door Y)
But Monty will always open door Y if the car is behind door Z and vice-versa. The two probabilities are both 1. Hence:
P(winning the car if you switch)
= P(car is behind door Z) + P(car is behind door Y)
= 1/3 + 1/3 = 2/3
9. Postscript: Are Birds Smarter Than Mathematicians?
Top of the list of distinguished mathematicians who ‘got it wrong’ was the most prolific mathematical genius of the 20th Century, Paul Erdős:
‘No, that is impossible. It should make no difference.’
Even the decision tree approach failed to convince him:
‘You are not telling me why to switch.’
He only accepted that switching really was the better strategy after seeing ‘Monte Carlo’ simulations (repeating the game many times with random responses and counting the wins/losses).
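A simulation of this kind takes only a few lines. The sketch below is my own and not the one Erdős was shown; the number of trials and the seed are arbitrary choices, and Monty is assumed to pick at random when he has two goat doors to choose from.

```python
import random

DOORS = ["X", "Y", "Z"]


def play(switch, rng):
    """Play one game and return True if the contestant wins the car."""
    car = rng.choice(DOORS)
    pick = rng.choice(DOORS)
    # The host opens a goat door that the contestant did not pick,
    # choosing at random when two such doors are available.
    opened = rng.choice([d for d in DOORS if d != pick and d != car])
    if switch:
        pick = next(d for d in DOORS if d != pick and d != opened)
    return pick == car


def estimate(switch, trials=100_000, seed=0):
    """Fraction of games won over many repetitions."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stick_rate = estimate(switch=False)
switch_rate = estimate(switch=True)
print(f"stick:  {stick_rate:.3f}")   # close to 1/3
print(f"switch: {switch_rate:.3f}")  # close to 2/3
```

The exact figures vary with the seed and the number of trials, but with this many games they settle near 1/3 and 2/3.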
The wonderfully-titled paper ‘Are Birds Smarter Than Mathematicians? Pigeons (Columba livia) Perform Optimally on a Version of the Monty Hall Dilemma’ describes how experiments on pigeons and humans repeatedly ‘playing’ the Monty Hall game showed that the former quickly adapted their strategy to switching but that the humans clung onto their intuitions. The authors say of Erdős:
‘Until he was able to approach the problem like a pigeon—via empirical probability—he was unable to embrace the optimal solution.’