
Shakespeare’s Othello: A Bayesian Puzzler

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

The majestic tragedy, Othello, was written by William Shakespeare in about 1603. The play revolves around four central characters: Othello, a Moor who is a general in the Venetian army; his beloved wife, Desdemona; his loyal lieutenant, Cassio; and his trusted ensign, Iago.

A key element of the play is Iago’s plot to convince Othello that Desdemona is conducting an affair with Cassio, by planting a treasured keepsake Othello gave to Desdemona, in Cassio’s lodgings, for Othello ‘accidentally’ to come upon.

We playgoers know she is not cheating on him, as does Iago, but Othello, while reluctant to believe it of Desdemona, is also very reluctant to believe that Iago could be making it up.   

If Othello refuses to contemplate any possibility of betrayal, then we would have a play in which no amount of evidence, however overwhelming, including finding them together, could ever change his mind. We would have a farce or a comedy instead of a tragedy.

A shrewder Othello would concede that there is at least a possibility that Desdemona is betraying him, however small that chance might be. This means that there does exist some level of evidence, however great it would need to be, that would leave him no alternative. If his prior trust in Desdemona is almost, but not absolutely total, then this would permit of some level of evidence, logically incompatible with her innocence, changing his mind. This might be called ‘Smoking Gun’ evidence.

On the other hand, Othello might adopt a more balanced position, trying to assess the likelihood objectively and without emotion. But how? Should he try and find out the proportion of female Venetians who conduct extra-marital affairs? This would give him the probability for a randomly selected Venetian woman but no more than that. Hardly a convincing approach when surely Desdemona is not just an average Venetian woman. So should he limit the reference class to women who are similar to Desdemona? But what does that mean?

And this is where it is easy for Othello to come unstuck. Because it is so difficult to choose a prior probability (as Bayesians would term it), the temptation is to assume that since it might or might not be true, the likelihood is 50-50. This is known as the ‘Prior Indifference Fallacy’. Once Othello falls victim to this common fallacy, any evidence against Desdemona now becomes devastating. It is the same problem as that facing the defendant in the dock.

Extreme, though not blind, trust is one way to avoid this mistake. But an alternative would be to find evidence that is logically incompatible with Desdemona’s guilt, in effect the opposite of the ‘Smoking Gun.’ The ‘Perfect Alibi’ would fit the bill.

Perhaps Othello would love to find evidence that is logically incompatible with Desdemona conducting an affair with Cassio, but holds her guilty unless he can find it. He needs evidence that admits no True Positives.

Lacking extreme trust and a Perfect Alibi, what else could have saved Desdemona?

To find the answer, we shall turn as usual to Bayes and Bayes’ Theorem. Bayes’ Theorem, otherwise known as the most important equation in the world, solves these sorts of problems very adeptly every time, using the wonderfully simple x,y,z formula.

The (posterior) probability that a hypothesis is true after obtaining new evidence, according to the x,y,z formula of Bayes’ Theorem, is equal to:

xy/[xy+z(1-x)]

x is the prior probability, i.e. the probability that a hypothesis is true before you see the new evidence.

y is the probability you would see the new evidence if the hypothesis is true.

z is the probability you would see the new evidence if the hypothesis is false.
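The x,y,z formula is compact enough to sketch as a one-line function. Here it is in Python, purely as a sanity check, plugged with the Othello estimates that appear below:

```python
def posterior(x, y, z):
    """Bayes' Theorem, x,y,z form: probability the hypothesis is true
    after seeing the new evidence."""
    return (x * y) / (x * y + z * (1 - x))

# Othello's estimates from the text: prior 4%, evidence 50% likely
# if Desdemona is guilty, 5% likely if she is innocent.
print(round(posterior(0.04, 0.5, 0.05), 3))  # 0.294
```

Note that with a completely uninformative test (y = z), the formula simply returns the prior, as it should.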

In the case of the Desdemona problem, the hypothesis is that Desdemona is guilty of betraying Othello with Cassio.

Before the new evidence (the finding of the keepsake), let’s say that Othello assigns a chance of 4% to Desdemona being unfaithful.

So x = 0.04

The probability we would see the new evidence (the keepsake in Cassio’s lodgings) if the hypothesis is true (Desdemona and Cassio are conducting an affair) is, say, 50%. There’s quite a good chance she would secretly hand Cassio the keepsake as proof of her love for him rather than for Othello.

So y = 0.5

The probability we would see the new evidence (the keepsake in Cassio’s lodgings) if the hypothesis is false is, say, just 5%. Why would it be there if Desdemona had not been to his lodgings secretly, and why would she take the keepsake along in any case?

So z = 0.05

Substituting into Bayes’ equation gives:

0.04 x 0.5 / [0.04 x 0.5 + 0.05 (1 – 0.04)] = 0.294.

So, using Bayes’ Rule, and these estimates, the chance that Desdemona is guilty of betraying Othello is 29.4%, worryingly high for the tempestuous Moor but perhaps low enough to prevent tragedy. The power of Bayes here lies in demonstrating to Othello that the finding of the keepsake in the living quarters of Cassio might only have a 1 in 20 chance of being consistent with Desdemona’s innocence, but in the bigger picture, there is less than a 3 in 10 chance that she actually is culpable.

If this is what Othello concludes, the task of the evil Iago is to lower z in the eyes of Othello, by arguing that the chance of the keepsake ending up with Cassio for an innocent reason is so astoundingly small that 1 in 100 is nearer the mark than 1 in 20. In other words, his task is to convince Othello to lower his estimate of z from 0.05 to 0.01.

The new Bayesian probability of Desdemona’s guilt now becomes:

xy/[xy+z(1-x)]

x = 0.04 (the prior probability of Desdemona’s guilt, as before)

y = 0.5 (as before)

z = 0.01 (down from 0.05)

Substituting into Bayes’ equation gives:

0.04 x 0.5 / [0.04 x 0.5 + 0.01 (1 – 0.04)] = 0.676.

So, if Othello can be convinced that 5% is too high a probability that there is an innocent explanation for the appearance of the keepsake in Cassio’s lodgings – let’s say he’s persuaded by Iago that the true probability is 1% – then Desdemona’s fate, like that of many a defendant who a juror thinks has more than a 2 in 3 chance of being guilty, is all but sealed. Her best hope now is to try to convince Othello that the chance of the keepsake being found in Cassio’s place if she were guilty is much lower than 0.5. For example, she could try a common-sense argument that there is no way that she would take the keepsake along if she were actually having an affair with Cassio, nor be so careless as to leave it behind. In other words, she could argue that the presence of the keepsake where it was found actually provides testimony to her innocence. In Bayesian terms, she should try to reduce Othello’s estimate of y. What level of y would have prevented tragedy? That is another question.
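The tug-of-war described above, Iago pushing z down while Desdemona tries to push y down, can be sketched numerically. A small illustrative grid (the y values other than 0.5 are invented for the sketch, not taken from the text):

```python
def posterior(x, y, z):
    # Bayes' Theorem in the x,y,z form used in the text
    return (x * y) / (x * y + z * (1 - x))

x = 0.04  # Othello's prior probability of Desdemona's guilt

# Iago's move: talk z down from 0.05 to 0.01
for z in (0.05, 0.01):
    print(f"y = 0.50, z = {z:.2f} -> guilt = {posterior(x, 0.5, z):.3f}")

# Desdemona's counter: argue y down (illustrative values)
for y in (0.5, 0.1, 0.02):
    print(f"y = {y:.2f}, z = 0.01 -> guilt = {posterior(x, y, 0.01):.3f}")
```

With z at 0.01, getting Othello to accept a y of 0.1 happens to return his guilt estimate to roughly 29%, and a y of 0.02 pushes it below 8%.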

William Shakespeare wrote Othello about a hundred years before the Reverend Thomas Bayes was born. That is true. But to my mind the Bard was always, in every inch of his being, a true Bayesian. Othello was not, and therein lies the tragedy.

Appendix

In the case of the Othello problem, the hypothesis is that Desdemona is guilty of betraying Othello with Cassio. Before the new evidence (the finding of the keepsake), let’s say that Othello assigns a chance of 4% to Desdemona being unfaithful.

So P(H) = 0.04

The probability we would see the new evidence (the keepsake in Cassio’s lodgings) if the hypothesis is true (Desdemona and Cassio are conducting an affair) is, say, 50%.

So P(E|H) = 0.5

The probability we would see the new evidence (the keepsake in Cassio’s lodgings) if the hypothesis is false is, say, just 5%.

So P(E|H’) = 0.05

Substituting into Bayes’ Theorem:

P(H|E) = P(E|H) . P(H) / [P(E|H) . P(H) + P(E|H’) . P(H’)]

P(H|E) = 0.5 x 0.04 / [0.5 x 0.04 + 0.05 x 0.96]

P(H|E) = 0.02 / [0.02 + 0.048] = 0.294

Posterior probability = 0.294.

So, using Bayes’ Rule, and these estimates, the chance that Desdemona is guilty of betraying Othello is 29.4%.

If P(E|H’) = 0.01

The new Bayesian probability of Desdemona’s guilt now becomes:

P(H|E) = 0.5 x 0.04 / [0.5 x 0.04 + 0.01 x 0.96]

P(H|E) = 0.02 / (0.02 + 0.0096) = 0.02 / 0.0296 = 0.676

Updated probability = 0.676 = 67.6%.

The Bobby Smith Problem: Bayes in Action

Bobby Smith, aged 8, is a good schoolboy footballer, but you know that only one in a thousand such 8-year-olds go on to become professional players.  So you would like to get an unbiased assessment of his real chance of developing into a top player. A coach tells you there is a test, taken by all good 8-year-old footballers, that can measure the child’s potential. The test, you learn, is 95% accurate in identifying future professional footballers, and these always receive a grade of A+.

Bobby takes the test and is graded A+.

How many of the 8-year-olds tested, who get an A+, fail to develop into top players, you ask. Now the coach imparts the good news. All current professional players scored A+ when they took the test in their own school days, and we can take it that anyone who scores below that can be ruled out as a future professional player.  And the test is 95% accurate, so only 5% of those who get the A+ grade fail to develop into professional footballers. So what is the actual chance that Bobby will become a top player?

If you are like most people, you will think the chance is very high.

This is your reasoning: I don’t really know whether Bobby is likely to turn into a professional player or not. But he has taken this test. In fact, no current professional player scored below A+, and the test only very rarely allocates a top grade to a child who will not become a professional footballer. If the test is really this good, therefore, it looks like Bobby will have a bright future as a football star.

Is this true? Think of it this way. If there were no test, you would have asked the coach a very basic question: in your experience, what is the chance that Bobby will become a professional player? The coach would have dampened your enthusiasm: one in a thousand, he would have said. But with the test result in hand, there’s no need to ask this question. It’s irrelevant in the face of a very accurate test result, isn’t it?

In fact, this is a well-known fallacy, which psychologists call the Inverse Fallacy, or Prosecutor’s Fallacy. The fallacy is to confuse the probability of a hypothesis being true, given some evidence, with the probability of the evidence arising, given the hypothesis is true.

In our example, the hypothesis is that Bobby will become a top player, and the evidence is the high test score. What we want to know is the probability that Bobby will become a top player, given that the test says he will be. What we know, on the other hand, is the probability that the test says Bobby will be a top player, given that he will be. The coach told you this probability, on all available evidence, is 100%: the test is in this sense infallible, in that all professional players score A+ on the test. In answering your other question, the coach also told you the probability of an A+ test score, given that the child will not become a top player, is only 5%. You take this information and conclude that Bobby is very likely to turn into a top player.

In fact, of the thousand children who took the test, only one (statistically speaking) will become a professional footballer. The test is 95% accurate, so 5% of the 1,000 children will score A+ and not become top players, i.e. there will be 50 ‘false positives.’ Anyone who will become a top player, on the other hand, will score A+ on the test.

So what is the chance that Bobby will become a professional footballer if he scores A+ on the test?

Solution: 50 kids who will not become top footballers score A+ (the 50 ‘false positives’). Only one of the one thousand eight-year-olds who take the test develops into a professional player, and that child will score A+. Look at it this way. A thousand 8-year-olds take the test, and of these 50 of them will receive a letter telling them they have scored A+ on the test but will not develop into top players. One child will receive a letter with a score of A+ and actually will go on to become a professional player. Therefore the probability you will become a top footballer if you score A+ is just 1 in 51, i.e. 1.96%.
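The counting argument above is easy to verify mechanically. A short sketch with a notional cohort of 1,000 tested children:

```python
cohort = 1000
future_pros = 1                      # 1 in 1,000 becomes a professional
non_pros = cohort - future_pros      # the other 999 do not

true_positives = future_pros               # every future pro scores A+
false_positives = round(non_pros * 0.05)   # ~5% of the rest score A+ too

p_pro_given_a_plus = true_positives / (true_positives + false_positives)
print(false_positives, round(p_pro_given_a_plus, 4))  # 50 0.0196
```

Fifty A+ letters go to children who will never turn professional, and only one to a child who will, giving the 1 in 51 (1.96%) chance in the text.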

This is the same idea as the medical ‘false positives’ problem.

In that problem, a thousand people go to the doctor and all are tested for flu. Only one actually has the flu. Those with the flu always test positive. We know that the test for flu is 95% accurate, so 5% of the 1,000 people will test positive and not have the flu, i.e. there will be 50 ‘false positives’. One will test positive who does have the flu. Those with the flu all test positive. So what is the chance that you have the flu if you test positive?

Solution: 50 people who do not have the flu test positive. One person who has the flu tests positive. Therefore, the probability you have the flu if you test positive is 1 in 51, i.e. 1.96%

We can also solve the Bobby Smith problem using Bayes’ Theorem. The (posterior) probability that a hypothesis is true after obtaining new evidence, according to the a,b,c formula of Bayes’ Theorem, is equal to:

ab/[ab+c(1-a)]

a is the prior probability, i.e. the probability that a hypothesis is true before the new evidence.

b is the probability of the new evidence if the hypothesis is true.

c is the probability of the new evidence if the hypothesis is false.

In the case of the Bobby Smith problem, the hypothesis is that Bobby will develop into a professional player.

Before the new evidence (the test), this chance is 1 in 1000 (0.001)

So a = 0.001

The probability of the new evidence (the A+ score on the test) if the hypothesis is true (Bobby will become a professional player) is 100%, since all professional players score A+ on the test.

So b = 1

The probability we would see the new evidence (the A+ score on the test) if the hypothesis is false (Bobby will not become a professional player) is 5%, since the test is 95% accurate in spotting future professional footballers.

So c = 0.05

Substituting into Bayes’ equation gives:

Posterior probability = ab/[ab+c(1-a)] = 0.001 x 1 / [0.001 x 1 + 0.05 (1 – 0.001)] = 0.0196

So, using Bayes’ Theorem, the chance that Bobby Smith, who scored A+ on the test which is 95% accurate, will actually become a top player, is not 95% as intuition might suggest, but just 1.96%, as we have shown previously by a different route.

So there is just a 1.96 per cent chance that Bobby Smith will go on to become a professional player, despite scoring A+ on that very accurate test of player potential.
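The same 1.96% drops straight out of the a,b,c formula when coded up:

```python
a, b, c = 0.001, 1.0, 0.05  # prior, P(A+ | future pro), P(A+ | no pro career)
posterior = (a * b) / (a * b + c * (1 - a))
print(round(posterior, 4))  # 0.0196
```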

That’s the statistics, the cold Bayesian logic. Now for the good news. Bobby Smith was the lucky one. He currently plays for Barcelona, under a different name.

Appendix

We can also solve the Bobby Smith problem using the traditional notation version of Bayes’ Theorem.

P(H|E) = P(E|H) . P(H) / [P(E|H) . P(H) + P(E|H’) . P(H’)]

Before the new evidence (the test), this chance is 1 in 1000 (0.001)

So P(H) = 0.001

The probability of the new evidence (the A+ score on the test) if the hypothesis is true (Bobby will become a professional player) is 100%, since all professional players score A+ on the test.

So P(E|H) = 1

The probability we would see the new evidence (the A+ score on the test) if the hypothesis is false (Bobby will not become a professional player) is 5%, since the test is 95% accurate in spotting future professional footballers.

So P(E|H’) = 0.05

Substituting into Bayes’ equation gives:

P(H|E) = 0.001 x 1 / [0.001 x 1 + 0.05 (1 – 0.001)] = 0.0196

 


Bayes’ Theorem: The Most Powerful Equation in the World

How should we change our beliefs about the world when we encounter new data or information? This is one of the most important questions we can ask. A theorem bearing the name of Thomas Bayes, an eighteenth century clergyman, is central to the way we should answer this question.

The original presentation of the Reverend Thomas Bayes’ work, ‘An Essay toward Solving a Problem in the Doctrine of Chances’, was given in 1763, after Bayes’ death, to the Royal Society, by Bayes’ friend and confidant, Richard Price.

In framing Bayes’ work, Price gave the example of a person who emerges into the world and sees the sun rise for the first time. As he has had no opportunity to observe this before (perhaps he has spent his life to that point entombed in a dark cave), he is not able to decide whether this is a typical or unusual occurrence. It might even be a unique event. Every day that he sees the same thing happen, however, the degree of confidence he assigns to this being a permanent aspect of nature increases. His estimate of the probability that the sun will rise again tomorrow as it did yesterday and the day before, and so on, gradually approaches, although never quite reaches, 1.

The Bayesian viewpoint is just like that, the idea that we learn about the universe and everything in it through a process of gradually updating our beliefs, edging incrementally ever closer and closer to the truth as we obtain more data, more information, more evidence.

As such, the perspective of Rev. Bayes on cause and effect is essentially different to that of philosopher David Hume, the logic of whose argument on this issue is contained in ‘An Enquiry Concerning Human Understanding’. According to Hume, we cannot justify our assumptions about the future based on past experience unless there is a law that the future will always resemble the past. No such law exists. Therefore, we have no fundamentally rational support for believing in causation. For Hume, therefore, predicting that the sun will rise again after seeing it rise a hundred times in a row is no more rational than predicting that it will not. Bayes instead sees reason as a practical matter, in which we can apply the laws of probability to the issue of cause and effect.

To Bayes, therefore, rationality is a matter of probability, by which you update your predictions based on new evidence, thereby edging closer and closer to the truth. This is called Bayesian reasoning. According to this approach, probability can be seen as a bridge between ignorance and knowledge. The particularly wonderful thing about the world of Bayesian reasoning is that the mathematics of operationalising it are so simple.

Essentially, Bayes’ Theorem is just an algebraic expression with three known variables and one unknown. Yet this simple formula is the foundation stone of that bridge I referred to between ignorance and knowledge.

Bayes’ Theorem is in this way concerned with conditional probability. That is, it tells us the probability, or updates the probability, that a theory or hypothesis is true given that some event has taken place.

To help explain how it works, let us invent a little crime story in which you are a follower of Bayes and you have a friend in a spot of trouble. In this story, you receive a telephone call from your local police station. You are told that your best friend of many years is helping the police investigation into a case of vandalism of a shop window in a street adjoining the one where you know she lives. It took place at noon that day, which you know is her day off work.

She next comes to the telephone and tells you she has been charged with smashing the shop window, based on the evidence of a police officer who positively identified her as the culprit. She claims mistaken identity.

You must evaluate the probability that she did commit the offence before deciding how to advise her.

So the condition is that she has been charged with criminal damage; the hypothesis you are interested in evaluating is the probability that she did it.

Bayes’ Theorem helps you answer this type of question.

There are three things you need to estimate.

  1. A Bayesian’s first task is to estimate the probability that the new evidence would have arisen if the hypothesis was true. In this case, you need to estimate the probability of the police officer identifying your friend if your friend actually did break the window.
  2. A Bayesian’s second task is to estimate the probability that the new evidence would have arisen if the hypothesis was false. In this case, you need to estimate the probability of the police officer identifying your friend if your friend did NOT break the window.
  3. You need what Bayesians call a prior probability.

This is the probability you would have assigned to her smashing the shop window before she told you that she had been charged on the basis of the witness evidence. This is not always easy, since the new information might colour the way you assess the prior information, but ideally you should estimate this probability as it would have been before you received the new information.

A practical definition of a Bayesian prior is the odds at which you would be willing to place or offer a bet before the new information is disclosed.

Based on these three probability estimates, Bayes’ Theorem offers you the way to calculate accurately the revised probability you should assign to your friend’s guilt. The wonderful part about it is that the equation is true as a matter of logic. So the result it produces will be as accurate as the values inputted into the equation.

The formula is also so straightforward it can be jotted on the back of your hand. Actually, that’s not such a bad idea for such a powerful tool. Indeed, if you are attracted to tattoos, this is as good an idea for one as any. And it’s as simple as x,y,z.

The formula has xy on the top of the equation and xy+z(1-x) on the bottom.

And that’s it!

Bayes’ rule is:

Probability of hypothesis being true after obtaining new evidence =  xy/[xy+z(1-x)]

This is known as the Posterior Probability.

So we have three variables.

x is the prior probability, i.e. the probability you assign to the hypothesis being true before you obtain the new evidence.

y is the probability that the new evidence would have arisen if the hypothesis was true.

z is the probability that the new evidence would have arisen if the hypothesis was false.

So let’s apply Bayes’ Rule to the case of the shattered shop window.

Let’s start with y. This is the probability that the new evidence would have arisen if the hypothesis was true. What is the hypothesis? That your friend broke the window. What is the new evidence? That the police officer has identified your friend as the person who smashed the window. So y is an estimate of the probability that the police officer would have identified your friend if she was indeed guilty.

If she threw the brick, it’s easy to imagine how she came to be identified by the police officer. Still, he wasn’t close enough to catch the culprit at the time, which should be borne in mind. Let’s say that the probability he has identified her and that she is guilty is 80% (0.8).

Let’s move on to z. This is the probability that the new evidence would have arisen if the hypothesis was false. What is the hypothesis again? That your friend broke the window. What is the new evidence again? That the police officer has identified your friend as the person who did it. So z is an estimate of the probability that the police officer would have identified her if she was not the guilty party, i.e. a false identification.

If your friend didn’t shatter the window, how likely is the police officer to have wrongly identified her when he saw her in the street later that day? It is possible that he would see someone of similar age and appearance, wearing similar clothes, and jump to the wrong conclusion, or he may just want to identify someone to advance his career. Let us give him credit and say the probability is just 15% (0.15).

Finally, what is x? This is the probability you assign to the hypothesis being true before you obtain the new evidence. In this case, it means the probability you would have assigned to your friend breaking the shop window before you got the new information from her on the telephone about the evidence of the police officer. Well, you have known her for years, and it is totally out of character, although she does live just a stone’s throw from the shop, and was not at work that day, so she could have done it. Let’s say 5% (0.05). That’s just before you learn from her on the telephone about the witness evidence and the charge. Assigning the prior probability is fraught with problems, however, as awareness of the new information might easily colour the way you assess the prior information. You need to make every effort to estimate this probability as it would have been before you received the new information. You also have to be precise as to the point in the chain of evidence at which you establish the prior probability.

Once we’ve assigned these values, Bayes’ theorem can now be applied to establish a posterior probability. This is the number that we’re interested in. It is the measure of how likely is it that your friend broke the window, given that she’s been identified as the culprit by the police officer.

The calculation and the simple algebraic expression that we have identified is:

xy/[xy+z(1-x)]

where x is the prior probability of the hypothesis (she’s guilty) being true.

where y is the probability the police officer identifies her conditional on the hypothesis being true, i.e. she’s guilty.

where z is probability the police officer identifies her conditional on the hypothesis not being true, i.e. she’s not guilty.

In our example, x = 0.05, y = 0.8, z = 0.15

The rest is simple arithmetic.

xy = 0.05 x 0.8 = 0.04

z(1-x) = 0.15 x 0.95 = 0.1425

xy/[xy+z(1-x)] = 0.04/(0.04 + 0.1425) = 0.04/0.1825

Posterior probability = 0.219 = 21.9%

The most interesting takeaway from this is the relatively low probability you should assign to the guilt of your friend even though you were 80% sure that the police officer would get it right if she was guilty, and the small 15% chance you assigned that he would falsely identify her. The clue to the intuitive discrepancy is in the prior probability (or ‘prior’) you would have attached to the guilt of your friend before you were met face to face with the evidence of the police officer. If a new piece of evidence now emerges (say a second witness), you should again apply Bayes’ Theorem to update to a new posterior probability, gradually converging, based on more and more pieces of evidence, ever nearer to the truth.
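The sequential updating suggested above is mechanical: yesterday’s posterior becomes today’s prior. A sketch in which the second witness’s reliability figures (70% chance of identifying her if guilty, 20% if innocent) are invented purely for illustration:

```python
def update(x, y, z):
    # One application of Bayes' rule in the x,y,z form
    return (x * y) / (x * y + z * (1 - x))

# First witness: the police officer, with the figures from the text
p1 = update(0.05, 0.8, 0.15)
print(round(p1, 3))  # 0.219

# Hypothetical second witness: 70% likely to identify her if guilty,
# 20% if innocent. The old posterior is the new prior.
p2 = update(p1, 0.7, 0.2)
print(round(p2, 3))  # 0.496
```

Even two moderately incriminating identifications leave the probability of guilt near a coin flip here, because the prior was so low.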

It is, of course, all too easy to dismiss the implications of this hypothetical case on the grounds that it was just too difficult to assign reasonable probabilities to the variables. But that is what we do implicitly when we don’t assign numbers. Bayes’ rule is not at fault for this in any case. It will always correctly update the probability of a hypothesis being true whenever new evidence is identified, based on the estimated probabilities. In some cases, such as the crime case illustrated here, that is not easy, though the approach you adopt to revising your estimate will always be better than using intuition to steer a path to the truth.

In many other cases, we do know with precision what the key probabilities are, and in those cases we can use Bayes’ Rule to identify with precision the revised probability based on the new evidence, often with startlingly counter-intuitive results. In seeking to steer the path from ignorance to knowledge, the application of Bayes’ Theorem is always the correct method.

Thanks to Bayes, the path to the truth really is as easy as x,y,z.  What remains is the wit and will to apply it.

Further Reading and Links


The most important idea in probability. Truth and justice depend on us getting it right. https://leightonvw.com/2014/12/13/this-is-probably-the-most-important-idea-in-probability-truth-and-justice-depends-on-us-getting-it-right/

A Visual Guide to Bayesian Thinking. YouTube. https://youtu.be/BrK7X_XlGB8

Bayes’ Theorem and Conditional Probabilities https://brilliant.org/wiki/bayes-theorem/

The Monty Hall Problem

The Monty Hall Problem is a famous, perhaps the most famous, probability puzzle ever to have been posed. It is based on an American game show, Let’s Make a Deal, first hosted by Monty Hall. It came to public prominence as a question quoted in a column penned by mega-intellect Marilyn Vos Savant, in Parade magazine in 1990. The question itself is quite straightforward.

‘Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind all the doors, opens another door, say No. 3, which reveals a goat. He then says to you, “Do you want to switch to door No. 2?”’ This is not a strategic decision on his part based on knowing that you chose the car: he always opens one of the doors concealing a goat and offers the contestant the chance to switch. It is part of the rules of the game.

So should you switch doors?

Consider the probability that you chose the correct door the first time, i.e. No 1 is the door to a car. What is that probability? Well, clearly it is 1/3 in that you have three doors to choose from, all equally likely.

But what happens to the probability that Door No. 1 is the key to the car once Monty has opened one of the other doors?

This again seems quite straightforward. There are now two doors left unopened, and there is no way to tell behind which of these two doors lies the car. So the probability that Door 1 offers the star prize now that Door 2 (or else Door 3) has been opened would seem to be 1/2. So should you switch? Since the two remaining doors would seem to be equally likely paths to the car, it would seem to make no difference whether you stick with your original choice of Door 1 or switch to the only other door that is unopened.

But is this so?

Let’s think it through.

When you choose Door 1, there is a 1 in 3 chance that you have won your way to the car if you stick with it. There is a 2 in 3 chance that Door 1 leads to a goat.

On the other hand, if you have chosen Door 1, and it is the lucky door, the host is forced to open one of the two doors concealing a goat. He knows that. You know that. So he is introducing useful new information into the game.

Before he opened a door, there was a 2 in 3 chance that the lucky door was EITHER Door 2 or Door 3 (as there was a 1 in 3 chance it was Door 1). Now he is telling you that there is a 2 in 3 chance that the lucky door is EITHER Door 2 or Door 3 BUT it is not the door he just opened. So there is a 2 in 3 chance that it is the door he didn’t open. So, if he opened Door 2, there is a 2 in 3 chance that Door 3 leads to the car. Likewise, if he opened Door 3, it is a 2 in 3 chance that Door 2 leads to the car. Either way, you are doubling your chance of winning the car by switching from Door 1 (probability of car = 1/3) to whichever of the other doors he does not open (probability of car = 2/3).

It is because the host knows what is behind the doors, and is constrained never to open the door to the car, that his actions introduce valuable new information. Because he can’t open the door to the car, he is forced to point to a door that isn’t concealing it, increasing the probability that the door he doesn’t open is the lucky one (from 1/3 to 2/3).

If this is not intuitively clear, there is a way of making it more so. Let’s say there were 20 doors, with a car behind one of them and goats behind the other 19. Now say we choose Door 1. The probability that this is the winning door is 1 in 20, and there is a 19 in 20 probability that one of the other doors conceals the car. Now Monty starts opening one door at a time, taking care never to reveal the car. After he has opened 18 carefully chosen doors (chosen because they didn’t conceal the car), just one other door remains unopened. Either this door leads to the car or your original choice, Door 1, does. But your original choice had a probability of 1/20 of being the winning door, and nothing has changed that, because every time he opens a door he is sure to avoid the one leading to the car. So the chance that the door he leaves unopened conceals the car is 19/20. So, by switching, you raise the probability of winning the car from 1/20 to 19/20.

If he didn’t know what lay behind the doors, he could inadvertently have opened the door to the car. When he happens not to, this adds no new information, save that he has randomly eliminated some of the doors. If he randomly opens 18 doors, not knowing what is behind them, and happens not to reveal the car, the two remaining doors each offer a 1 in 2 chance of the car. So you might as well just flip a coin – and hope!
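All of this can be checked empirically. Below is a minimal Python simulation sketch (the function name, trial count and door indexing are my own choices) covering both the knowing host and the ignorant host, for any number of doors:

```python
import random

def monty(n_doors: int, switch: bool, host_knows: bool,
          trials: int = 100_000) -> float:
    """Estimate the contestant's win probability by simulation."""
    wins = valid = 0
    for _ in range(trials):
        car = random.randrange(n_doors)
        choice = 0  # the contestant always picks the first door
        others = [d for d in range(n_doors) if d != choice]
        if host_knows:
            # A knowing host opens every other door except one,
            # and never opens the door concealing the car.
            kept = car if car != choice else random.choice(others)
        else:
            # An ignorant host leaves a random other door closed; if the
            # car would have been revealed along the way, void the round.
            kept = random.choice(others)
            if car not in (choice, kept):
                continue
        valid += 1
        wins += (kept if switch else choice) == car
    return wins / valid
```

With a knowing host, `monty(3, True, True)` settles near 2/3 and `monty(20, True, True)` near 19/20, while sticking stays at 1/3 and 1/20 respectively; with an ignorant host who happens not to reveal the car, switching and sticking both settle near 1/2, just as the text argues.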

Even when it is explained this way, I find that many people find it impossible to grasp the intuition. So here’s the clincher.

Say I have a pack of 52 playing cards, which I lay face down. If you choose the Ace of Spades, you win the car. Every other playing card, you win nothing. Go on, choose one. This is now laid aside from the rest of the deck, still face down. The probability that the card you have chosen is the Ace of Spades is clearly 1/52.

Now I, as the host, know exactly where the Ace of Spades is. There is a 51/52 chance that it is somewhere in the rest of the deck, and if it is, I know where. Now I carefully turn over the cards in the deck one at a time, taking care never to turn over the Ace of Spades, until there is just one card left. What is the chance that this one remaining card is the Ace of Spades? It is 51/52, because I have carefully sifted out all the losing cards to leave just one card. In other words, I have presented you with the one card out of the remaining deck of 51 that is the Ace of Spades, assuming it was not the card you chose in the first place. The chance that the card you chose in the first place was the Ace of Spades is 1/52. So the card I have selected for you out of the remaining deck has a probability of 51/52 of being the Ace of Spades. So should you switch when I offer you the chance to give up your original card for the one I have filtered out of the remaining 51 (taking care each time never to reveal the Ace of Spades)? Of course you should. And that’s what you should tell Monty Hall every single time. Switch!

 

Appendix

In the standard description of the Monty Hall Problem, Monty can open door 1 or door 2 or door 3. The car can be behind door 1, door 2 or door 3. The contestant can choose any door.

We can apply Bayes’ Theorem to solve this.

D1: Monty Hall opens Door 1.

D2: Monty Hall opens Door 2.

D3: Monty Hall opens Door 3.

C1: The car is behind Door 1.

C2: The car is behind Door 2.

C3: The car is behind Door 3.

The prior probability of the car being behind any particular door is 1/3, i.e. P(C1) = P(C2) = P(C3) = 1/3.

Assume the contestant chooses Door 1 and Monty Hall randomly opens one of the two doors he knows the car is not behind.

The probability that he will open Door 3 is 1/2, and the conditional probabilities of this given the car being behind Door 1, Door 2 or Door 3 are as follows.

P(D3 | C1) = 1/2 … as the car is behind the contestant’s chosen door, Door 1, so he is free to open Door 2 or Door 3. He does so randomly.

P(D3 | C3) = 0 … as he cannot open the door the car is behind (Door 3) or the contestant’s chosen door, so he must open Door 2.

P(D3 | C2) = 1 … as he cannot open the door the car is behind (Door 2) or the contestant’s chosen door (Door 1), so he must open Door 3.

These three cases are equally probable, so P(D3) = (1/2 + 0 + 1)/3 = 1/2. So, P(C1 | D3) = P(D3 | C1) × P(C1) / P(D3) = (1/2 × 1/3) / (1/2) = 1/3

Therefore, there is a 1/3 chance that the car is behind the door originally chosen by the contestant (Door 1) when Monty opens Door 3.

But P(C2 | D3) = P(D3 | C2) × P(C2) / P(D3) = (1 × 1/3) / (1/2) = 2/3

Therefore, there is twice the chance of the contestant winning the car by switching doors after Monty Hall has opened a door.
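The same arithmetic can be run through exact fractions; this is a sketch of the calculation above, with the variable names my own:

```python
from fractions import Fraction

# Priors: the car is equally likely to be behind each of the three doors.
p_c = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihoods P(D3 | C#), with the contestant having chosen Door 1.
p_d3_given_c = {1: Fraction(1, 2),  # free to open Door 2 or Door 3
                2: Fraction(1, 1),  # must open Door 3
                3: Fraction(0, 1)}  # cannot open the car's door

# Total probability that Monty opens Door 3.
p_d3 = sum(p_d3_given_c[c] * p_c[c] for c in p_c)

# Posteriors by Bayes' Theorem.
p_stick = p_d3_given_c[1] * p_c[1] / p_d3   # P(C1 | D3)
p_switch = p_d3_given_c[2] * p_c[2] / p_d3  # P(C2 | D3)
print(p_d3, p_stick, p_switch)  # 1/2 1/3 2/3
```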


Further Reading and Links


Related blog post on leightonvw.com The Deadly Doors Problem: Monty Hall Plus. https://leightonvw.com/2014/11/27/the-four-doors-problem/

Related blog post on leightonvw.com Open the Box or Take the Money https://leightonvw.com/2011/11/25/open-the-box-or-take-the-money/

Wikipedia on the Monty Hall Problem. https://en.wikipedia.org/wiki/Monty_Hall_problem

The Monty Hall Problem. http://www.montyhallproblem.com/

Understanding the Monty Hall Problem. https://betterexplained.com/articles/understanding-the-monty-hall-problem/

The Monty Hall Problem. http://mathforum.org/dr.math/faq/faq.monty.hall.html

The Official Let’s Make a Deal Website: The Monty Hall Problem. http://www.letsmakeadeal.com/problem.htm

Probability and the Monty Hall Problem. Khan Academy. https://www.khanacademy.org/math/precalculus/prob-comb/dependent-events-precalc/v/monty-hall-problem

The Monty Hall Problem. Numberphile. YouTube. https://www.youtube.com/watch?v=4Lb-6rxZxx0

The Monty Hall Problem. YouTube. https://www.youtube.com/watch?v=mhlc7peGlGg

Testing out the Monty Hall Problem. YouTube. https://www.youtube.com/watch?v=o_djTy3G0pg

Monty Hall Problem. Singing Banana. YouTube. https://www.youtube.com/watch?v=njqrSvGz8Ps

Monty Hall II: Revenge of Monty Hall. Singing Banana. YouTube. https://www.youtube.com/watch?v=fYPXYzymUqI

Bayes’ Theorem and Conditional Probabilities. https://brilliant.org/wiki/bayes-theorem/

The Sleeping Beauty Problem

Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice during the experiment, Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening.

A fair coin will be tossed on Sunday evening after she is put to sleep, to determine which experimental procedure to undertake: if the coin comes up heads, Beauty will be awakened and interviewed on Monday only. If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday. In either case, she will be awakened on Wednesday without interview and the experiment ends.

Any time Sleeping Beauty is awakened and interviewed, she is asked, “What is your belief now, as a percentage, in the proposition that the coin landed heads?”

What should Beauty’s answer be?

To one way of thinking about this, the answer is clear. The coin was tossed once prior to her awakening, however many times she is woken, whether once (if it landed heads) or twice (if it landed tails).

Since the fair coin was tossed just once, and no further information is obtained by Beauty at the time she is awoken and interviewed, the answer she should give should be 50 per cent, i.e. a 1 in 2 chance that the fair coin landed heads.

To another way of thinking about it, she is interviewed just once if it landed heads (on the Monday) but she is interviewed twice if it landed tails (on Monday and Tuesday). She does not know which day it is when she is woken and interviewed but from her point of view there are three possibilities. These are:

  1. It landed heads and it is Monday.
  2. It landed tails and it is Monday.
  3. It landed tails and it is Tuesday.

So there are three possibilities, of equal likelihood, and two of these involve the coin landing tails and just one for the coin landing heads. So the answer she should give should be 33.3 per cent, i.e. a 1 in 3 chance that the fair coin landed heads.

So which answer is correct? The world of probability is by and large divided into those who are adamant that she should go with ½ (the so-called ‘halfers’) and those who are equally adamant that she should go with 1/3 (the so-called ‘thirders’). Are they both right, are they both wrong, or somewhere in between?

A way that I usually advocate to resolve seemingly intractable probability paradoxes is to ask at what odds Beauty should be willing to place a bet.

So, if in this experiment Beauty is offered odds of 1.5 to 1 that the coin landed heads, should she take those odds? If the correct answer is a half, those odds are attractive as the correct odds should be 1 to 1 (evens). If the correct answer is a third, those odds are unattractive as the correct odds should be 2 to 1.

So what should Beauty do if offered odds of 1.5 to 1? Bet or decline the bet?

The simplest way to resolve this is to ask what would happen if she accepted the odds of 1.5 to 1 and placed a bet of £10 each time she was interviewed. When the coin came up heads, she would be awoken just once, place the £10 bet and win £15. When the coin landed tails, however, she would be awoken twice and place two bets of £10, i.e. a total of £20, and lose both bets.

So the net outcome of this betting strategy, across the two equally likely scenarios, would be a loss of £5.

This suggests that a half is the wrong answer to the probability that the coin landed heads. At odds of 2 to 1, on the other hand, she would place £10 on the one occasion she would be awoken, i.e. Monday, and would win £20. When the coin came up tails, however, she would lose £10 on the Monday and £10 on the Tuesday, i.e. £20. Her expected outcome would in this case be to break even. This suggests that odds of 2 to 1 are the correct odds, which is consistent with a probability of 1/3. Some ‘Halfers’ argue that Beauty should be assigned a chip of half the value when the coin lands Tails as when it lands Heads, although she will be unaware of the value of the chip when she stakes it. In this case, she would indeed break even by betting at even-money odds, but there seems no reasonable case for applying this arbitrary fix to the experiment.
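The betting test is easy to simulate. A short Python sketch (the stake and trial count are arbitrary choices of mine):

```python
import random

def avg_net(odds: float, stake: float = 10.0, trials: int = 100_000) -> float:
    """Average net result per coin toss if Beauty bets `stake` on Heads
    at the given odds every time she is awakened."""
    total = 0.0
    for _ in range(trials):
        if random.random() < 0.5:   # Heads: one awakening, one winning bet
            total += stake * odds
        else:                       # Tails: two awakenings, two losing bets
            total -= 2 * stake
    return total / trials
```

`avg_net(1.5)` settles near a £2.50 loss per toss, while `avg_net(2.0)` settles near break-even, consistent with 2 to 1 being the fair odds.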

Applying the ‘betting test’ to this problem, therefore, suggests that Beauty’s answer when she is woken up should be that there is a 1 in 3 chance that the coin landed heads when tossed after she was put to sleep on the Sunday.

But how can this be right, when the fair coin was tossed just once, and we know that the chance of a fair coin landing heads is ½? If this is the ‘prior probability’ Beauty should assign to the coin landing heads, and she is given no further information about what happened to the coin when she is woken and questioned, on what grounds should the probability she assigns change? The only information she acquires is that she has been woken and questioned, but she knew that would happen in advance, so this is not new information. Given she assigns a prior probability of ½ to the coin coming up heads, and she acquires no new information, it is perhaps difficult to see on what grounds she should change her opinion. The posterior probability she assigns (after she acquires all new information) should be identical to the prior probability, because she has acquired no new information after being put to sleep to change anything.

This is the kernel of the conundrum, and it is why there is a long-standing and ongoing debate between fervent so-called ‘Halfers’ and ‘Thirders.’

So the question is whether there is a correct answer and one school of thought is simply wrong, or whether there is no single correct answer and each school is right only under one interpretation of the question.

It seems to me that there is, in fact, a straightforward answer, which resolves the problem. To see this, we need to identify the actual ‘prior probability’ that the coin tossed after Beauty goes to sleep is Heads.

This depends on the question we are seeking to answer, and what information is available to Beauty before she goes to sleep.

If she is simply told that a coin will be tossed after she goes to sleep, and nothing else, then her correct estimate that the fair coin will land on heads is ½. This is the answer to a simple question of how likely a fair coin is to land Heads with no conditions, i.e. the unconditional probability that the coin will land Heads is 1/2.

If she is given the additional information, however, that she will be woken just once if the coin lands Heads but twice if it lands Tails (albeit she will remember just one of the awakenings), then we are posing a very different question.

The new question she is being asked to answer is to estimate the probability, whenever she awakens, that her awakening resulted from the coin landing Heads. Since she has just one awakening when the coin lands Heads, but two awakenings when it lands Tails, the probability that any particular awakening resulted from a Heads flip is 1/3, i.e. the conditional probability that the coin landed Heads given any particular awakening is 1/3.

By extension, if she is told she will be woken 1,000 times if the coin lands Tails but only once if the coin lands Heads, then her correct estimate of the probability that any particular awakening resulted from the coin landing Heads is 1/1001.
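This generalisation can be checked by counting awakenings in simulation (a sketch; the function name and trial count are mine):

```python
import random

def p_heads_awakening(tails_wakes: int, trials: int = 100_000) -> float:
    """Fraction of all awakenings that follow a Heads toss, when Heads
    yields one awakening and Tails yields `tails_wakes` awakenings."""
    heads = total = 0
    for _ in range(trials):
        if random.random() < 0.5:
            heads += 1     # Heads: a single awakening
            total += 1
        else:
            total += tails_wakes  # Tails: many awakenings
    return heads / total
```

`p_heads_awakening(2)` settles near 1/3, and `p_heads_awakening(1000)` near 1/1001, matching the argument above.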

So the ‘prior probability’ Beauty should assign to the chance of a coin landing Heads after any particular awakening is actually 1/3 within the terms of the experiment, even before she goes to sleep. It is true that she has access to no new information whenever she awakens, but that simply means that her ‘prior probability’ of being awakened by a Heads flip remains at 1/3 after she is woken. This is totally consistent with Bayesian reasoning, which states that the prior probability of an event will not change unless there is new information.

Given, therefore, that she assigns a prior probability of 1/3 to any particular awakening arising from a Heads flip, this should be the answer she gives whenever she awakens, and also before she goes to sleep.

So the paradox resolves to the question Beauty is being asked to answer. What is the probability that a fair coin will land Heads? Answer = ½. What is the probability that whenever she is woken this awakening has resulted from a Heads flip? Answer = 1/3. She is consistent in these answers both before she goes to sleep and whenever she wakes. In other words, because Beauty knows that she will correctly answer 1/3 whenever she is woken, given the rules of the experiment, of which she is aware, she will answer 1/3 before she goes to sleep.

The resolution of the Sleeping Beauty Problem has implications for the so-called ‘anthropic principle’ more generally.

The ‘anthropic principle’ is the consideration that theories of the universe are constrained by the necessity to allow human existence, because our existence as conscious observers of the universe is a given. So any theory or model of the universe must have our existence as at least one possibility.

The simplest state of affairs would be a situation in which nothing had ever existed. This would also be the least arbitrary, and certainly the easiest to understand. Indeed, if nothing had ever existed, there would have been nothing to be explained. Most critically, it would solve the mystery of how things could exist without their existence having some cause. In particular, while it is not possible to propose a causal explanation of why the whole Universe exists, if nothing had ever existed, that state of affairs would not have needed to be caused. This is not helpful to us, though, as we know that in fact this Universe does exist.

In fact, we are faced with the fact that the positive and negative contributions to the cosmological constant cancel to 120-digit accuracy, yet fail to cancel beginning at the 121st digit. The cosmological constant must be zero to within one part in roughly 10^120 (and yet be non-zero), or else the universe either would have dispersed too fast for stars and galaxies to have formed, or would have collapsed upon itself long ago. How likely is this by chance? Essentially, it is the equivalent of tossing a coin and needing to get heads about 400 times in a row, and achieving it. And that is just one constant that needs to be just right for galaxies and stars and planets and life to exist. There are quite a few others, independent of this, which have to be equally just right, but this I think sets the stage. This is sometimes called the fine-tuning argument.
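The coin-toss comparison is a one-line calculation: n heads in a row has probability 2^-n, so one part in 10^120 corresponds to n = 120 / log10(2) tosses.

```python
import math

# Number of consecutive fair-coin heads with the same probability
# as a chance of one part in 10^120.
n = 120 / math.log10(2)
print(round(n))  # about 399, close to the 400 quoted above
```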

The parallel with the Sleeping Beauty Problem is that Beauty knows she has been awakened and so any explanation of this must have that awakening as at least one possibility, just as any theory of the Universe must have our conscious state as one possibility.

In terms of modelling the Universe, we might pose two possible theories. In one, all the physical constants we observe today are explained. They were designed that way or they have to be that way for some unknown reason. The second theory is that there could have been countless trillions of different ways that the physical constants could have arranged themselves, and only one of these is consistent with the Universe (and us) existing.

For simplicity of exposition, let us assume that the two theories are otherwise equal in terms of empirical evidence, scientific rigour, and so on, but the general point stands whatever.

In other words, from the perspective of an observer outside the Universe, these theories would be equally likely. Heads or Tails. ½.

But we as conscious observers of our existence are like Sleeping Beauty when she wakes. From our perspective, there is only one chance in countless trillions that we would be asking the question if the second theory is correct, which means from our ‘anthropic’ perspective the chance that the first theory is correct (the constants were designed that way or have to be that way) is trillions of times more plausible.

This has, of course, very important scientific, philosophical and theological implications, which demonstrates the power and importance of the Sleeping Beauty Problem as more than just a simple mind-bender.

Let us now tackle, within this context, the criticism of those who dismiss the significance of this vanishingly small chance of the physical constants falling randomly in our favour, on the grounds that if they hadn’t, we would not have been around to ask the question. This take on the ‘anthropic principle’ sounds a clever point, but in fact it is not. For example, it would be absolutely bewildering how I could have survived a fall from an aeroplane at 39,000 feet onto tarmac without a parachute, but it would still be a question very much in need of an answer. To say that I couldn’t have posed the question if I hadn’t survived the fall is no answer at all.

Others propose the argument that since there must be some initial conditions, the conditions which made the Universe and life within it possible were just as likely to prevail as any others, so there is no puzzle to be explained.

But this is like saying that there are two people, Jack and Jill, who are arguing over whether Jill can control whether a fair coin lands heads or tails. Jack challenges Jill to toss the coin 400 times. He says he will be convinced of Jill’s amazing skill if she can toss heads followed by tails 200 times in a row, and she proceeds to do so. Jack could now argue that a head was equally likely as a tail on every single toss of the coin, so this sequence of heads and tails was, in retrospect, just as likely as any other outcome. But clearly that would be a very poor explanation of the pattern that just occurred. That particular pattern was clearly not produced by coincidence. Yet it is the same argument as saying that initial conditions exactly right to produce the Universe and life were just as likely as any of the billions of other patterns of initial conditions that would not have done so. There may be a reason for the pattern that was produced, but it needs a more profound explanation than proposing that it was just coincidence.

A second example. There is one lottery draw, devised by an alien civilisation. The lottery balls, numbered from 1 to 49, are to be drawn, and the only way that we will escape destruction, we are told, is if the first 49 balls out of the drum emerge as 1 to 49 in sequence. The numbers duly come out in that exact sequence. Now that outcome is no less likely than any other particular sequence, so if it came out that way a sceptic could claim that we were just lucky. That would clearly be nonsensical. A much more reasonable and sensible conclusion, of course, is that the aliens had rigged the draw to allow us to survive, or else that the draw had to be that way because no other possible sequence of balls could physically emerge.

So the answer to the Sleeping Beauty Problem is that there is a 1/3 chance she is in the Heads world if she is awakened once when the coin lands Heads and twice when it lands Tails. If awakened a million times in the Tails world but just once in the Heads world, the chance she awakes to a Heads world is 1 in 1,000,001. The bigger question for humanity is what world we exist in, Heads (we have to exist) or Tails (there is effectively no chance that we exist). I call that the Possibility Problem, and it is a problem which would seem to have a probabilistic solution.

 

Appendix

Using Bayes’ Theorem:

P(Heads | Wake up) = P(Wake up | Heads) × P(Heads) / P(Wake up)

If you adopt the Self-Sampling Assumption (SSA), you sample a person from within that world at random.

So, P(Heads | Wake up) = 1 × 1/2 / 1 = 1/2

If you adopt the Self-Indication Assumption (SIA), you take into account that you are more likely to exist in a world with more beings (or more opportunities to experience) than in one with fewer. In this case, there are twice as many opportunities to experience waking up if the coin lands Tails as if it lands Heads, so the effective prior for Heads is 1/3.

So, P(Heads | Wake up) = 1 × 1/3 / 1 = 1/3
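In exact fractions, both readings are the same Bayes calculation with different priors (a sketch; the function name is my own):

```python
from fractions import Fraction

def posterior_heads(prior_heads: Fraction) -> Fraction:
    """P(Heads | Wake up): waking up is certain under both outcomes,
    so the posterior simply equals the chosen prior."""
    p_wake_given_heads = Fraction(1)
    p_wake = Fraction(1)
    return p_wake_given_heads * prior_heads / p_wake

print(posterior_heads(Fraction(1, 2)))  # SSA prior -> 1/2
print(posterior_heads(Fraction(1, 3)))  # SIA prior -> 1/3
```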

Further Reading and Links


Sections of this blog relating to the ‘anthropic principle’ and ‘fine-tuning’ have appeared in my related blog, ‘Why is there Something Rather than Nothing?’ Link at: https://leightonvw.com/2015/08/03/why-is-there-something-rather-than-nothing/

Bayes’ Theorem: The Most Powerful Equation in the World. Related blog. https://leightonvw.com/2017/03/12/bayes-theorem-the-most-powerful-equation-in-the-world/

Wikipedia entry on the Sleeping Beauty Problem https://en.wikipedia.org/wiki/Sleeping_Beauty_problem

The Sleeping Beauty Problem. By Julia Galef. YouTube.

https://youtu.be/zL52lG6aNIY

Philosophy- Epistemology. The Sleeping Beauty Problem. By Michael Campbell. YouTube.

https://www.youtube.com/watch?v=5Cqbf86jTro

Probably Overthinking It: A Blog by Allen Downey

http://allendowney.blogspot.co.uk/2015/06/the-sleeping-beauty-problem.html

Wikipedia entry on the Anthropic Principle

https://en.wikipedia.org/wiki/Anthropic_principle

Wikipedia entry on Fine-Tuned Universe

https://en.wikipedia.org/wiki/Fine-tuned_Universe

Blog entry on The Vaughan Williams ‘Possibility Theorem’ and related applications.

https://wordpress.com/post/leightonvw.com/445

Derek Parfit, ‘Why anything? Why this?’ Part 1. London Review of Books, 20, 2, 22 January 1998, pp. 24-27.

https://www.lrb.co.uk/v20/n02/derek-parfit/why-anything-why-this

Derek Parfit, ‘Why anything? Why this?’ Part 2. London Review of Books, 20, 3, 5 February 1998, pp. 22-25.

https://www.lrb.co.uk/v20/n03/derek-parfit/why-anything-why-this

John Horgan, ‘Science will never explain why there’s something rather than nothing’, Scientific American, April 23, 2012.

https://blogs.scientificamerican.com/cross-check/science-will-never-explain-why-theres-something-rather-than-nothing/

http://www.johnpiippo.com/2012/04/krausss-much-ado-about-nothing.html

David Bailey, What is the cosmological constant paradox, and what is its significance? 1 January 2017.

http://www.sciencemeetsreligion.org/physics/cosmo-constant.php

David Albert, ‘On the Origin of Everything’, Sunday Book Review, The New York Times, March 23, 2012.

https://nicolaelogofatu.wordpress.com/2014/04/26/on-the-origin-of-everything/

The Girl Named Florida Problem

Suppose that a family has two children. What is the probability that both are girls? Well, this is straightforward because there are four equally likely possibilities (assuming the chances of a boy and a girl are 50-50).

Let us assume that the two children are concealed from view, one behind a red curtain and one behind a yellow curtain.

Put like this, there are four possibilities:

  1. Boy behind both curtains.
  2. Boy behind red curtain and girl behind yellow curtain.
  3. Girl behind red curtain and boy behind yellow curtain.
  4. Girl behind both curtains.

So the probability that there is a girl behind both curtains = ¼.

This answers the first question. Given the information that a family has two children, the chance that both are girls is 1 in 4.

Now what if we are told that at least one of the children is a girl? This is like saying that there is at least one girl behind the curtains, possibly two.

This eliminates option 1, i.e. a boy behind both curtains, leaving three equally likely possibilities, only one of which is a girl behind both curtains. So the chance that there is a girl behind both curtains given that you know that there is a girl behind at least one curtain is 1 in 3.

This is equivalent to asking the probability that both children are girls if you know that at least one of the children is a girl. The answer is 1 in 3.
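The curtain enumeration can be written out directly:

```python
from itertools import product

# The four equally likely families: (red curtain, yellow curtain).
families = list(product("BG", repeat=2))

# Condition on at least one girl, then count families with two girls.
at_least_one_girl = [f for f in families if "G" in f]
both_girls = [f for f in at_least_one_girl if f == ("G", "G")]
print(len(both_girls), len(at_least_one_girl))  # 1 3
```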

Now what if you are told that at least one of the children is a girl with a chin? This adds little or no new information, insofar as all (or the vast majority of) girls have a chin. So if I tell you that a family has two children, at least one of whom is a girl with a chin, I am giving you effectively no new information. So the probability that both children are girls, given that at least one is a girl, is still 1 in 3.

What if instead I tell you that one of the children is a girl called Florida? This is pretty much equivalent to telling you that the family has a daughter behind the red curtain, insofar as it is not just identifying that there is at least one girl in the family, but identifying who or where she is. When now asked the probability that there is a girl behind the yellow curtain, options 1 and 2 (above) disappear, leaving just option 3 (a girl behind the red curtain and a boy behind the yellow curtain) and option 4 (a girl behind both curtains). So the new probability, given the additional information which identifies or locates one particular girl in advance, is 1 in 2.

In other words, knowing that there is a girl behind the red curtain, or else knowing that her name is Florida, is like meeting her in the street with her parents, who introduce her. If you know they have another child at home, the chance it is a girl is 1 in 2. By meeting her, you have identified a feature particular to that individual girl, i.e. that she is standing in front of you and not at home (or behind the red curtain, or named Florida and not simply possessed of a chin).

If, on the other hand, you meet a man in the pub who mentions his two children and you find out that at least one of them is a daughter, but nothing more than that, you are back to knowing that there is a girl behind at least one curtain, but not which, i.e. Options 2, 3 and 4 above. In only one of these equally likely options, i.e. Option 4, is there a girl behind both curtains, so the chance of the other child being a girl is 1 in 3.

So does it matter that the daughter has this unusual name? It does. If you know that the man in the pub has two children and at least one daughter, but nothing more, the chance his other child is a girl is 1 in 3. If instead he tells you that one of his children is a girl called Florida, you are left with (to all intents and purposes) just two options. His other child is either a boy or else a girl not called Florida, which is pretty much equivalent to saying his other child is a boy or a girl. So the probability that his other child is a girl is now effectively 1 in 2.

The different information sets can be compared to tossing a coin twice. The possible outcomes are HH, HT, TH, TT. If you already know there is ‘at least’ one head, that leaves HH, HT, TH. The probability that the other coin is a Tail is 2 in 3. If, on the other hand, you identify that the coin in your left hand is a Head, the probability that the coin in your right hand is a Head is now 1 in 2. This is because you have pre-identified a unique characteristic of the coin, in this case its location. Identifying the girl as Florida does the same thing.

In terms of two coins, it is like marking one of the coins with a blue felt tip pen. You now declare that there are two coins in your hands, and one of them is a Head with a blue mark on it. Such coins are rare, perhaps as rare as girls called Florida. You are now asked what the chance is that the other coin is a Head (without a blue felt mark). Well, there are two possibilities. The other coin is either a Head (almost surely with no blue felt mark on it) or a Tail. So the chance the other coin is a Head is 1 in 2. Without marking one of the coins, to make it unique, the chance of the other coin being a Head is 1 in 3.

Put another way, there are four possibilities without marking one of the coins:

  1. Heads in left hand, Tails in right hand.
  2. Tails in left hand, Heads in right hand.
  3. Heads in both hands.
  4. Tails in both hands.

If you declare that at least one of the coins in your hands is Heads, this means the chance the other is Heads is 1 in 3. This is equivalent to declaring that one of the two children is a girl but saying nothing further. The chance the other child is a girl is 1 in 3.

Now if you identify one of the coins in some unique way, for example by declaring that Heads is in your left hand, the chance that Heads is also in your right hand is 1 in 2, not 1 in 3.

Similarly, declaring that one of the coins is a Heads marked with a blue felt tip pen, the chance that the other coin is Heads, albeit not marked with a blue felt tip, is 1 in 2. Marking the coin with the blue felt tip is like pre-identifying a girl (her name is Florida) as opposed to simply declaring that at least one of the children is some generic girl (for example, a girl with a chin).

In other words, there are four possibilities without identifying either child.

  1. Boy, Boy
  2. Girl, Girl
  3. Boy, Girl
  4. Girl, Boy

If at least one of the children is a girl, Option 1 disappears, and the chance the other child is a girl is 1 in 3.

If you identify one of the children, say a girl whom you name as Florida, it is like marking the Heads with blue felt tip or declaring which hand you are holding the coin in.

Your options now reduce to:

  1. Boy, Boy
  2. Boy, Girl named Florida
  3. Boy, Girl not named Florida
  4. Girl named Florida, Girl not named Florida.

Options 1 and 3 can be discarded, leaving Options 2 and 4, which (given how rare the name Florida is) are almost equally likely. In this scenario, the chance that the other child is a girl (not named Florida) is therefore, to all intents and purposes, 1 in 2. By pre-identifying one of the girls, Option 3 disappears, changing the probability that the other child is a girl from 1 in 3 to 1 in 2.

The new information changes everything.

So what is the probability of the family having two girls if you know that one of the two children is a girl, but no more than that? The answer is 1 in 3.

But what is the probability of the family having two girls if one of the two children is a girl named Florida? Armed with this new information, the answer is, to all intents and purposes, 1 in 2.

Another way to look at this is to consider a set of 4,000 families, each made up of two children. Choose a single unique identifier for each child, say age (it could equally be height or alphabetical order, anything uniquely identifying one child from the other). 1,000 of these will be two boys – older boy and younger boy (BB); 1,000 will be two girls – older girl and younger girl (GG); 1,000 will be Boy-Girl – older boy, younger girl (BG); and 1,000 will be Girl-Boy – older girl, younger boy (GB). If you identify at least one of the children as a boy, there remain 3,000 families (1,000 BB, 1,000 BG, 1,000 GB). 2,000 of these 3,000 families contain a girl, so the probability that the other child is a girl is 2/3.
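The 4,000-family count can be reproduced directly in a few lines (an illustrative sketch; the variable names are mine):

```python
# Build the 4,000 two-child families as (older, younger) pairs.
families = [("B", "B")] * 1000 + [("B", "G")] * 1000 \
         + [("G", "B")] * 1000 + [("G", "G")] * 1000

# Condition on 'at least one boy': 3,000 families remain.
with_boy = [f for f in families if "B" in f]
print(len(with_boy))                         # 3000

# In 2,000 of those the other child is a girl.
with_girl_too = [f for f in with_boy if "G" in f]
print(len(with_girl_too) / len(with_boy))    # 2/3
```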

Now, add into the mix the fact that one girl in a thousand in your set of 4,000 families is named Florida, and there are no families with two daughters named Florida.

In this case, 1,000 of these will be two boys – an older boy and a younger boy (BB); 1 will be an older boy and a younger girl named Florida (BF); 1 will be an older girl named Florida and a younger boy (FB); 1 will be an older girl named Florida and a younger girl not named Florida (FG); 1 will be an older girl not named Florida and a younger girl named Florida (GF); 999 will be an older boy and a younger girl not named Florida (BG); 999 will be an older girl not named Florida and a younger boy (GB); and 998 will be an older girl not named Florida and a younger girl not named Florida (GG). There will be no families with both girls named Florida.

This can be summarised as (given that B is a boy, G is a girl not named Florida, F is a girl named Florida, and the sequence is older-younger):

1,000 BB; 1 BF; 999 BG; 1 FB; 0 FF; 1 FG; 999 GB; 1 GF; 998 GG.

Given that at least one child is a girl named Florida, 4 possible pairs remain:

BF; FB; FG; GF.

Of these, 2 contain a second girl (a girl not named Florida):

FG and GF.

So, if 1 in 1,000 girls is named Florida, what is the chance that the other child is a girl once you are told that one of the two children is a girl named Florida? Of the 4 remaining families, 2 contain a second girl, so the answer is 1/2.
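The same tallies can be checked in a few lines of Python (an illustrative sketch using the counts from the summary above):

```python
# Tallies from the text: (older, younger), B = boy, G = girl not named
# Florida, F = girl named Florida, across 4,000 families.
counts = {("B", "B"): 1000, ("B", "F"): 1, ("B", "G"): 999,
          ("F", "B"): 1,    ("F", "F"): 0, ("F", "G"): 1,
          ("G", "B"): 999,  ("G", "F"): 1, ("G", "G"): 998}

# Condition on 'at least one girl named Florida'.
florida = {pair: n for pair, n in counts.items() if "F" in pair}
total = sum(florida.values())        # 4 families: BF, FB, FG, GF

# In how many of them is the other child also a girl?
two_girls = sum(n for pair, n in florida.items()
                if pair in [("F", "G"), ("G", "F"), ("F", "F")])
print(two_girls / total)             # 0.5
```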

 

Appendix

The solution to the ‘Girl Named Florida’ problem can be demonstrated using a Bayesian approach.

Let P(GG) be the probability of two girls, given that there are two children. Let G be the event that there is at least one girl in the family.

Let P(GG | 2 children) be the probability of two girls given there are two children.

Let P(GG | G) be the probability of two girls GIVEN THAT at least one is a girl.

Then, P(GG | 2 children) = 1/4

P(GG | G) = P(G | GG) . P(GG) / P(G)  … by Bayes’ Rule

So P(GG | G) = 1 x 1/4 / (3/4) = 1/3

P(GG | 2 children, older child is a girl)

Now there are only two possibilities, GB and GG (older girl and younger boy, or older girl and younger girl), so the conditional probability of two girls given that the older child is a girl, P(GG | older child is G), is 1/2.

GIRL NAMED FLORIDA PROBLEM

P(GG | 2 children, at least one being a girl named Florida).

B = 1/2

G = 1/2 – x

GF (girl named Florida) = x

where x is the proportion of children who are girls named Florida.

Of families with at least one girl named Florida, there are the following possible combinations, with associated probabilities.

B GF = 1/2 x

GF B = 1/2 x

G GF = x (1/2 – x)

GF G = x (1/2 – x)

GF GF = x^2

Probability of two girls if one is a girl named Florida =

= (G.GF + GF.G + GF.GF) / (G.GF + GF.G + GF.GF + B.GF + GF.B)

= [x(1/2 – x) + x(1/2 – x) + x^2] / [x(1/2 – x) + x(1/2 – x) + x^2 + x]

= [x/2 – x^2 + x/2 – x^2 + x^2] / [x/2 – x^2 + x/2 – x^2 + x^2 + x]

= (x – x^2) / (x – x^2 + x) = x(1 – x) / [x(2 – x)] = (1 – x) / (2 – x)

Assuming that Florida is not a common name, x approaches zero and the answer approaches 1/2.

So it turns out that the name of the girl is relevant information.

As x approaches 1/2, the answer converges on 1/3. For example, if we know that at least one child is a girl with a chin, x is close to 1/2 and the problem reduces to the standard P(GG | G) problem outlined above, i.e.

P(GG | G) = P(G | GG) . P(GG) / P(G)  … by Bayes’ Theorem

So P(GG | G) = 1 x 1/4 / (3/4) = 1/3
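As a sanity check on the algebra, a short sketch (my own helper, not from any library) computes the probability directly from the component terms and compares it with the closed form (1 – x)/(2 – x) for several values of x:

```python
def p_two_girls_given_florida(x):
    """P(two girls | at least one girl named Florida), where x is the
    proportion of children who are girls named Florida (derivation above)."""
    num = x * (0.5 - x) + x * (0.5 - x) + x**2      # GF.G + G.GF + GF.GF
    den = num + 0.5 * x + 0.5 * x                   # ... + B.GF + GF.B
    return num / den

for x in [0.0005, 0.01, 0.25, 0.4999]:
    closed_form = (1 - x) / (2 - x)
    assert abs(p_two_girls_given_florida(x) - closed_form) < 1e-12
    print(x, round(closed_form, 4))
# small x -> close to 1/2; x near 1/2 -> close to 1/3
```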

 

 

Further Reading and Links

https://selectnetworks.net

The Madness of Crowds, Polls and Experts

 

Since records began in 1868, no clear favourite for the White House has lost, except in the case of the 1948 election, when 8 to 1 longshot Harry Truman defeated his Republican rival, Thomas Dewey.

We can now add 2016 to that list, thanks to Donald Trump, who has beaten 1 to 5 favourite, Hillary Clinton, to take the presidency. In so doing, he also defied the polls, the experts and the wisdom of crowds.

I have been tracking various forecasting methodologies and prognosticators over the past few months, right up to election day, and can confirm that the rout of conventional wisdom was almost total.

Odds on

On the morning of the election, the best price available on Hillary Clinton was 2 to 7, equal to an implied win probability of about 78%. The spread betting markets made her a little over an 80% favourite, and gave her a head start over Trump of more than 80 electoral votes. The PredictIt prediction market assigned her a 79% chance of victory, and estimated her likely advantage as 323 electoral votes to 215 for Trump. Meanwhile, the Predictwise crowd wisdom platform assessed her chance of winning at a solid 89%, compared to 75% by the Hypermind prediction site.
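To make the conversion explicit, here is a tiny sketch (the function is my own, illustrative only) of how fractional odds translate into implied win probabilities, ignoring the bookmaker's margin:

```python
def implied_probability(against, to):
    """Implied win probability from fractional odds 'against to to'
    (e.g. 2 to 7 means stake 7 to win 2), ignoring the bookmaker's margin."""
    return to / (against + to)

print(round(implied_probability(2, 7), 3))   # ~0.778: Clinton at 2 to 7
print(round(implied_probability(8, 1), 3))   # ~0.111: Truman at 8 to 1 in 1948
```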

The polling aggregation services fared no better. The RealClearPolitics and HuffPost Pollster polling averages gave Hillary Clinton a lead of between 3% and 6%. The FiveThirtyEight platform, which removes bias from polls based on their previous performance, gave her a popular vote lead on the day of 3.6% and an electoral vote advantage of 67 over Trump. Her chance of winning was assessed as 71.9% based on this polling.

Perhaps the biggest failure of the night, however, was Sam Wang’s Princeton Election Consortium, which gave Clinton more than a 99% chance of victory. Still, it must be said that his topline figures (an electoral college advantage of 307 to 231 for Clinton, and 2.5% in the popular vote) were less far off than a number of the other forecasting methodologies.

The New York Times Upshot elections model, which bases its estimates on state and national polls, gave Clinton an 84% chance of victory, which they helpfully compared to the chance of an NFL kicker making a 38-yard field goal: about 16% of the time, the kicker misses. That, they suggested, was the same as the chance of Hillary Clinton losing.

Talking heads

Expert opinion was also woefully off. One of the most high-profile providers of expert political opinion is the Sabato Crystal Ball, run by Larry Sabato of the University of Virginia’s Center for Politics. This service has a very good track record. Yet, in line with the polls and the markets, the Crystal Ball got it badly wrong this time. Its final prediction was a win for Hillary Clinton by 322 electoral votes to 216.

It is the PollyVote election forecasting service, however, that provides perhaps the most broad-based expert opinion survey, calling on its own panel of political experts to periodically update its forecast of the likely two-way vote share of the main candidates. The final expert panel survey, conducted on the eve of the election, put Clinton 4.4% up on Trump (52.2% to 47.8%).

In attempting to estimate the final vote share tallies of the candidates, PollyVote provides not just the estimates of experts, but also evidence gathered from a range of other methodologies, including prediction markets, poll aggregators, econometric models, citizen forecasts and index models. The idea is that aggregating and combining the wisdom of each and taking an average should provide a better estimate than any in isolation. It is a methodology which has served well over the past three election cycles.

This time the methodology broke down as badly as any of the main forecasting methodologies in isolation. Taking them in turn, the prediction market indicator (based on the trading in the Iowa electronic markets) gave Hillary Clinton a lead of 54.6% to 45.4%. Using data from RealClearPolitics and HuffPost Pollster to construct its poll aggregation metric, it gave the lead to Clinton by 52% to 48%.

PollyVote also highlights the various econometric forecasting models available, which typically use variables such as growth, unemployment, incumbency, and so on, to provide an aggregated estimate. That estimate was, this time, quite successful, giving Clinton the advantage in the popular vote of 50.2% to 49.8%. Winning the popular vote is, however, not the same thing as winning the electoral college, as Democrats in particular have learned in recent years.

The final two methodologies used to make up the PollyVote forecast are index models, which use information about the candidates, and citizen forecasts, which ask people whom they expect to win. The index models this time gave Clinton the edge over Trump by 53.5% to 46.5%, and the citizen forecasts by 52.2% to 47.8%. Combining all these methodologies together produced an estimated advantage for Clinton over Trump of 52.5% to 47.5%.
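The combination step itself is nothing more exotic than an unweighted average. Using the component figures quoted above (the labels are mine, and this is only a sketch of the idea, not PollyVote's actual code):

```python
# Component estimates of Clinton's two-party vote share quoted above (%).
components = {
    "prediction markets": 54.6,
    "poll aggregation":   52.0,
    "econometric models": 50.2,
    "index models":       53.5,
    "citizen forecasts":  52.2,
    "expert panel":       52.2,
}

# A simple unweighted average of the components, PollyVote-style.
combined = sum(components.values()) / len(components)
print(combined)   # about 52.45, consistent with the published 52.5 to 47.5 split
```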

The bottom line, therefore, is that most of the tried and tested forecasting methodologies failed this time. Election 2016 truly demonstrated, on a grand scale, the madness of crowds, polls and experts.

Further Reading and Links

https://selectnetworks.net

Donald Trump has won the battle: will he win the war?

Donald Trump has been declared the Republican Party’s nominee for the presidency of the United States – and for once, not only by himself. This victory defies all the laws of political gravity.

The traditional Republican way is to elect the establishment’s chosen candidate, generally someone who has served the party faithfully and well – and preferably someone plausibly electable against the Democrats’ standard bearer. The nominee is expected to stick to mainstream conservative principles and to be broadly acceptable to those pulling the strings at Fox News.

Trump fails all these tests. And with his signature blend of populism, provocation and spectacle, he has driven the party into a schism, pitting conservative against conservative.

In the immediate wake of the Indiana result, the audience of Fox News was treated to a downcast debate between the network’s two principal conservative voices, Bill O’Reilly and Charles Krauthammer. While O’Reilly tried to defend Trump as a misunderstood populist hero, Krauthammer declared himself implacably opposed to a man who, in his view, was not a true conservative and could not be trusted to defend conservative values, far less be entrusted with the nuclear codes.

The party shows no sign of being ready to unite behind Trump. The Hill, an influential political newspaper published in Washington DC, has even provided a list of Republicans who have declared on the record that they simply will not back him. The list is long, and includes some very influential conservative names.

These horrified “NeverTrumpers”, who’ve been pushing their own #NeverTrump hashtag, are all too aware that nominating “The Donald” would not only betray the party’s core principles, but possibly doom the GOP to electoral catastrophe. Disgusted conservatives might well decline to vote at all. That would contaminate Republican candidates across the country; the party would probably lose control of the Senate, and perhaps even of the House of Representatives.

So what exactly are Trump’s chances against Hillary Clinton? The Real Clear Politics average of the most recent half dozen polls has Clinton leading Trump by an average of 6.5% in a hypothetical (and now very likely) match-up.

Take out the poll by the Rasmussen firm, which has a very chequered history – not least projecting a Mitt Romney victory on the eve of the 2012 election – and Clinton leads by 8.2%.

The respected Sabato Crystal Ball project at the University of Virginia’s Center for Politics offers another perspective. This uses expert judgement on a state-by-state level to assess the likely number of electoral votes that would be won in a match-up between Clinton and Trump.

The best estimate offered, as of today, is a projected 347 votes for Hillary Clinton in the electoral college, with 191 going to Donald Trump. A total of 270 votes is required to win the presidency. By way of comparison, Barack Obama won 332 electoral votes in 2012 to 206 for Mitt Romney.

The betting and prediction markets tell a broadly similar tale.

Finally, let’s look to the PollyVote project, which combines evidence derived from polls, expert judgement and prediction markets, plus a few other indicators, to provide an overall forecast of the likely outcome in November. As of today, the PollyVote predicts the Democrats to obtain 53.3% of the two-party popular vote, compared to 46.7% for the Republicans.

Trump stands today at the top of the Republican tree. He has won the battle. He will find it much harder to win the war.

From the 1503 papal conclave to the 2015 Nobels: Forecasting Closed Door Decisions

When the Belarusian writer Svetlana Alexievich won the 2015 Nobel Prize for Literature, it was not unexpected. She was not only the clear favourite with the bookmakers but had traded as one of the leaders in the betting in the previous two years.

While firms lay odds on the literature and peace prizes, there are no betting lines available for the Nobel Prizes in physics, chemistry and medicine. Instead, there is an organised platform which seeks to predict winners based on research citations.

Betting: from Hollywood to the Vatican

2015 was a very good year for favourites in awards contests. The favourite in the betting won almost every single one of the 24 Oscar categories at the Academy Awards. This domination of the favourites has been documented in politics for nearly 150 years, ever since hot favourite Ulysses S Grant strolled to the US presidency in 1868. The favourite in the betting won almost every single presidential election held since, up until 2016.

But the Nobel Prize deliberations are quite different from a political election or even a Hollywood awards ceremony. Instead, they are a little more like a papal conclave, where the deliberations are secretive and there is no defined shortlist of nominees. Betting on papal conclaves has been formally recorded from as early as 1503. In that year, the brokers in the Roman banking houses who offered odds on who would be elected Pope made Cardinal Francesco Piccolomini the clear favourite. It was no surprise, therefore, when he went on to become Pope Pius III.

Since then the betting markets have had a mixed record of success in predicting the winner. For example, Cardinal Ratzinger was a warm favourite to be elected pope in 2005, and duly became Pope Benedict. The election of Cardinal Bergoglio as Pope Francis, on the other hand, came as more of a surprise to the markets.

Betting on processes that take place behind closed doors also happens outside the church. In 2009, crowdsourced fantasy league (or “prediction market”) FantasySCOTUS.net launched an attempt to peer behind the doors of the US Supreme Court, predicting its deliberations – a market still going strong today. The Supreme Court might be particularly suitable for a prediction market, in that not only is there a relatively small number of decision makers, but the universe of possible outcomes is also very limited. Predicting the Nobel Prize announcements might be expected to be somewhat more difficult.

So how do the betting companies compile their odds when it comes to the Nobels? Ladbrokes has said that, in the absence of hard information, the best way is to consult literary contacts and follow relevant online discussions. This is despite the fact that it takes only about £50,000 in bets on the Nobel in literature, compared with a couple of million for a big football match.

Patchy record

How well have the markets performed to date? For the Sveriges Riksbank Prize in Economic Sciences, established in 1968 by Sweden’s central bank and considered an unofficial “Nobel”, the most ironic failure came in 2009, when the betting market offered by Ladbrokes had Eugene Fama, a pioneering exponent of the theory of efficient markets, as the solid 2 to 1 favourite. Assuming the market was truly efficient in respect of all relevant information, we might have expected him to be well up there among the top contenders. But the prize was shared by Elinor Ostrom and Oliver Williamson, both of whom were trading as 50 to 1 longshots before the announcement. Fama did go on to share the Nobel Prize four years later.

On the other hand, Harvard University had already set up its own dedicated economics prize prediction market, which did much better than Ladbrokes by making Oliver Williamson one of the favourites. In 2010, Peter Diamond shared the prize after having been listed as one of the favourites by Harvard.

Of the others in the top eight in 2010, Jean Tirole went on to win in 2014, Robert Shiller and Lars Peter Hansen in 2013. Thomas Sargent and Christopher Sims, who shared the 2011 prize, were among the favourites in the 2008 Harvard prediction market, which has since closed down.

Most of the market-based predictions, however, focus on the Nobel Prizes for Literature and Peace. In 2014, French writer Patrick Modiano won the Literature Prize. Before the announcement, Modiano was trading as a reasonably well-fancied joint fourth favourite. The previous year, Canadian Alice Munro was heavily backed into second favourite before claiming the prize. In 2011, Tomas Tranströmer won the Literature Prize, having been the clear favourite in 2010.

The peace prize, which is awarded by a committee of five people who are chosen by the parliament of Norway, is slightly more complicated as awards are sometimes given to organisations rather than individuals. This also makes it less satisfying for potential market players. Still, the 2014 Nobel Peace prize was shared by Malala Yousafzai and Kailash Satyarthi. Malala had actually been backed to win in the previous year.

The physics, chemistry and medicine prizes, on the other hand, have not really attracted market attention to date, probably because they are too niche for the regular player. Instead this role has been taken up by Thomson Reuters, which claims to have identified 37 Nobel Prize winners since 2002 on the basis of an analysis of scientific research citations within the Web of Science. In an interesting development, Thomson Reuters has now also established a People’s Choice Poll, more akin to the “wisdom of crowds” methodology of a prediction market. Scientific society Sigma Xi has a prediction contest that enables people to vote for their favourite.

2015 Nobels: the verdict

This outline of the past few years is pretty much par for the course in the history of Nobel predictions: far from perfect, but not at all unimpressive. Interestingly, the market is often a better predictor of future Nobel laureates than of that particular year’s winner.

In 2015, although the market got the Literature Prize spot on, it had not predicted that the Tunisian National Dialogue Quartet would win the Peace Prize. So well done to those who placed a bet on “none of the above”, which was trading as close second favourite to Angela Merkel on the PredictIt prediction market before the announcement.

Thomson Reuters got the 2015 physics, chemistry and medicine prizes wrong. It also highlighted Richard Blundell, John List and Charles Manski as the leading candidates for the economics prize, making special note of Blundell, who also won its People’s Choice poll. There was no organised betting on economics this year. In the event, this year’s economics Nobel went to Angus Deaton, currently Dwight D. Eisenhower Professor of Economics and International Affairs at Princeton (and formerly of Cambridge and Bristol universities), for his analysis of consumption, poverty and welfare.

So what will the prediction industry look like in ten years? On current trends, it will have grown up a lot. The science of forecasting and the power of prediction markets are currently growing apace. Will there ever come a time, I wonder, when we don’t need to wait for the announcement, but instead just look to the odds? Maybe we should set up a prediction market to answer that question.

Reference:

Leighton Vaughan Williams and David Paton, ‘Forecasting the Outcome of Closed-Door Decisions: Evidence from 500 Years of Betting on Papal Conclaves’, Journal of Forecasting, 34 (5), August 2015, pp. 391-404. http://onlinelibrary.wiley.com/doi/10.1002/for.2339/full

Beware the Ides of March! US Election Special

The Ides of March, or March 15, has long been associated with doom and destruction. In 44BC, confident populist Julius Caesar ignored a soothsayer’s warning and met his demise at the height of his adulation by an adoring public. It was also the day that Czar Nicholas II in 1917 formally abdicated his throne, and the day that Germany occupied Czechoslovakia in 1939. And now it’s the turn of the Republican Party.

This year’s Ides of March could prove pivotal for the US presidential race, as the primaries roll into five big states: Florida, Ohio, Illinois, North Carolina and Missouri. With firebrand insurgent Donald Trump still defying all the Republicans’ attempts to stop him, the day’s massive delegate haul threatens to put him firmly on the path to the nomination.

Much will depend on what happens in Florida and Ohio, the home states of Florida Senator Marco Rubio and Ohio Governor John Kasich. Kasich has pledged to withdraw from the contest if he loses Ohio, while Rubio has himself said that whoever wins Florida will be the nominee of the Republican Party. If Rubio loses his home state, he will be under enormous pressure to bow out.

This confronts Trump’s conservative rival, Ted Cruz, with a fiendish dilemma. He’s won a fair number of states, but to have a decent chance at winning the nomination, Cruz needs Kasich and especially Rubio to drop out. So Cruz wants them to do poorly. But if either or both lose their home state, it’s Trump, not Cruz, who’s most likely to grab their delegates – a hefty 99 in Florida and a chunky 66 in Ohio, all allocated on a winner-take-all basis.

On the other hand, if Rubio somehow rallies to win Florida, he’s very likely to stay in, as is Kasich if he wins Ohio. This puts Cruz and other anti-Trump forces in the awkward position of needing Rubio and Kasich both to trump Trump and to fall short.

The best outcome Cruz can hope for is for Rubio and Kasich to do just enough to win Florida and Ohio respectively, therefore denying Trump the winner-take-all delegates, but to do so badly elsewhere that they drop out anyway. Not impossible, but unlikely.

So where does that leave us?

Splitting the difference

Trump just needs to seize Ohio and Florida to put himself within touching distance of the prize, but that’s a big task, especially in Ohio. Illinois and Missouri offer a combined total of 121 delegates. North Carolina’s 72 delegates are in play as well, but those are allocated on a proportional basis, so grabbing the gold isn’t quite as important there.

So if Trump picks up Florida, Ohio and does well in Illinois and/or Missouri, the fight for the Republican nomination could be all but over by Wednesday morning. But that outcome is far from pre-ordained.

Let’s say Trump loses either Ohio or (less likely) Florida, but not both. That puts his chance of clinching a majority of delegates before the convention in jeopardy, with Illinois and/or Missouri perhaps tipping the scale. But if he loses both Ohio and Florida, he’s extremely unlikely to win a majority of the delegates before the convention in July.

If that’s the case, anything could happen. If it is ultimately not possible to construct a winning coalition of delegates around any of the current four horsemen of the Republican Party’s political apocalypse, the party could even turn outwards, to anoint a different saviour. This would presumably be someone undamaged by the internecine warfare that would have brought the party to that impasse. That would now seem to rule out Mitt Romney, given his recent full-on personal attacks upon Donald Trump. Instead they are more likely to look to a unifier, though they would need to change the convention rules to do so.

They have called upon someone fresh in dire straits before. At the end of 2015, the party could find nobody to replace John Boehner when he suddenly stood down as speaker of the House of Representatives. Then they found someone who at first said he wasn’t interested, but later relented: Paul Ryan, Mitt Romney’s running mate in 2012.

Is this a likely outcome? Not at all. While chatter around a possible Ryan candidacy suddenly spiked as March 15 loomed, a fundraising group formed to “draft” him recently shut down after his aides disavowed its work.

It’s far more likely that Trump will emerge as the Republican nominee, followed by Cruz, then Kasich and Rubio. But if no one can garner a majority of delegates to win the first ballot at the convention, any number of scenarios could play out.

As the betting markets currently see things, by far the most electable against a Democratic opponent in the general election are John Kasich and Marco Rubio. Of these two, Kasich is rated by the markets as much more likely to win the nomination. If he scrapes a win in the Ohio primary and finally starts winning delegates, might he somehow emerge from the pack at a contested convention, perhaps with Rubio or even Cruz in tow as his running mate? We shall see.

Further Reading and Links

https://selectnetworks.net