The Doomsday argument – in a nutshell.

Can we demonstrate, purely from the way that probability works, that the human race is likely to go extinct in the relatively foreseeable future, regardless of what humanity might do to try to prevent it? Yes, according to the so-called Doomsday argument, and this argument, derived from basic probability theory, has never been refuted.

Here’s how the argument goes. Let’s say you want to estimate how many tanks the enemy has to deploy against you, and you know that the tanks have been manufactured with serial numbers starting at 1 and ascending from there. Now let’s say you identify the serial numbers on five random tanks and they all have serial numbers under 10. Even an intuitive understanding of the workings of probability would lead you to conclude that the number of tanks possessed by the enemy is pretty small. On the other hand, if they are identified as serial numbers 2524, 7866, 5285, 3609 and 8009, you are unlikely to be far out if you estimate that the enemy has something close to 10,000 of them.

Let’s say that you only have one serial number to work with, and that it shows the number 18. On the basis of just this information, you would do well to estimate that the total number of enemy tanks is more likely to be 36 than 360, and far more likely to be 36 than 36,000.
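This intuition matches the standard frequentist estimator for the so-called German tank problem: take the largest observed serial number and scale it up by the average gap between observations. A minimal sketch in Python (the function name is my own):

```python
def tank_estimate(serials):
    """Frequentist estimate of the total number of tanks:
    the maximum observed serial number plus the average gap
    between observations, i.e. m + m/k - 1."""
    m, k = max(serials), len(serials)
    return m + m / k - 1

# A single observed serial of 18 gives an estimate of 2*18 - 1 = 35,
# close to the intuition that 36 is a better guess than 360.
print(tank_estimate([18]))  # 35.0

# Five serials clustered under 10 keep the estimate small.
print(tank_estimate([2, 5, 7, 9, 3]))  # 9.8
```

With the five large serial numbers quoted above (maximum 8009), the same formula gives an estimate just over 9,600, which is why "close to 10,000" is a reasonable guess.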

This way of thinking is an aspect of what is known as the mediocrity principle, which is the notion that an item that is drawn at random from one of several sets or categories is more likely to come from the most numerous category than any of the less numerous categories.

The principle has been used to suggest that, given the existence of life on Earth, life typically exists on Earth-like planets throughout the universe. The idea is to assume mediocrity rather than starting with the assumption that a phenomenon is special, privileged, exceptional or better. As such, it stands in contrast to the anthropic principle, which is the idea that the presence of an intelligent observer (Homo sapiens) restricts the circumstances to those under which intelligent life can be observed to exist, no matter how improbable. Linked to this is the Copernican principle, the idea in cosmology that we are not privileged or special observers of the universe. It derives from the observation of Nicolaus Copernicus in the 16th century that the Earth is not at the centre of the universe, generalised to the idea that the Earth occupies no special place at all.

The principle was notably used by the astrophysicist John Richard Gott when he arrived at the Berlin Wall. He asked himself whether, in the absence of other knowledge, there was any reason to believe that the moment at which he came upon the Wall was likely to be any special point in its lifetime. He decided that there was not, and that because any moment was equally likely, his best estimate was that there was as much time ahead of the Wall as there was behind it. In other words, his best guess as to how long the Wall would last was exactly as long as it had already been in existence. That was eight years. This form of reasoning was termed the ‘Copernican principle’ by Gott.

It is related to the ‘Lindy effect’, the name of which is derived from a New York delicatessen, famous for its cheesecakes, which was frequented by actors playing in Broadway shows. The Lindy effect was the observation that a Broadway show could expect to last for a further period equal to the length of time it had already been playing. So a show that had been on Broadway for three years could, as a best guess, be expected to last another three years before closing. More generally, the Lindy effect has come to represent the idea that the life expectancy going forward of a non-perishable thing such as a technology or an idea is proportional to its current period of existence, so that every additional period of survival implies a greater future life expectancy.

To return to the Copernican principle, in Bayesian terms it can be viewed as Bayes’ Rule with an uninformative prior. When we want to estimate how long something will last, in the absence of other knowledge, this principle suggests assuming we are at the mid-point of the timeline.
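Gott's version of this reasoning can be made quantitative. If the moment of observation is drawn uniformly from an object's lifetime, then with a chosen confidence the remaining lifetime falls inside a simple interval around the past lifetime. A short sketch, assuming only that uniformity (the function name is illustrative):

```python
def gott_interval(age, confidence=0.5):
    """Gott's 'delta t' argument: if the present moment is uniform
    over an object's lifetime, then with the given confidence the
    remaining lifetime lies between age*(1-c)/(1+c) and age*(1+c)/(1-c)."""
    c = confidence
    return age * (1 - c) / (1 + c), age * (1 + c) / (1 - c)

# The Berlin Wall, eight years old when Gott saw it: at 50% confidence
# the remaining lifetime is between about 2.7 and 24 years.
low, high = gott_interval(8, 0.5)
print(low, high)
```

The Wall in fact fell 20 years after Gott's visit, inside his 50 per cent interval.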

Imagine, in another scenario, that you are made aware that a selected box of numbered balls contains either ten balls (numbered 1 to 10) or ten thousand balls (numbered 1 to 10,000), and you are asked to guess which. Before you do so, one ball is drawn for you. It reveals the number seven. That would be a 1 in 10 chance if the box contains ten balls, but a 1 in 10,000 chance if it contains ten thousand. You would be right, on the basis of this information, to conclude that the box very probably contains ten balls, not ten thousand.
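The balls-in-the-box intuition is just Bayes' Rule at work, and is easy to check. A minimal calculation, assuming a 50:50 prior between the two boxes:

```python
def posterior_small_box(prior_small=0.5):
    """Posterior probability that the box holds ten balls, given that
    the drawn ball shows a number (seven) both boxes could produce."""
    like_small = 1 / 10       # P(ball shows 7 | 10-ball box)
    like_big = 1 / 10_000     # P(ball shows 7 | 10,000-ball box)
    num = prior_small * like_small
    return num / (num + (1 - prior_small) * like_big)

print(posterior_small_box())  # 1000/1001, about 0.999
```

Even starting from even odds, a single draw makes the small box a near-certainty.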

Let’s look at the same argument another way. As a thought experiment, imagine a world made up of 100 pods. In each pod there is one human. Ninety of the pods are painted black on the exterior and the other ten are white. This is known information, available to you and all the other humans. You are one of these people and you are asked to estimate the likelihood that you are inside a black pod. A reasonable way to go about this is to adopt what philosophers call the Self-Sampling Assumption: “All other things equal, an observer should reason as if they are randomly selected from the set of all existing observers in their reference class (in this case, humans in pods).” Since nine in ten of all people are in the black pods, and since you don’t have any other relevant information, it seems clear that you should estimate the probability that you are in a black pod as 90 per cent. A good way of testing the good sense of this reasoning is to ask what would happen if everyone bet this way. Well, 90 per cent of the wagers would win and 10 per cent would lose. In contrast, assume that the people ignore the Self-Sampling Assumption and instead assume that (since they don’t know which) they are equally likely to be in a black as a white pod. In this case, they might as well toss a coin and bet on the outcome. If they do so, only 50 per cent (as opposed to 90 per cent) will win the bet. It seems clearly rational here to accept the Self-Sampling Assumption.
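The betting test can be simulated directly. A quick sketch, assuming each observer occupies a pod drawn at random (function and variable names are my own):

```python
import random

def bet_win_rates(n_trials=100_000, rng=random.Random(42)):
    """Compare two betting rules in the 100-pod world (90 black, 10 white):
    always bet 'black' (the self-sampling bet) vs. toss a coin."""
    ss_wins = coin_wins = 0
    for _ in range(n_trials):
        pod_is_black = rng.random() < 0.9   # you occupy a random pod
        ss_wins += pod_is_black             # self-samplers always say black
        guess_black = rng.random() < 0.5    # coin-tossers guess at random
        coin_wins += guess_black == pod_is_black
    return ss_wins / n_trials, coin_wins / n_trials

ss, coin = bet_win_rates()
print(ss, coin)  # roughly 0.9 vs roughly 0.5
```

The always-bet-black rule wins about 90 per cent of the time, the coin-toss rule about 50 per cent, just as the argument in the text predicts.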

Now let’s make the pod example more similar to the tank and ‘balls in the box’ cases. We keep the hundred pods, but this time they are distinguished by being numbered from 1 to 100, painted on the exterior. Then a fair coin is tossed by an external Being. If the coin lands heads, one person is created in each of the hundred pods. If it lands tails, people are created only in pods 1 to 10. Now, you are in one of the pods and must estimate whether ten or a hundred people have been created in total. Since the number was determined by the toss of a fair coin, and since you don’t know the outcome of the toss and have no access to any other relevant information, it could be argued that you should believe there is a probability of 1/2 that it landed heads and thus a probability of 1/2 that there are a hundred people. You can, however, use the Self-Sampling Assumption to assess the conditional probability of a number between 1 and 10 being painted on your pod given how the coin landed. Conditional on it landing heads, the probability that the number on your pod is between 1 and 10 is 1/10, since one person in ten will find themselves in those pods. Conditional on tails, the probability that your pod is numbered 1 through 10 is 1, since everybody created (all ten of them) must be in one of those pods.

Suppose now that you open the door and discover that you are in pod number 6. Again you are asked: how did the coin land? Now you deduce that the probability that it landed tails is much greater than 1/2. With a prior of 1/2, Bayes’ Rule puts it at 10/11.
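That deduction is a one-line Bayesian update: the likelihood of finding yourself in pods 1 to 10 is 1 under tails, but only 1/10 under heads. A minimal check, assuming the fair-coin prior of 1/2:

```python
def p_tails_given_low_pod(prior_tails=0.5):
    """Posterior that the coin landed tails, given that your pod
    number is between 1 and 10 (e.g. pod 6)."""
    like_tails = 1.0     # all ten people created under tails are in pods 1-10
    like_heads = 0.1     # only 10 of the 100 people under heads are there
    num = prior_tails * like_tails
    return num / (num + (1 - prior_tails) * like_heads)

print(p_tails_given_low_pod())  # 10/11, about 0.91
```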

The final step is to transpose this reasoning to our actual situation here on Earth. Let’s assume for simplicity there are just two possibilities. Early extinction: the human race goes extinct in the next century and the total number of humans that will have existed is, say, 200 billion. Late extinction: the human race survives the next century, spreads through the Milky Way and the total number of humans is 200,000 billion. Corresponding to the prior probability of the coin landing heads or tails, we now have some prior probability of early or late extinction, based on current existential threats such as nuclear annihilation. Finally, corresponding to finding you are in pod number 6 we have the fact that you find that your birth rank is about 108 billion (that’s approximately how many humans have lived before you). Just as finding you are in pod 6 increased the probability of the coin having landed tails, so finding you are human number 108 billion (about half way to 200 billion) gives you much more reason, whatever the prior probability of extinction based on other factors, to think that Early Extinction (200 billion humans) is much more probable than Late Extinction (200,000 billion humans).
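The same update can be written out for the extinction scenario. A sketch, treating your birth rank as a uniform draw from all humans who will ever live; the 5 per cent prior on early extinction is purely illustrative, and the function name is my own:

```python
def doomsday_posterior(prior_early, birth_rank=108e9,
                       n_early=200e9, n_late=200_000e9):
    """Posterior probability of early extinction given your birth rank,
    with the rank treated as a uniform draw from all humans ever."""
    like_early = 1 / n_early if birth_rank <= n_early else 0.0
    like_late = 1 / n_late
    num = prior_early * like_early
    return num / (num + (1 - prior_early) * like_late)

# The likelihood ratio is 1,000 to 1 in favour of early extinction,
# so even a modest 5% prior becomes near-certainty:
print(doomsday_posterior(0.05))  # about 0.98
```

The point of the calculation is that whatever prior you start from, a birth rank of 108 billion shifts the odds towards early extinction by a factor of a thousand.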

Essentially, then, the Doomsday Argument transfers the logic of the laws of probability to the survival of the human race. To date there have been about 110 billion humans on Earth, about 7 per cent of whom are alive today. At least, these are indicative estimates. On the same basis as the tank, the balls in the box, and the pods problems, a reasonable estimate, other things equal, is that we are about halfway along the timeline. Projecting demographic trends forward, this makes our best estimate of the termination of the timeline of the human race as we know it fall within this millennium.

That is the Doomsday argument.

References and Links

Nick Bostrom (2002). A Primer on the Doomsday Argument. http://www.anthropic-principle.com/?q=anthropic_principle/doomsday_argument

Doomsday Argument. Lesswrongwiki. https://wiki.lesswrong.com/wiki/Doomsday_argument

Doomsday Argument. RationalWiki. https://rationalwiki.org/wiki/Doomsday_argument

Doomsday Argument. Wikipedia. https://en.wikipedia.org/wiki/Doomsday_argument

The Keynesian Number Puzzle: An exploration in rationality.


Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

Choose an integer between 0 and 100. You win a prize if your number is equal to, or closest to, 2/3 of the average number chosen by all other participants. What number should you choose?

If you think that the other participants will choose a random number within the range, the average will be 50. Hence you choose 33. That seems right, intuitively, to many people. But hang on. Just as you chose 33, so presumably will other participants, at least on average, based on your same line of reasoning. So if the average number chosen by all participants is 33, then the smart thing to do is to choose 22.

But do you really think you are smarter than the others? Just as you figured out that 22 is the smart choice, so will others, at least on average. So the super-smart thing to do is to choose 15. But … we are heading towards 0 (you get there after about 12 iterations). Zero is the only rational choice if you don’t think you are smarter than the other participants.

You start to get the strong feeling that if you choose 0 you are not going to win the prize. This is because, although you don’t think you are smarter than most, it is reasonable to assume that at least some of the players are not as smart or rational as you. For example, if 10 per cent of players are totally naïve and choose a random number – 50 on average – then the overall average will be 5 and the right answer will be 3. However, if the rest of the players share your thoughts and assumptions, they will also choose 3, thereby increasing the average to 8 and the right answer to 5. Then you answer 5, but so will the rest, thus increasing the right answer to 6.

The process converges to 8. Well, 8 is the right answer if 90 per cent of players are as smart as you are and 10 per cent are totally naïve. If 20 per cent are naïve, the process converges to 14; with 30 per cent it converges to 18, and so on. But then it may also be the case that the less rational players are not totally naïve (Level 0 rationality) but, for example, exhibit Level 1 rationality, where the average answer is 33. In this case, with 10 per cent Level 1 players the process converges to 5; with 20 per cent to 9; with 30 per cent to 12, and so on. Of course, there are plenty more combinations, with varying proportions of players at Level 0, Level 1, Level 2 and so on. The higher the winning number, the larger is the percentage of less rational players in the game.
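These convergence claims can be checked by iterating the best reply. A short sketch, where `frac_other` and `other_avg` describe the proportion and average answer of the less-than-fully-rational players (the names are my own):

```python
def converged_answer(frac_other, other_avg, target=2/3, rounds=1000):
    """Iterate the rational players' best reply when a fixed fraction
    of players always answers other_avg and the rest play the current
    best reply. Returns the (un-rounded) fixed point."""
    x = 0.0
    for _ in range(rounds):
        x = target * (frac_other * other_avg + (1 - frac_other) * x)
    return x

# 10% totally naive (average 50): converges near 8, as in the text.
print(converged_answer(0.10, 50))  # about 8.33
# 20% naive: near 14. 10% at Level 1 (average 33): about 5.5,
# which the text's integer reasoning rounds to 5.
print(converged_answer(0.20, 50))
print(converged_answer(0.10, 33))
```

Solving the fixed-point equation x = (2/3)(f·a + (1−f)·x) directly gives the same answers without iteration; the loop simply mirrors the step-by-step reasoning in the text.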

In an experiment conducted with Financial Times readers by economist Richard Thaler, made up of 1,476 participants, the winning number was in fact 13. This is roughly consistent with:

  1. All players exhibit Level 3 rationality

OR 2. 80% are fully rational and 20% are totally naïve.

OR 3. 70% are fully rational and 30% exhibit Level 1 rationality.

Etc.

John Maynard Keynes, in Chapter 12 of his ‘General Theory of Employment, Interest and Money’, frames the paradox in terms of professional investment, in a more prosaic way:

“Professional investment may be likened to those newspaper competitions in which the competitors have to pick out the six prettiest faces from a hundred photographs, the prize being awarded to the competitor whose choice most nearly corresponds to the average preferences of the competitors as a whole; so that each competitor has to pick, not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view. It is not a case of choosing those which, to the best of one’s judgment, are really the prettiest, not even those which average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practice the fourth, fifth and higher degrees.”

In other words, it is those who are best able to out-guess the best guesses of the rest of the crowd who stand to win the prize. Or, put another way, the ten pound note you spot lying on the floor might well be real after all. Nobody has picked it up yet because they have all assumed that someone else would have picked it up if it were real. You realise that everyone else is thinking like this, and you win yourself a tenner. Let’s call that super-rationality.

Exercise

Choose an integer between 0 and 100. You win a prize if your number is equal to, or closest to, 2/3 of the average number chosen by all other participants. What number should you choose?

Reference and Links

Keynes’ Beauty Contest. By Richard Thaler in the Financial Times, July 10, 2015. https://www.ft.com/content/6149527a-25b8-11e5-bd83-71cb60e8f08c

Keynesian Beauty Contest. Wikipedia. https://en.wikipedia.org/wiki/Keynesian_beauty_contest

Why listening to Blaise Pascal might just save the planet.

Blaise Pascal was a 17th century French mathematician and philosopher who laid some of the main foundations of modern probability theory. He is particularly celebrated for his correspondence with the mathematician Pierre Fermat, forever associated with Fermat’s Last Theorem. Schoolchildren learning mathematics are more familiar with him courtesy of Pascal’s Triangle. Increasingly, though, it is Pascal’s Wager, and latterly the Pascal’s Mugging puzzle, that have entertained modern philosophers.

Pascal’s Wager can be stated simply: if God exists and you wager that He does not, your penalty relative to betting correctly is enormous. If God does not exist and you wager that He does, your penalty relative to betting correctly is inconsequential. In other words, there’s a lot to gain if it turns out He does exist and not much lost if He doesn’t. So, unless it can be proved that God does not exist, you should always side with Him existing, and act accordingly.

Put another way, Pascal points out that if a wager offered an equal chance of gaining two lifetimes of happiness or gaining nothing, a person would be foolish to bet on the latter. The same would go if it were three lifetimes of happiness versus nothing. He then argues that it is simply unconscionable by comparison to bet against an eternal life of happiness for the possibility of gaining nothing. The wise decision is to wager that God exists, since “If you gain, you gain all; if you lose, you lose nothing”: one can gain eternal life if God exists, but if not, one will be no worse off in death than by not believing. If, on the other hand, you bet against God, then win or lose, you either gain nothing or lose everything.
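The structure of the Wager can be shown with a crude decision-theoretic sketch, substituting a large finite reward for Pascal's infinite one. All the numbers here are purely illustrative, not claims about the actual probabilities:

```python
def wager_value(p_god, reward=1e9, cost_of_belief=1.0):
    """Expected payoff of believing vs. not believing, with a large
    finite reward standing in for Pascal's infinite one and a small
    cost representing the inconvenience of belief."""
    believe = p_god * reward - cost_of_belief
    not_believe = 0.0   # simplification: nothing gained, nothing staked
    return believe, not_believe

# Even at very long odds, the reward term dominates the stake:
b, nb = wager_value(p_god=1e-6)
print(b > nb)  # True
```

The point Pascal exploits is that for any non-zero probability, a large enough reward makes the believing side of the ledger dominate; with a genuinely infinite reward, no finite odds can tip it back.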

It seems intuitive that there is something wrong with this argument. The problem lies in pinning down what it is. One good try is known as the ‘many gods’ objection. The argument here is that one can in principle come up with multiple different characterisations of a god, including a god that punishes people for siding with his existence. But this assumes that all representations of what God is are equally probable. In fact, some representations must be more plausible than others, if the alternatives are properly investigated. A characterisation that has hundreds of millions of followers, for example, and a strongly developed set of apologetics, is at least somewhat more likely to be true than a theory based on an evil teapot.

Once we begin to drop the equal-probability assumption, we severely weaken the ‘many gods’ objection. Basically, if it is more likely that the God of a major established religion is possibly true (however almost vanishingly unlikely any individual might think that to be) relative to the evil teapot religion, the ‘many gods’ objection very quickly begins to crumble to dust. At that point, one needs to take seriously the stratospherically high rewards of siding with belief (at whatever long odds one might set for that) compared to the stakes.

It is true that infinite rewards swamp decision calculations, but we need not even go as far as positing an infinite reward for the decision problem, weighed against the stakes, to become a relatively straightforward one. It is also true that future rewards tend to be seriously under-weighted by most human decision-makers. In truth, pain suffered in the future will feel just as bad as pain suffered today, but most of us don’t think or behave as if that were so. The attraction of delaying an unwelcome decision is well documented. In the immortal words of St. Augustine of Hippo in his ‘Confessions’, “Lord make me pure – but not yet!”

A second major objection is the ‘inauthentic beliefs’ criticism: that for those who cannot genuinely believe, feigning belief in order to gain an eternal reward invalidates the reward. What such critics are pointing to is the unbeliever who says to Pascal that he cannot make himself believe. Pascal’s response is that if the principle of the wager is valid, then the inability to believe is irrational. “Your inability to believe, because reason compels you to and yet you cannot, [comes] from your passions.” This inability, therefore, can be overcome by diminishing these irrational sentiments: “Learn from those who were bound like you. . . . Follow the way by which they began; by acting as if they believed.”
Even some modern atheist philosophers admit to struggling with the problem set by Blaise Pascal. One attempt to square the circle is to say that, if God as conventionally conceived exists with some non-zero probability, there is a case for pushing a hypothetical button that would make them believe, if offered just one chance and that chance were now or never. Given the option of delaying the decision as long as possible, however, it seems they would side with St. Augustine’s approach to the matter of his purity.

Pascal’s Wager has taken on new life in the last couple of decades as it has come to be applied to existential threats such as climate change. This issue bears a clear similarity to Pascal’s Wager on the existence of God. Let’s say, for example, there is only a one per cent chance that the planet is on course for catastrophic climatic disaster, and that delay means passing a point of no return after which we would be powerless to stop it. In that case, not acting now would seem a kind of madness. It certainly breaches the terms of Pascal’s Wager. This has fittingly been termed Noah’s Law: if an ark may be essential for survival, get building, however sunny the day overhead. When the cost of getting it wrong is just too high, it pays to hedge your bets.

Pascal’s Mugging is a new twist on the problem, which can, if wrongly interpreted, give comfort to the naysayers. It can be put this way. You are offered a proposal by someone who turns up on your doorstep. Give me £10, the door-stepper says, and I will return tomorrow and give you £100. I desperately need the money today, for reasons I’m not at liberty to divulge, but I can easily pay you anything you like tomorrow. You turn down the deal because you don’t believe he will follow through on his promise. So he asks you how likely you think it is that he will honour any deal you are offered. You say 100 to 1. In that case, he says, I will bring you £1,100 tomorrow in return for the £10. You work out the expected value of this proposal to be 1/100 times £1,100, or £11, and hand over the tenner. He never comes back and you have, in a way, been intellectually mugged. But was handing over the note irrational? The mugger won the argument: for any low probability of his being able to pay back a large amount of money, there exists a finite amount that makes it rational to take the bet. In particular, a rational person must admit there is at least some non-zero chance that such a deal would be honoured. However low the probability you assign to being paid out, there exists a potential reward, which need not be monetary, large enough to outweigh it.
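The mugger's arithmetic is a one-line expected-value calculation, easy to verify:

```python
def expected_value(p_payout, payout, stake):
    """Expected profit from handing over the stake:
    probability of being paid times the payout, minus the stake."""
    return p_payout * payout - stake

# The doorstep deal: a 1-in-100 chance of £1,100 for a £10 stake
# has an expected profit of £1, so the naive calculation says take it.
print(expected_value(1 / 100, 1100, 10))  # 1.0
```

The same calculation applies to the exercise below: multiply the offered sum by your probability of being paid and compare it with the stake.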

Pascal’s Mugging has more generally been used to consider the appropriate course of action when confronted more systematically with low-probability, high-stakes events such as existential risk, or charitable interventions with a low probability of success but extremely high rewards. Common sense might seem to suggest that spending money and effort on extremely unlikely scenarios is irrational, but common sense has misled us before, and there is no reason to believe that it serves us any better here.

Blaise Pascal was a very clever guy and those who over the centuries have too quickly dismissed his ideas have paid the intellectual (and perhaps a much bigger) price. Today, in an age when global existential risk is for obvious reasons (nuclear annihilation not least) a whole lot higher up the agenda than it was in Pascal’s day, it is time that we revisit (atheists, agnostics and believers alike) the lessons to be learned from ‘The Wager’, and that we do so with renewed urgency. The future of the planet just might depend on it.

Exercise

In the Pascal’s Mugging Problem you are offered £3,000 tomorrow if you pay the stranger £25 today. You believe that there is a 1 in 100 chance that the stranger will return to pay you.

Is handing over the £25 rational from an economic point of view? Would you hand over the £25? What if the stranger offered to pay you £10,000 tomorrow, and you believe there is a 1 in 125 chance that he will return to pay you?

Would your answer be different if any of the sums involved were different?

References and Links

Nick Bostrom. Pascal’s Mugging. 443-444. https://nickbostrom.com/papers/pascal.pdf

Pascal’s mugging. Wikipedia. https://en.wikipedia.org/wiki/Pascal%27s_mugging

Amanda Askell on Pascal’s Wager and other low risks with high stakes. Rationally Speaking. Podcast. http://rationallyspeakingpodcast.org/show/rs-190-amanda-askell-on-pascals-wager-and-other-low-risks-wi.html

Transcript of Amanda Askell Podcast. http://static1.1.sqspcdn.com/static/f/468275/27648050/1502083126473/rs190transcript.pdf?token=xQdh8%2B1IgicYGsJS5D%2Fa%2BB0sFMo%3D

Is Believing in God Worth It? SALIENT. http://salient.org.nz/2018/03/is-believing-in-god-worth-it/

The Secretary Problem – in a nutshell.

In ‘The Merchant of Venice’, by William Shakespeare, as we have seen in an earlier chapter, Portia sets her suitors a problem to solve to find who is right for her. In the play, there are just three suitors and they are asked to choose between a gold, a silver and a lead casket, one of which contains a portrait which is the key to her hand in marriage.

Let us base a thought experiment around Portia’s quest for love in which she meets the successive suitors in turn. Her problem is when to stop looking and start choosing.

To make the problem of more general interest, let’s say she has 100 suitors to choose from. Each will be presented to her in random order and she has twenty minutes to decide whether he is the one for her. If she turns someone down there is no going back, but the good news is that she is guaranteed not to be turned down by anyone she selects. If she comes to the end of the line and has still not chosen a partner, she will have to take whomever is left, even if he is the worst of the hundred. All she has to go on in guiding her decision are the relative merits of the pool of suitors.

Let’s say that the first presented to her, whom we shall call No.1, is perfectly charming but she has some doubts. Should she choose him anyway, in case those to follow will be worse? With 99 potential matches left, it seems more than possible that there will be at least one who is a better match than No.1.

The problem facing Portia is that she knows that if she dismisses No. 1, he will be gone forever, to be betrothed to someone else.

She decides to move on. The second suitor turns out to be far worse than the first, as does the third and fourth. She starts to think that she may have made a mistake in not accepting the first. Still, there are potentially 96 more to see. This goes on until she sees No. 20, whom she actually prefers to No. 1. Should she now grasp her opportunity before it is too late? Or should she wait for someone even better?

She is looking for the best of the hundred, and this is the best so far. But there are still 80 suitors left, one of whom might be better than No. 20. Should she take a chance?

What is Portia’s optimal strategy in finding Mr. Right?

This is an example of an ‘Optimal Stopping Problem’, which has come to be known as the ‘Secretary Problem’. In this variation, you are interviewing applicants for a secretarial post, your aim being to maximise your chance of hiring the single best applicant in the pool. Your only criterion for measuring suitability is their relative merits, i.e. who is better than whom. As with Portia’s problem, you can offer the post to the current applicant at any time before seeing any more candidates, but you lose the opportunity to hire that applicant if you decide to move on to the next in line.

This sort of stopping strategy can be extended to almost anything: the search for a place to live, a place to eat, the choice of a used car, and so on.

In each of these cases, there are two ways you can fail to meet your goal of finding the best option out there. The first is by stopping too early, and the second is by stopping too late.

By stopping too early, you leave the best option out there. By stopping too late, you have waited for a better option that turns out not to exist. So how do you find the right balance?

Let’s consider the intuition. The first option is necessarily the best seen so far, and the second option (assuming we are taking the options in a random order) has a 50% chance of being the best yet. Likewise, the tenth option has a 10% chance of being the best to that point.

It follows logically that the chance of any given option being the best to that point declines as the number of options seen before it increases. So encounters with a ‘best yet’ option become more and more infrequent as we go through the process.

To see how we might best approach the problem, let’s go back to Portia and her suitors and look at her best strategy when faced with different-sized pools of suitors. Can she do better using some strategy other than choosing at some random position in the order of presentation to her?

It can be shown mathematically that she can certainly expect to do better, given that there are more than two to choose from. Let’s return to the original play where she has three potential matches.

We can look at it this way. If she chooses No. 1, she has no information with which to compare the relative merits of her suitors. On the other hand, by the time she reaches No. 3, she must choose him, even if he’s the worst of the three. In this way, she has maximum information but no choice.

In the case of No. 2, she has more information than she did when she saw No. 1, as she can compare the two. She also has more control over her choice than she will if she leaves it until she meets No. 3.

So she turns down No. 1 to give herself more information about the relative merits of those available. But what if she finds that No. 2 is worse than No. 1? What should she do?

It can in fact be shown that she should wait and take the risk of ending up with No. 3, as she must do if she leaves it to the last. On the other hand, if she finds that she prefers No. 2 to No. 1, she should choose him on the spot and forego the chance that No. 3 would be a better match.

It can also be shown that in the three-suitor scenario, she will succeed in finding her best available match exactly half the time by selecting No. 2 if he is better than No. 1. If she chooses No. 1 or No. 3, on the other hand, she will only have met that aim one time in three.
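The three-suitor claim is small enough to verify by brute force, enumerating all six possible orderings (the helper names are my own):

```python
from itertools import permutations

def wins(order, strategy):
    """Does the strategy pick the best suitor? Rank 3 is the best."""
    return order[strategy(order)] == 3

def skip_first(order):
    """Reject No. 1; take No. 2 if better than No. 1, else take No. 3."""
    return 1 if order[1] > order[0] else 2

for strategy, name in [(lambda o: 0, "always take No. 1"),
                       (skip_first, "skip one, then leap"),
                       (lambda o: 2, "always take No. 3")]:
    n = sum(wins(order, strategy) for order in permutations((1, 2, 3)))
    print(name, n, "of 6")
```

The skip-one strategy wins in 3 of the 6 orderings (a probability of 1/2), while always taking the first or the last wins in only 2 of 6 (1/3), matching the claim in the text.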

If there are four suitors, Portia should use No. 1 to gain information on what she should be measuring her standards against, and select No. 2 if he is a better choice than No. 1. If he is not, do the same with No. 3. If he is still not better than No. 1, go to No. 4 and hope for the best. The same strategy can be applied to any number of people in the pool.

So, in the case of a hundred suitors, how many should she see to gain information before deciding to choose someone?

It can, in fact, be demonstrated mathematically that her optimal ‘stopping point’, before turning from looking to leaping, is 37.

She should meet with 37 of the suitors, then choose the first of those to come after who is better than the best of the first 37. By following this rule, she will find the best of the princely bunch of a hundred with a probability, strangely enough, of 37 per cent.

By choosing randomly, on the other hand, she has a chance of 1 in 100 (1%) of settling upon the best.

This stopping rule of 37% applies to any similar decision, such as the secretary problem or looking for a house in a fast-moving market. It doesn’t matter how many options are on the table. You should always use the first 37% as your baseline, and then select the first of those coming after that is better than any of the first 37 per cent.

The mathematical proof is based on the constant e (sometimes known as Euler’s number), and specifically on 1/e, which can be shown to be the stopping point along a range from 0 to 1, after which it is optimal to choose the first option that is better than any that came before. The value of e is approximately 2.71828, so 1/e is about 0.36788, or 36.788%. This has simply been rounded up to 37 per cent in explaining the stopping rule. It can also be shown that the chance that implementing this stopping rule yields the very best outcome is itself equal to 1/e, i.e. about 37 per cent.
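The 37 per cent claim is easy to test by simulation. A quick sketch, assuming 100 candidates arriving in random order (the function name and trial count are my own choices):

```python
import math
import random

def secretary_success_rate(n=100, trials=20_000, rng=random.Random(1)):
    """Simulate the 1/e stopping rule: observe the first 'cutoff'
    candidates, then take the first later candidate better than all
    of them (or the last candidate if none is)."""
    cutoff = round(n / math.e)          # about 37 for n = 100
    successes = 0
    for _ in range(trials):
        ranks = list(range(n))
        rng.shuffle(ranks)              # rank n-1 is the best candidate
        best_seen = max(ranks[:cutoff])
        chosen = ranks[-1]              # forced to take the last otherwise
        for r in ranks[cutoff:]:
            if r > best_seen:
                chosen = r
                break
        successes += chosen == n - 1
    return successes / trials

print(secretary_success_rate())  # close to 1/e, about 0.37
```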

If there is a chance that your selection might turn out to be unavailable, the rule can be adapted to give a different cut-off, but the principle remains. For example, if there is a 50% chance that your selection might turn out to be unavailable, then the 37% rule becomes a 25% rule. The rest of the strategy remains the same. Following it gives you a 25% chance of finding the best of the options, compared to a 37% chance when you always get to make the final choice. This is still far better than the 1 per cent chance of selecting the best of a hundred options at random. The lower percentage (25% compared to 37%) reflects the additional uncertainty introduced when your choice might not be final. There are other variations on the same theme, for instance where you can go back to an option you initially passed over, with some probability that it is no longer available. Take the case where an immediate proposal is certainly accepted but a belated proposal is accepted half of the time. The cut-off proportion in that scenario rises to 61%, because the possibility of going back is now real.

There is also a rule-of-thumb which can be derived when the aim is to maximise the chance of selecting a good option, if not the very best. This strategy has the advantage of reducing the chance of ending up with one of the worst options. It is the square root rule, which simply replaces the 37% criterion with the square root of the number of options available. In the case of Portia’s choice, she would meet the first ten of the hundred (instead of 37) and choose the first of the remaining 90 who is better than the best of those ten.
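The two cutoffs can be compared with a Monte Carlo sketch (the function name, trial count and seed below are my own illustrative choices):

```python
import random

# Apply a cutoff rule to random orderings of 100 distinct values and
# measure how often it lands on the single best option.
def best_pick_rate(cutoff, n=100, trials=20000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        vals = [rng.random() for _ in range(n)]
        benchmark = max(vals[:cutoff])
        # take the first later option beating the benchmark, else the last one
        pick = next((v for v in vals[cutoff:] if v > benchmark), vals[-1])
        hits += pick == max(vals)
    return hits / trials

print(best_pick_rate(37), best_pick_rate(10))  # roughly 0.37 vs roughly 0.23
```

The 37% cutoff finds the very best option about 37% of the time, against roughly 23% for the square-root cutoff of 10; the square-root rule’s advantage, as noted above, lies in avoiding the worst options rather than maximising the chance of the very best.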

Whatever variation you adopt, the numbers will change but the principle stays the same.

All this assumes that we lack an objective standard against which to measure each of our options, so that we can only compare options with each other. Suppose instead, for example, that Portia is simply interested in choosing the richest of the suitors and knows the distribution of wealth across all potential suitors, which ranges evenly from the bankrupt suitors to those worth 100,000 ducats.

This means that the upper percentile of potential suitors in the whole population are worth upwards of 99,000 ducats. The lowest percentile is worth up to 1,000 ducats. The 50th percentile is worth between 49,000 and 50,000 ducats.

Now Portia is presented with a hundred out of this population of potential suitors, and let’s assume that the suitors presented to her are representative of this population.

Say now that the first to be presented to her is worth 99,500 ducats. Since wealth is her only criterion, and he is in the upper percentile in terms of wealth, her optimal decision is to accept his proposal of marriage. It is possible that one of the next 99 is worth more than 99,500 ducats but that isn’t the way to bet.

On the other hand, say that the first suitor is worth 60,000 ducats. Since there are 99 more to come, it is a good bet that at least one of them will be worth more than this. If she has turned down every suitor until she is presented with the last of the hundred, however, her optimal decision is to accept him. In other words, Portia’s decision on whether to accept a proposal comes down to how many potential matches she has left to see. When down to the last two, she should choose the current suitor if he is above the 50th percentile, in this case worth more than 50,000 ducats. The more suitors there are still to come, the higher the percentile of wealth at which she should accept, i.e. the higher the threshold she can set. She should never accept anyone below the average unless she is out of choices.

In this version of the stopping problem, the probability that Portia will end up with the wealthiest of the available suitors turns out to be 58 per cent. More information, of course, increases the chance of success. Indeed, any criterion that indicates where an option stands relative to the relevant population as a whole will increase the probability of finding the best of the available choices. As such, it seems that if Portia is only interested in the money, she is more likely to find it than if she is looking for love.

References and Links

Optimal Stopping: How to find the perfect apartment, partner and parking spot. Brian Christian. Medium.com https://medium.com/galleys/optimal-stopping-45c54da6d8d0

Mathematics, marriage and finding somewhere to eat. +Plus magazine. David K. Smith. https://plus.maths.org/content/os/issue3/marriage/index

Is there a solution to the Sleeping Beauty problem? In a nutshell.

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.


Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice during the experiment, Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening.

A fair coin will be tossed on Sunday evening after she is put to sleep, to determine which experimental procedure to undertake: if the coin comes up heads, Beauty will be awakened and interviewed on Monday only. If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday. In either case, she will be awakened on Wednesday without interview and the experiment ends.

Any time Sleeping Beauty is awakened and interviewed, she is asked, “What is your belief now, as a percentage, in the proposition that the coin landed heads?”

What should Beauty’s answer be?

To one way of thinking about this, the answer is clear. The coin was tossed once prior to her awakening, however many times she is woken, whether once (if it landed heads) or twice (if it landed tails).

Since the fair coin was tossed just once, and no further information is obtained by Beauty at the time she is awoken and interviewed, the answer she should give should be 50 per cent, i.e. a 1 in 2 chance that the fair coin landed heads.

To another way of thinking about it, she is interviewed just once if it landed heads (on the Monday) but she is interviewed twice if it landed tails (on Monday and Tuesday). She does not know which day it is when she is woken and interviewed but from her point of view there are three possibilities. These are:

  1. It landed heads and it is Monday.
  2. It landed tails and it is Monday.
  3. It landed tails and it is Tuesday.

So there are three possibilities, of equal likelihood, and two of these involve the coin landing tails and just one for the coin landing heads. So the answer she should give should be 33.3 per cent, i.e. a 1 in 3 chance that the fair coin landed heads.

So which answer is correct? The world of probability is by and large divided into those who are adamant that she should go with ½ (the so-called ‘halfers’) and those who are equally adamant that she should go with 1/3 (the so-called ‘thirders’). Are they both right, are they both wrong, or somewhere in between?

A way that I usually advocate to resolve seemingly intractable probability paradoxes is to ask at what odds Beauty should be willing to place a bet.

So, if in this experiment Beauty is offered odds of 1.5 to 1 that the coin landed heads, should she take those odds? If the correct answer is a half, those odds are attractive as the correct odds should be 1 to 1 (evens). If the correct answer is a third, those odds are unattractive as the correct odds should be 2 to 1.

So what should Beauty do if offered odds of 1.5 to 1? Bet or decline the bet?

The simplest way to resolve this is to ask what would happen if she accepted the odds of 1.5 to 1 and placed a bet of £10 each time she was interviewed. When the coin came up heads, she would be awoken just once, place the £10 bet and win £15. When the coin landed tails, however, she would be awoken twice and place two bets of £10, i.e. a total of £20, and lose both.

So her net outcome from this betting strategy, taken across one heads toss and one tails toss, would be a loss of £5.

This suggests that a half is the wrong answer as to the probability that the coin landed heads. At odds of 2 to 1, on the other hand, when the coin came up heads she would place £10 on the one occasion she was awoken, i.e. Monday, and would win £20. When the coin came up tails, she would lose £10 on the Monday and £10 on the Tuesday, i.e. £20. Her expected outcome would in this case be to break even. This suggests that odds of 2 to 1 are the correct odds, which is consistent with a probability of 1/3. Some ‘Halfers’ argue that Beauty should stake a chip worth half as much when the coin lands Tails as when it lands Heads, although she would be unaware of the chip’s value when she stakes it. In that case she would indeed break even betting at even money, but there seems no reasonable case for applying this arbitrary fix to the experiment.
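The betting test can be sketched as a simulation, with the £10 stake and the odds as in the text (the function name, trial count and seed are my own choices):

```python
import random

# Beauty stakes £10 on Heads at the quoted odds at every interview:
# one interview after Heads, two after Tails.
def average_return(odds, tosses=100000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(tosses):
        if rng.random() < 0.5:
            total += 10 * odds   # Heads: a single winning bet
        else:
            total -= 20          # Tails: two losing £10 bets
    return total / tosses        # average profit per coin toss

print(average_return(1.5))  # roughly -2.5 per toss: 1.5 to 1 loses money
print(average_return(2.0))  # roughly 0 per toss: 2 to 1 breaks even
```

The simulation confirms the reasoning above: only odds of 2 to 1, i.e. a probability of 1/3, make the repeated bet break even.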

Applying the ‘betting test’ to this problem, therefore, suggests that Beauty’s answer when she is woken up should be that there is a 1 in 3 chance that the coin landed heads when tossed after she was put to sleep on the Sunday.

But how can this be right, when the fair coin was tossed just once, and we know that the chance of a fair coin landing heads is ½? If this is the ‘prior probability’ Beauty should assign to the coin landing heads, and she is given no further information about what happened to the coin when she is woken and questioned, on what grounds should the probability she assigns change? The only information she acquires is that she has been woken and questioned, but she knew that would happen in advance, so this is not new information. Given she assigns a prior probability of ½ to the coin coming up heads, and she acquires no new information, it is perhaps difficult to see on what grounds she should change her opinion. The posterior probability she assigns (after she acquires all new information) should be identical to the prior probability, because she has acquired no new information after being put to sleep to change anything.

This is the kernel of the conundrum, and it is why there is a long-standing and ongoing debate between fervent so-called ‘Halfers’ and ‘Thirders.’

So the question is whether there is a correct answer and one school of thought is simply wrong, or whether there is no single correct answer and each school is right only under one interpretation of the question.

It seems to me that there is, in fact, a straightforward answer, which resolves the problem. To see this, we need to identify the actual ‘prior probability’ that the coin tossed after Beauty goes to sleep is Heads.

This depends on the question we are seeking to answer, and what information is available to Beauty before she goes to sleep.

If she is simply told that a coin will be tossed after she goes to sleep, and nothing else, then her correct estimate that the fair coin will land on heads is ½. This is the answer to a simple question of how likely a fair coin is to land Heads with no conditions, i.e. the unconditional probability that the coin will land Heads is 1/2.

If she is given the additional information, however, that she will be woken just once if the coin lands Heads but twice if it lands Tails (albeit she will remember just one of the awakenings), then we are posing a very different question.

The new question she is being asked is to estimate the probability, whenever she awakens, that her awakening resulted from the coin landing Heads. Since she has just one awakening when the coin lands Heads, but two awakenings when it lands Tails, the probability that any particular awakening resulted from a Heads flip is 1/3, i.e. the conditional probability that the coin landed Heads, given any particular awakening, is 1/3.

By extension, if she is told she will be woken 1,000 times if the coin lands Tails but only once if the coin lands Heads, then her correct estimate of the probability that any particular awakening resulted from the coin landing Heads is 1/1001.

So the ‘prior probability’ Beauty should assign to the chance of a coin landing Heads after any particular awakening is actually 1/3 within the terms of the experiment, even before she goes to sleep. It is true that she has access to no new information whenever she awakens, but that simply means that her ‘prior probability’ of being awakened by a Heads flip remains at 1/3 after she is woken. This is totally consistent with Bayesian reasoning which states the prior probability of an event will not change unless there is new information.

Given, therefore, that she assigns a prior probability of 1/3 to any particular awakening arising from a Heads flip, this should be the answer she gives whenever she awakens, and also before she goes to sleep.

So the paradox resolves to the question Beauty is being asked to answer. What is the probability that a fair coin will land Heads? Answer = ½. What is the probability that whenever she is woken this awakening has resulted from a Heads flip? Answer = 1/3. She is consistent in these answers both before she goes to sleep and whenever she wakes. In other words, because Beauty knows that she will correctly answer 1/3 whenever she is woken, given the rules of the experiment, of which she is aware, she will answer 1/3 before she goes to sleep.

This, at least, is one seemingly reasonable way of looking at, and providing a solution to, the classic Sleeping Beauty problem.

Exercise

You volunteer to undergo the following experiment and are told all of the following details: On Sunday you will be put to sleep. Once or twice during the experiment, you will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes you forget that awakening.

A fair coin will be tossed on Sunday evening after you are put to sleep, to determine which experimental procedure to undertake: if the coin comes up heads, you will be awakened and interviewed on Monday only. If the coin comes up tails, you will be awakened and interviewed on Monday and Tuesday. In either case, you will be awakened on Wednesday without interview and the experiment ends.

Any time you are awakened and interviewed, you are asked, “What is your belief now, as a percentage, in the proposition that the coin landed heads?”

What should your answer be?

References and Links

Solution: ‘Sleeping Beauty’s Dilemma’. Quanta magazine. Jan. 29, 2016 https://www.quantamagazine.org/solution-sleeping-beautys-dilemma-20160129/

Probably Overthinking It. The Sleeping Beauty Problem. Jan. 12, 2015.

Sleeping Beauty Problem. Wikipedia. https://en.m.wikipedia.org/wiki/Sleeping_Beauty_problem

The Bus and Bus Plus Problems – in a nutshell.

The Bus Problem

Every day, Fred gets the solitary 8 am bus to work. There is no other bus that will get him to his destination.

10 per cent of the time the bus is early and leaves before he arrives at 8 am.

10 per cent of the time the bus is late and leaves after 8.10 am.

The rest of the time the bus departs between 8 am and 8.10 am.

One morning Fred arrives at the bus stop at 8 am, sees no bus, and waits for 10 minutes without the bus arriving.

Now, what is the probability that Fred’s bus will still arrive?

Think about it:

Fred’s bus could yet arrive or he might have missed it. So there are two possibilities. So is it correct to assume that in the absence of further evidence the chance of each must be equal, so the probability at 8.10am that his bus will still arrive is 50 per cent?

But if that is the answer at 8.10am, was it also the correct answer at 8 am?

Or was 50 per cent the correct answer at 8am but not at 8.10am?

Or is it the wrong answer at both times, but was correct at 8.05am?

 

Solution

When Fred arrives at 8am, there is a 10 per cent chance that his bus will have already left. After Fred has waited for 10 minutes, he can eliminate the 80 per cent chance of the bus arriving in the period between 8 am and 8.10 am. So only two possibilities remain.

Either the bus has arrived ahead of schedule or it will arrive more than ten minutes late.

Both outcomes are unusual, but they are mutually exclusive and equally likely (a 10 per cent prior chance of each), and no other possibilities remain. We should therefore update the probability that the bus will still arrive from 10 per cent (the prior probability of the bus running late, before Fred set out) to 50 per cent: once the 80 per cent middle case is eliminated, the remaining 20 per cent is split equally between the bus still turning up and Fred having missed it. So there is a 1 in 2 chance that he will still catch his bus if he has the patience to wait further, and a 1 in 2 chance that he will wait in vain. The follow-up question is how long he should wait. That’s for another day.
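The update can be written out as a direct calculation (a minimal sketch; the variable names are my own):

```python
# Prior probabilities for the three possibilities for the 8 am bus.
p_early, p_on_time, p_late = 0.10, 0.80, 0.10

# By 8.10 the middle case is ruled out, so renormalise what remains.
p_bus_still_coming = p_late / (p_early + p_late)
print(p_bus_still_coming)  # 0.5
```

The same one-liner answers any variant of the problem: just change the three prior probabilities and renormalise.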

 

Exercise

Bus Plus Problem

Every day, Fred gets the solitary 8 am bus to work. There is no other bus that will get him to his destination.

10 per cent of the time the bus is early and leaves before he arrives at 8 am.

30 per cent of the time the bus is late and leaves after 8.10 am.

The rest of the time the bus departs between 8 am and 8.10 am.

One morning Fred arrives at the bus stop at 8 am, sees no bus, and waits for 10 minutes without the bus arriving.

Now, what is the probability that Fred’s bus will still arrive?

How do you choose between reason without evidence and evidence without reason?

You win a quiz show and are offered a choice. You are presented with a transparent box containing £x and an opaque box which contains either £10x or nothing. You can open the opaque box and take what is inside, or you can open both boxes and take the contents of both. Which should you choose? Well, if that’s all the information you have, it’s obvious that you should open both boxes. You certainly cannot win less than by opening just one of the boxes, but you might win a lot more. So far, so good.

But now introduce an additional factor. Before making your decision, you had to undergo a sophisticated computerised psychometric test (a Predictor), which you are now told has been unerring in its prediction of what hundreds of previous contestants would decide. Whenever they chose both boxes, there was nothing inside the opaque box. Whenever they chose just the opaque box, however, they found £10x inside. When you make your decision, the computer’s decision has already been made and the contents of the opaque box have already been placed there. What is happening is that the Predictor informs the game show organisers of its prediction of whether a contestant will choose two boxes or one. Whenever it predicts that the contestant will choose two boxes, no money is placed in the opaque box. Whenever it predicts that the contestant will choose just the opaque box, £10x is deposited in the box.

This is essentially the basis of what is known as Newcomb’s Paradox or Newcomb’s Problem, a thought experiment devised by William Newcomb of the University of California and popularised by the philosopher Robert Nozick in a paper published in 1969.

So what should you do? Open just the opaque box, or open both boxes?

In his paper, Nozick writes that “To almost everyone, it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem with large numbers thinking that the opposing half is just being silly.”

The argument of those who advocate opening both boxes (the so-called ‘two-boxers’) is that the money has already been deposited at the time you are asked to make your decision. Taking two boxes can’t change that, so that’s the rational thing to do.

The argument of those who argue for opening just the opaque box (the so-called ‘one-boxers’) is that the psychometric test is either a perfect or near-perfect predictor of what you will do. It has never got it wrong before. Every single previous contestant who has opened two boxes has found the opaque box empty, and every single previous contestant who has opened just the opaque box has won the £10x. So do what all the evidence tells you is the sensible thing to do and open just the opaque box.
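One way to make the two positions concrete is an expected-value sketch, assuming the Predictor is right with some probability p. The 0.99 figure and the function names below are my own illustrative assumptions, not part of the original problem:

```python
# Expected winnings with a transparent box holding x and an opaque box
# holding 10x if (and only if) the Predictor forecast one-boxing.
def ev_one_box(x, p):
    return p * 10 * x              # win 10x when predicted correctly

def ev_two_box(x, p):
    return x + (1 - p) * 10 * x    # keep x; get 10x only on a misprediction

x, p = 100, 0.99
print(ev_one_box(x, p), ev_two_box(x, p))  # roughly 990 vs roughly 110
```

On these assumptions, one-boxing has the higher expected value whenever p exceeds 0.55; the two-boxer’s reply, of course, is that p is irrelevant once the boxes have been filled.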

One way of considering the question is to ask whether your choice in some way determines the choice of the Predictor, and thereby the decision as to whether to place the £10x in the box. Well, there’s no time-travelling retro-causality involved. The predictor is basically a piece of computer software which bases its prediction on a psychometric test. It just so happens that the test is uncannily accurate in knowing what people will do.

Look at it this way. The bottom line is that you have a free choice, so why not open both boxes? The problem is that if you are the type of person who is a two-boxer, the predictor will have found this out from the super-efficient psychometric test. If you are the type of person, however, who is a one-boxer, the predictor will find that out too.

So it’s not that there is any good reason in itself to open one box rather than two. After all, what you decide now can’t change what is already in the box. But there is a good reason why you should be the type of person who only opens one box. And the best way to be the sort of person who only opens one box is to only open one box. For that reason, the way to win the £10x is to agree to open just the opaque box and leave the other box untouched.

But why leave behind that extra £x when the £10x which you are about to win is already in the box?

That’s Newcomb’s Paradox. You decide! Are you a one-boxer or a two-boxer? And does it matter a shred what x is?

Exercise

You are presented with a transparent box containing £100 and an opaque box which contains either £1000 or nothing. Now, you can open the opaque box and take what is inside, or you can open both boxes and take the contents inside both.

Before making your decision, you had to undergo a sophisticated computerised psychometric test (a Predictor), which you are now told has been unerring in its prediction of what all previous contestants would decide. Whenever they chose both boxes, there was nothing inside the opaque box. Whenever they chose just the opaque box, however, they found £1000 inside. When you make your decision, the computer’s decision has already been made and the contents of the opaque box have already been placed there. What is happening is that the Predictor informs the game show organisers of its prediction of whether a contestant will choose two boxes or one. Whenever it predicts that the contestant will choose two boxes, no money is placed in the opaque box. Whenever it predicts that the contestant will choose just the opaque box, £1000 is deposited in the box.

Would you open just the opaque box or both?

 

References and Links

Newcomb’s problem divides philosophers. Which side are you on? Bellos, A. Nov.26, 2016. https://www.theguardian.com/science/alexs-adventures-in-numberland/2016/nov/28/newcombs-problem-divides-philosophers-which-side-are-you-on?CMP=Share_iOSApp_Other

Newcomb’s Problem. Which side won the Guardian’s philosophy poll? Nov. 30, 2016. https://www.theguardian.com/science/alexs-adventures-in-numberland/2016/nov/30/newcombs-problem-which-side-won-the-guardians-philosophy-poll

Newcomb’s Paradox. Brilliant.org. https://brilliant.org/wiki/newcombs-paradox/

Newcomb’s Paradox. Wikipedia. https://en.m.wikipedia.org/wiki/Newcomb%27s_paradox

Collider Bias – in a nutshell.


Collider Bias (also known as Berkson’s bias or Berkson’s Paradox) is a statistical quirk which makes it appear that there is an association between two events or variables which are actually unrelated. Notably, it shows that two values can be negatively correlated in a sample of a population when they are in fact uncorrelated or positively correlated in that population. It arises because of a type of selection bias, which is caused by the observation of some events more than others.

Take the case of a college which admits students based on either musical excellence or sporting excellence. For the sake of argument, assume that there is no link between the two in the total relevant population (say, all students in the country). In other words, a musically talented individual is no more nor less likely to be talented at sport. Because the college admits only students who are excellent at music, or excellent at sport, or both, this creates a group or subset of the population which displays a negative association between musical and sporting excellence.

To illustrate why, let’s make the simplifying assumption that the college admits students who score 9 or 10 out of 10 (on a scale of 0 to 10) on either sporting or musical excellence. In the entire population, the average sporting rating of musicians of any given standard is the same, i.e. 5 out of 10, and vice versa. Yet within the group of student entrants, those admitted for their musical ability have an average musical rating of 9.5 but an average sporting rating of just 5 (the population average). The effect is to imply a negative correlation between sporting and musical ability among the entrants where no such correlation exists in the wider population.
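This can be sketched with a quick simulation. The scores and the 9-or-better admission rule follow the example above; the sample size, seed and variable names are my own choices:

```python
import random

# Independent music and sport scores (0-10) in the population; admission
# on scoring 9+ in at least one subject acts as the collider.
rng = random.Random(42)
population = [(rng.randint(0, 10), rng.randint(0, 10)) for _ in range(100000)]
admitted = [(m, s) for m, s in population if m >= 9 or s >= 9]

def correlation(pairs):
    n = len(pairs)
    mean_m = sum(m for m, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    cov = sum((m - mean_m) * (s - mean_s) for m, s in pairs) / n
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs) / n
    var_s = sum((s - mean_s) ** 2 for _, s in pairs) / n
    return cov / (var_m * var_s) ** 0.5

print(round(correlation(population), 2))  # roughly 0 in the population
print(round(correlation(admitted), 2))    # clearly negative among entrants
```

Conditioning on admission turns two independent scores into a strongly negatively correlated pair within the admitted group.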

This has been shown to have important implications for medical statistics. Say, for example, that a hospital conducts a study which admits patients who are suffering from either eye cataracts or diabetes. In this case, a (spurious) association between cataracts and diabetes will appear in the set of patients included in the study which does not exist in the wider population. The paradox occurs because, among the study patients, anyone who does not have diabetes must have cataracts (and vice versa), since cases where neither occurs are excluded.

Similarly, take the idea that there is a negative association in our minds between the quality of movies based on really good books and the quality of the books themselves. One explanation can be derived from Berkson’s Paradox. We remember the instances where the book is really good, or the movie is really good, or both, but we forget the cases where both the book and the movie were bad. In consequence we find a (spurious) negative correlation between how good the movie is and how good the book is, because the bad-movie/bad-book part of the population is not included in the set of movies and books under analysis.

Perhaps the most famous example of Collider Bias was proposed by Jordan Ellenberg. This is the ‘attractive people are jerks’ example, and it is similar to the movies/books example. Say that someone only associates with people who are either pleasant or attractive or both. That eliminates from the sample pool those who are both unpleasant and unattractive. It leaves a sample containing attractive people who are unpleasant and pleasant people who are unattractive, but excludes those who are neither pleasant nor attractive. So an association is noted between being attractive and being unpleasant, but only because the unattractive people who are also unpleasant are never observed. Even if no link exists between attractiveness and unpleasantness in the population, one appears in an observed world where the counter-examples who exist in the population are avoided and ignored.

To put it more formally, assume there are two independent events, X and Y. These events are not correlated when observed in nature. If one conditions on the fact that either event X or event Y occurred (call this condition Z), however, these events are now correlated. This arises because of selection bias. If we condition on Z (that X OR Y occurs), then if we know that event X did not occur, we know that event Y did occur. This conditioning on Z, what we can call the union of X and Y, leads to a correlation.

Put mathematically, if P(X|Y) = P(X), then P(X|Y, Z) < P(X|Z), where Z = X ∪ Y.

Numerical example of Collider Bias

10% of the population swim and 5% play squash weekly, but there is no correlation between swimming and playing squash in the general population. So someone who plays squash is as likely to swim as any other member of the population and vice-versa.

Of the 200 members of a local health club, 30% swim and 20% play squash.

Based on the health club statistics, is there any evidence of a correlation between those who do not swim and those who play squash?

To answer this, we use the assumption that someone who plays squash is as likely to swim as any other member of the population, i.e. swimming and squash playing can be treated as independent events. On the population rates, the proportion of members who both swim and play squash would be 10% x 5% = 0.5%, i.e. 1 of the 200 members.

A randomly chosen health club member, however, has a 30% chance of swimming and a 20% chance of playing squash. So, 60 out of 200 members will swim and 40 play squash.

Now, what is the chance that a member who is not a swimmer plays squash?

Of the 60 members who swim, we have calculated above that only 1 also plays squash, i.e. of the 200 members in total, 60 swim and one swims and plays squash.

So, of the remaining 140 members who do not swim, 39 play squash, i.e. 40 members in total play squash minus the one who both swims and plays squash.

So 39 of the 140 health club members who do not swim play squash, i.e. 39/140 (27.9%). This is higher than the 20% of health club members overall who play squash.

Even though the two events (swimming and squash) are independent, therefore, the health club statistics make it appear that swimming reduces the likelihood of playing squash, i.e. there is a negative correlation between swimming and playing squash. The reason is that we are excluding from consideration those members of the general population who neither swim nor play squash, and only considering those who either swim or play squash or both.
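The arithmetic above can be reproduced directly (a minimal sketch using the figures from the example; the variable names are my own):

```python
members = 200
swimmers = round(0.30 * members)        # 60 members swim
squash_players = round(0.20 * members)  # 40 members play squash
both = round(0.10 * 0.05 * members)     # population independence -> 1 member

non_swimmers = members - swimmers       # 140 members do not swim
squash_rate_non_swim = (squash_players - both) / non_swimmers
print(round(squash_rate_non_swim, 3))   # 0.279, against a 20% club-wide rate
```

Changing the club's own rates (30% and 20%) shows how the strength of the spurious correlation depends on how selective the club is.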

 Exercise

10% of the population is suffering from a flu virus. Of those in a clinic intake of 100 patients, 30% are suffering from a flu virus. 10% of those in the clinic were admitted for appendicitis. Now, assume that someone suffering from appendicitis is as likely to get flu as any other member of the population, and vice-versa.

Is there any evidence from the clinic statistics that having flu reduces the likelihood of having appendicitis?

References and links

Paradoxes of probability and other statistical strangeness. Berkson’s Paradox. The Conversation. https://theconversation.com/paradoxes-of-probability-and-other-statistical-strangeness-74440

Berkson’s paradox. Physics of Risk. Oct. 9, 2018.  http://rf.mokslasplius.lt/berkson-paradox/

Berkson’s paradox explained. Healthcare Economist. July 9, 2013. https://www.healthcare-economist.com/2013/07/09/berksons-paradox-explained/

Berkson’s Paradox. Mathemathinking. Oct. 5, 2014. http://corysimon.github.io/articles/berksons-paradox-are-handsome-men-really-jerks/

Ellenberg, J. (2014). Why Are Handsome Men Such Jerks? June 3. Slate.com https://slate.com/human-interest/2014/06/berksons-fallacy-why-are-handsome-men-such-jerks.html

Simpson’s Paradox – in a nutshell.

Was the University of California, Berkeley, guilty of discrimination in its entry standards? This was a cause of concern in the early 1970s. To show what was behind the concern, we can highlight the admission figures for the Fall term of 1973. These show that male applicants to the University were significantly more likely to be accepted than female applicants.

                    Applicants      Admitted

Men            8442                44%

Women       4321                35%

Looks pretty damning, until the admission figures are broken down by department. Doing so reveals a paradox.

Dept.              Men                                       Women

Applicants    Admitted                  Applicants    Admitted

A         825                 62%                108                82%

B         560                 63%                 25                68%

C          325                 37%                593                34%

D         417                 33%                375                35%

E          191                 28%                393                24%

F          373                 6%                   341               7%

In other words, four of the six departments admitted a higher proportion of women than of men.

So what was going on? Those with statistical training soon realised that this was a simple example of Simpson’s Paradox. Simpson’s Paradox arises when different groups of frequency data are combined, revealing a different performance rate overall than is the case when examining a breakdown of the performance rate. Put another way, Simpson’s paradox is the appearance of trends within different groups which disappear when data for the groups are combined together.

In the case of Berkeley, a study published in 1975 by Bickel, Hammel and O’Connell, in ‘Science’ reached the conclusion that women tended to apply to the more competitive departments with low rates of admission, such as the English Department, while men tended to apply to less competitive departments with high rates of admission, such as engineering and chemistry. As such the University was not actively discriminating against women, at least not on the basis of the statistics used to make the charge.

Ignorance of the implications of Simpson’s Paradox might also generate false conclusions in the case of medical trials.

Take the following two drugs and their success rates in medical trials over two different days.

Drug A                                                           Drug B

Day 1             63/90 = 70%                         8/10 = 80%

Day 2             4/10 = 40%                          45/90 = 50%

Overall, Drug A = 67% success rate; Drug B = 53% success rate.

But Drug B performs better on both days.

So which is the better drug? In the medical trials, I would certainly choose to be treated by Drug A. Others might differ, but I doubt they would persuade any reasonable judge of the outcome of the trials.
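A few lines of code confirm the reversal directly from the counts in the table above:

```python
# (successes, trials) for each drug on each day, from the table above
drug_a = {"Day 1": (63, 90), "Day 2": (4, 10)}
drug_b = {"Day 1": (8, 10), "Day 2": (45, 90)}

def rate(results):
    successes, trials = results
    return successes / trials

# Drug B has the higher success rate on each day taken separately...
assert rate(drug_b["Day 1"]) > rate(drug_a["Day 1"])  # 80% vs 70%
assert rate(drug_b["Day 2"]) > rate(drug_a["Day 2"])  # 50% vs 40%

# ...yet Drug A wins once the two days are pooled.
def overall(drug):
    total_successes = sum(s for s, t in drug.values())
    total_trials = sum(t for s, t in drug.values())
    return total_successes / total_trials

print(f"{overall(drug_a):.0%}")  # 67%
print(f"{overall(drug_b):.0%}")  # 53%
```

The reversal happens because Drug A was tested mostly on Day 1 (the easier day for both drugs) and Drug B mostly on Day 2, so the pooled figures are weighted averages with very different weights.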

Take another example. In this trial, there are two groups: a control group of 240 patients who are supplied with a placebo, such as a sugar pill, which is known to have no effect on the illness under evaluation, and a test group of 240 patients who are supplied with the real drug. Each set of 240 patients is made up of four groups. Group A is elderly adults, Group B is middle-aged adults, Group C is young adults and Group D is children.

Here are the results, with success rate measured by the proportion recovering from the illness within two days of taking the drug:

Those taking the placebo (number of patients in each group):

Group A: 20; Group B: 40; Group C: 120; Group D: 60

Success rates are:

Group A: 10%; Group B: 20%; Group C: 40%; Group D: 30%

Overall success rate for those taking the placebo = (2 + 8 + 48 + 18)/240 = 76/240 = 31.7%.

Those taking the real drug (number of patients in each group):

Group A: 120; Group B: 60; Group C: 20; Group D: 40

Success rates are:

Group A: 15%; Group B: 30%; Group C: 60%; Group D: 45%

Overall success rate for those taking the real drug = (18 + 18 + 12 + 18)/240 = 66/240 = 27.5%.

This compares with an overall success rate for those taking the placebo of 31.7%.

So the placebo, over the whole sample, produced a higher success rate than the real drug.

Breaking the numbers down by group, however, reveals a discrepancy.

For the real drug

Group A: 15%; Group B: 30%; Group C: 60%; Group D: 45%

For the placebo

Group A: 10%; Group B: 20%; Group C: 40%; Group D: 30%

So, in each individual group (elderly adults, middle-aged adults, young adults, children) the success rate is greater for those taking the real drug, although across the sample as a whole it is less.

How can we resolve the paradox?

The answer lies in the size and age distribution of each group, which differs between those who received the real drug and those who received the placebo. In this study, the placebo arm contains far more young adults than the drug arm, and the natural recovery rate from this illness (as defined in the test) is higher among young adults than in the other groups, whether they receive the real drug or the placebo. Conversely, the elderly (whose recovery rates are lower than average) are much more heavily represented among those taking the real drug than among those taking the placebo.
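Arithmetically, each arm's overall rate is just a weighted average of its group rates, with the group sizes as the weights. The sketch below uses the figures from the trial above, with the real-drug children's group taken as 40 patients so that each arm totals 240:

```python
# Group sizes and per-group recovery rates for each arm, from the text.
# Order: elderly (A), middle-aged (B), young adults (C), children (D).
placebo_sizes = [20, 40, 120, 60]
placebo_rates = [0.10, 0.20, 0.40, 0.30]
drug_sizes = [120, 60, 20, 40]   # children taken as 40, so the arm totals 240
drug_rates = [0.15, 0.30, 0.60, 0.45]

def pooled(sizes, rates):
    # Overall rate = size-weighted average of the per-group rates.
    return sum(n * r for n, r in zip(sizes, rates)) / sum(sizes)

# The drug beats the placebo in every single group...
assert all(d > p for d, p in zip(drug_rates, placebo_rates))

# ...but the placebo arm, weighted towards the fast-recovering
# young adults, comes out ahead overall.
print(round(pooled(placebo_sizes, placebo_rates), 3))  # 0.317
print(round(pooled(drug_sizes, drug_rates), 3))        # 0.275
```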

Take another example, from baseball. In the 1995 and 1996 seasons, fans were divided between those who claimed Derek Jeter was the better-performing player and those who claimed that title for David Justice. It is easy to see why. Here are their batting averages.

1995                                                       1996                           Combined

Derek Jeter             12/48 (.250)             183/582 (.314)       195/630 (.310)

David Justice           104/411 (.253)       45/140 (.321)         149/551 (.270)

Here we see that Jeter has the better overall batting average but Justice records a better average in each of the two years making up that overall average. To anyone conversant with Simpson’s Paradox this is nothing weird. It is certainly possible in theory for one player to score a better batting average in successive years than another, yet record a worse batting average overall. The case of Jeter and Justice is an example where the theory clearly shows up in practice.

Indeed, fast-forward to 1997 and the paradox grows even stronger. In that year, Jeter averaged .291 (190/654), while Justice scored a better average of .329 (163/495). So, in three successive years, Justice recorded a better average than Jeter. Over the whole period, though, the batting average for Derek Jeter was .300 (385/1284), superior to David Justice on .298 (312/1046).
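Using exact fractions, the Jeter/Justice reversal can be verified from the figures quoted above:

```python
from fractions import Fraction

# Hits / at-bats per season, from the figures quoted above.
jeter = {1995: (12, 48), 1996: (183, 582), 1997: (190, 654)}
justice = {1995: (104, 411), 1996: (45, 140), 1997: (163, 495)}

def season_avg(hits, at_bats):
    return Fraction(hits, at_bats)  # exact, avoids rounding issues

def combined_avg(seasons):
    return Fraction(sum(h for h, a in seasons.values()),
                    sum(a for h, a in seasons.values()))

# Justice has the higher batting average in every individual season...
for year in (1995, 1996, 1997):
    assert season_avg(*justice[year]) > season_avg(*jeter[year])

# ...yet Jeter's combined average over the three years is higher.
print(f"{float(combined_avg(jeter)):.3f}")    # 0.300
print(f"{float(combined_avg(justice)):.3f}")  # 0.298
```

The driver is the same as in the Berkeley and drug examples: most of Jeter's at-bats came in his strong seasons, while most of Justice's came in his weakest one, so the combined averages weight the seasons very differently.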

So who is the better baseball player? Were the University of California, Berkeley, discriminating on the basis of gender? Which is the better drug? All of these questions are examples of Simpson’s Paradox.

Exercise

In a cricket match, bowlers propel the ball at the wicket defended by a batsman. The batsman aims to score runs by hitting the ball and running between the wickets. The bowler aims to dismiss ('take the wicket of') the batsman by various means, including hitting the batsman's wicket. The bowling average is the number of runs scored by the batsmen off the bowler divided by the number of wickets taken by the bowler. The lower the bowling average, the better for the bowler.

Now, let’s take the following example of two mythical cricket matches played by legendary bowlers, Harold Larwood and Bill Voce.

First Match:

Harold Larwood takes 3 wickets while bowling but concedes 60 runs off his bowling (an average of 20 runs conceded per wicket).

Bill Voce takes 2 wickets while bowling but concedes 68 runs (an average of 24 runs conceded per wicket).

Second Match:

Harold Larwood takes 1 wicket and concedes 8 runs (an average of 8 runs conceded per wicket).

Bill Voce takes 6 wickets and concedes 60 runs (an average of 10 runs conceded per wicket).

Question: Which bowler has the superior performance in the first match? Which bowler has the superior performance in the second match? Which bowler has the superior performance overall?

 

References and Links

Maths in a minute: Simpson’s Paradox. +Plus magazine. November 5, 2010. https://plus.maths.org/content/maths-minute-simpsons-paradox

All about averages: Simpson’s Paradox. +Plus magazine. January 1, 2005. https://plus.maths.org/content/all-about-averages

Paradoxes of probability and other statistical strangeness. Simpson’s Paradox. The Conversation. https://theconversation.com/paradoxes-of-probability-and-other-statistical-strangeness-74440

Simpson’s Paradox. Wikipedia. https://en.m.wikipedia.org/wiki/Simpson%27s_paradox

The Birthday Problem – in a nutshell.

How large should a randomly chosen group of people be, to make it more likely than not that at least two of them share a birthday?

For convenience, assume that all dates in the calendar are equally likely as birthdays, and ignore the Leap Year special of February 29th.

The first thing to look at is the likelihood that two randomly chosen people would share the same birthday.

Let’s call them Felix and Felicity. Say Felicity’s birthday is May 1st. What is the chance that Felix shares this birthday with Felicity? Well there are 365 days in the year, and only one of these is May 1st and we are assuming that all dates in the calendar are equally likely as birthdays. What we call the sample space is, therefore 365 days and each particular birthday is an ‘event’ in that sample space.

So, the probability that Felix’s birthday is May 1st is 1/365, and the chance he shares a birthday with Felicity is 1/365.

So what is the probability that Felix’s birthday is not May 1st? It is 364/365. This is the probability that Felix doesn’t share a birthday with Felicity.

More generally, for any randomly chosen group of two people, the probability that the second person has a different birthday to the first is 364/365.

With 3 people, the chance that all three are different is the chance that the first two are different (364/365) multiplied by the chance that the third birthday is different (363/365).

So, the probability that 3 people have different birthdays = 364/365 x 363/365

Now, suppose that the room contains four people. What is the probability that at least two of these people share the same birthday?

The probability that 4 people all have different birthdays = (364 x 363 x 362) / (365 x 365 x 365)

We can then subtract this probability from 1 to establish the probability that at least two of the four share a birthday.

Probability that none of the four people share the same birthday =

(365 x 364 x 363 x 362) / (365 x 365 x 365 x 365) = 0.984

Probability that at least two of them share the same birthday = 1 – 0.984 = 0.016

Similarly, it can be calculated that the probability of at least two sharing a birthday increases as n, the number in the room, increases, as below:

n = 16; probability = 0.284

n = 23; probability = 0.507

n = 32; probability = 0.753

n = 40; probability = 0.891

So, the probability that two share a birthday exceeds 0.5 in a room of 23 or more people.

So how large should a randomly chosen group of people be, to make it more likely than not that at least two of them share a birthday? The answer is 23.
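All of these figures come from the same product formula, which a short function computes exactly:

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1 - p_all_different

for n in (4, 16, 23, 32, 40):
    print(n, round(p_shared_birthday(n), 3))
# 23 is the smallest group size where the probability passes one half.
```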

The intuition behind this is quite straightforward if we recognise just how many pairs of people there are in a group of 23 people, any pair of which could share a birthday.

In a group of 23 people, there are in fact 253 pairs of people to choose from.  Therefore, a group of 23 people generates 253 chances, each of size 1/365, of having at least two people in the group sharing the same birthday.

The Birthday Problem is in this way notable for being a classic example of the Multiple Comparisons Fallacy. This fallacy arises when, in looking at many variables, the number of possible correlations being tested is under-estimated. In particular, multiple comparisons arise when a statistical analysis involves multiple simultaneous statistical tests, each of which has the potential to produce a 'discovery.' For example, with a thousand variables, there are almost half a million (1,000 x 999/2) potential pairs of variables that might appear correlated by chance alone. While each pair is extremely unlikely in itself to show dependence, out of half a million pairs it is very possible that a large number will appear to be dependent. Say, for example, 20 or more comparisons are made, each at a 95% confidence level. In this case, we may well get at least one false positive by chance. This becomes a fallacy when that false positive is treated as a significant finding rather than a statistical near-inevitability. It can be addressed by the use of more sophisticated statistical tests.
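The arithmetic behind that last point can be made explicit. With k independent tests, each at a 95% confidence level, the chance of at least one false positive is 1 - 0.95^k:

```python
# Probability of at least one false positive among k independent tests,
# each run at a 95% confidence level (5% false-positive rate).
for k in (1, 20, 100):
    p_at_least_one = 1 - 0.95 ** k
    print(k, round(p_at_least_one, 3))
# 1 -> 0.05, 20 -> 0.642, 100 -> 0.994
```

With 20 tests a false positive is already more likely than not, and with 100 it is close to certain, which is exactly the birthday-problem mechanism at work.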

To summarize the Birthday Problem: in a group of 23 people (assuming each of their birthdays is an independently chosen day of the year, with all days equally likely), there is in fact a greater than 50 per cent chance that at least two of the group share the same birthday. This seems counter-intuitive, since it is rare to meet someone who shares your birthday. Indeed, if you select two random people, the chance that they share a birthday is about 1 in 365. With 23 people, however, there are 253 (23 x 22/2) pairs of people who might have a common birthday. So by looking across the whole group, we are checking whether any one of these 253 pairings, each of which independently has a tiny chance of coinciding, does indeed match. Because there are so many possible pairs, it becomes more likely than not for a coincidental match to arise. For a group of 40 people, say, it is more than eight times as likely that at least two share a birthday than that none do.

To be technical about it, in a group of 23 people there are, according to the standard formula, 23C2 (read '23 choose 2') pairs of people.

Generally, the number of ways k things can be chosen from n is:

nCk = n! / ((n – k)! k!)

Here n! (n factorial) is n x (n – 1) x (n – 2) x … down to 1. Similarly for k! and (n – k)!.

Thus, 23C2 = 23! / (21! x 2!) = (23 x 22) / 2 = 253
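Python's standard library exposes this formula directly as math.comb, which makes it easy to check these pair counts, including the 'half a million' figure mentioned earlier:

```python
import math

# math.comb(n, k) is n! / ((n - k)! k!), the number of ways to
# choose k items from n.
print(math.comb(23, 2))    # 253 pairs in a group of 23
print(math.comb(40, 2))    # 780 pairs in a group of 40
print(math.comb(1000, 2))  # 499500: the 'half a million' pairs of variables
```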

These chances have some overlap: if A and B have a common birthday, and A and C have a common birthday, then inevitably so do B and C.

So the probability of at least two people sharing a birthday in a group of 23 is less than 253/365 (69.3%).

The probability that no two people in the group of 23 share a birthday is then approximately:

(364/365)^253 = 0.4995

Essentially, making 253 comparisons and having them all come up different is like tossing a coin that lands heads with probability 364/365 and getting heads 253 times in a row.

The probability of two people having different birthdays is 1 – 1/365 = 364/365 = 0.99726.

The probability of 23 people all having different birthdays is therefore approximately (364/365)^253 = 0.4995.

The probability that at least two of the 23 people share the same birthday = 1 – 0.4995 = 0.5005, or just over 50% (the exact calculation gives 50.7%).
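Both the pairwise approximation and the exact product can be checked in a few lines:

```python
# Pairwise approximation: treat the 253 pair comparisons as independent.
approx = 1 - (364 / 365) ** 253
print(round(approx, 4))  # 0.5005

# Exact calculation: the 23 birthdays must all be different.
p_all_different = 1.0
for i in range(23):
    p_all_different *= (365 - i) / 365
exact = 1 - p_all_different
print(round(exact, 4))   # 0.5073
```

The approximation slightly understates the true probability because, as noted above, the 253 pairwise events overlap rather than being fully independent, but both calculations put the answer just above one half.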

So the next time you see two football teams line up, with the referee, it is more likely than not that two of those on the pitch share the same birthday.

Exercise

What is the probability that at least two of a randomly selected group of 24 people share a birthday? Assume that all dates in the calendar are equally likely as birthdays, and ignore the Leap Year date of February 29th.

 

References and Links

Probability and the Birthday Paradox. Scientific American. March 29, 2012. https://www.scientificamerican.com/article/bring-science-home-probability-birthday-paradox/

Understanding the Birthday Paradox. Better Explained. https://betterexplained.com/articles/understanding-the-birthday-paradox/

Birthday Problem. Wikipedia. https://en.wikipedia.org/wiki/Birthday_problem

Multiple Comparisons Fallacy. In: Paradoxes of Probability and other statistical strangeness. The Conversation. Woodcock, S. April 4, 2017. https://theconversation.com/paradoxes-of-probability-and-other-statistical-strangeness-74440

Multiple Comparisons Fallacy. Logically Fallacious. https://www.logicallyfallacious.com/tools/lp/Bo/LogicalFallacies/130/Multiple-Comparisons-Fallacy

The Multiple Comparisons Fallacy. Fallacy Files. http://www.fallacyfiles.org/multcomp.html

The Misleading Effect of Noise: The Multiple Comparisons Problem. Koehrsen, W. Feb. 7, 2018. https://towardsdatascience.com/the-multiple-comparisons-problem-e5573e8b9578

Multiple Comparisons. https://youtu.be/EMzcZFtGZZE

The Multiple Comparisons Problem. https://youtu.be/dzi1CSvzCoU