Introducing Bayes

Bayes’ theorem concerns how we should update our beliefs about the world when we encounter new evidence. The original presentation of Rev. Thomas Bayes’ work, ‘An Essay toward Solving a Problem in the Doctrine of Chances’, was given in 1763, after Bayes’ death, to the Royal Society, by Richard Price. In framing Bayes’ work, Price gave the example of a person who emerges into the world and sees the sun rise for the first time. At first, he does not know whether this is typical or unusual, or even a one-off event. However, each day that he sees the sun rise again, his confidence increases that it is a permanent feature of nature. Gradually, through this purely statistical form of inference, the probability he assigns to his prediction that the sun will rise again tomorrow approaches (although never exactly reaches) 100 per cent. The Bayesian viewpoint is that we learn about the universe and everything in it through approximation, getting closer to the truth as we gather more evidence, and thus rationality is regarded as a probabilistic matter. As such, Bayes applies and formalises the laws of probability to the science of reason, to the issue of cause and effect.

In its most basic form, Bayes’ Theorem is simply an algebraic expression with three known variables and one unknown. It is true by construction. But this simple formula can lead to important predictive insights. So Bayes’ Theorem is concerned with conditional probability. That is, it tells us the probability that a theory or hypothesis is true if some new information comes to light, based on the probability we attach to it being true before the new information is known, updated in light of the new information.

Presented most simply, it looks like this:

Probability that a hypothesis is true given some new evidence (‘Posterior Probability’) =

xy/[xy+z(1-x)], where:

x is the prior probability of the hypothesis being true (the probability we attach before the new evidence arises)

y is the probability that the new evidence would arise if the hypothesis is true

z is the probability that the new evidence would arise if the hypothesis is not true

1-x is the prior probability that the hypothesis is not true
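The formula can be checked with a few lines of code. What follows is a minimal sketch in Python; the function name `posterior` and the sample numbers are illustrative assumptions, not part of the original presentation.

```python
from fractions import Fraction

def posterior(x, y, z):
    """Return xy / [xy + z(1 - x)].

    x: prior probability that the hypothesis is true
    y: probability of the evidence if the hypothesis is true
    z: probability of the evidence if the hypothesis is not true
    """
    numerator = x * y
    denominator = x * y + z * (1 - x)
    if denominator == 0:
        raise ValueError("the evidence has zero probability under these inputs")
    return numerator / denominator

# Illustrative numbers only (not from the text): prior 1/10, y = 4/5, z = 1/5.
print(posterior(Fraction(1, 10), Fraction(4, 5), Fraction(1, 5)))  # 4/13
```

Exact fractions are used rather than floating-point numbers so that the results can be compared directly with hand calculations.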


Using more traditional notation,

P(H) = probability the hypothesis is true (x)

P(E) = probability of the evidence

P(H’) = probability the hypothesis is not true

P(EIH) = probability of the evidence given that the hypothesis is true (y)

P(EIH’) = probability of the evidence given that the hypothesis is not true (z)

P(HIE) = posterior (updated) probability (PP)


The equation is easily derived.

The entry point is this equation, both sides of which are equal to the probability of the evidence and hypothesis taken together.

P(HIE).P(E) = P(H).P(EIH)

To derive Bayes’ Theorem, divide through by P (E).

P (HIE) = P (H).P(EIH) / P(E) … Bayes’ Theorem

P (E) = P (H). P (EIH) + P (EIH’).P (H’)

So P (HIE) = P (H).P (EIH) / [P (H). P (EIH) + P (EIH’).P(H’)]


Take a simple card example.

There are 52 cards in the deck. 26 are black, 26 are red. One of the cards is the Ace of Spades.

Hypothesis: A selected card is the Ace of Spades.

Evidence: The card is revealed to be black.

So P (HIE) = 1/26 (there are 26 black cards, of which one is the Ace of Spades).

P (E) = ½ (1/2 of the cards are black); P(H) = 1/52 (there are 52 cards of which one is the Ace of Spades)

P (EIH) = 1 (if it’s the Ace of Spades, it must be black).

So P (HIE). P (E) = 1/52

And P (H). P (EIH) = 1/52

As expected, and as always, P (HIE). P (E) = P (EIH).P(H). This is the fundamental expression from which Bayes’ Theorem is easily derived, as above by dividing both sides by P (E).

Thus, Bayes’ Theorem states that: P (HIE) = P (EIH).P (H) / P(E)

As before, P (E) = P (H). P (EIH) + P (EIH’).P (H’), i.e. Probability of the evidence = Probability of the evidence if the hypothesis is true times the probability the hypothesis is true PLUS probability of the evidence if the hypothesis is not true times the probability the hypothesis is not true.

So, P (HIE) = P (H).P (EIH) / [P (H).P(EIH) + P(EIH’).P(H’)] – Longer expression of Bayes’ Theorem

OR, P (HIE) = xy / [xy + z (1-x)] – Bayes’ Theorem.


Does P (HIE) = P (EIH)?

Is the probability that a selected card is the Ace of Spades given the evidence (it is a black card) equal to the probability it is a black card given that it is the Ace of Spades?

In this example, P(HIE) = 1/26, there is one Ace of Spades out of 26 black cards.

P(EIH) = 1, since the probability it is a black card if it is the Ace of Spades is certain.

So, P(HIE) is not equal to P(EIH).

To claim they are equal is more generally known as the Prosecutor’s Fallacy (also known as the Inverse Fallacy).


What is the chance that a card in a four-card deck is the Ace of Spades?

Four cards in the deck. Ace of Spades, Ace of Clubs, Ace of Diamonds, Ace of Hearts

Hypothesis: The selected card is the Ace of Spades.

Prior probability of Ace of Spades (AS) = ¼

What is the posterior probability it is Ace of Spades given evidence that the card is black?

P(HIE) = P(H).P(EIH)/P(E) = ¼.1 / (1/2) = ½

PP = xy/[xy+z(1-x)] = ¼.1/[1/4 + 1/3(3/4)] = ¼ / ½ = ½

NB z = P(EIH’). This is the probability of a black card if the card is not the Ace of Spades. There are three other cards, only one of which is black, so z = 1/3.

So either formula generates the same correct answer, that the posterior probability that the hypothesis is true (the card is the Ace of Spades given that it is black) is ½.
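The agreement between the formula and direct counting can be confirmed by enumerating the four cards. This is a sketch only; the helper `prob` and the tuple layout of the deck are assumptions made for illustration.

```python
from fractions import Fraction

# The four-card deck from the example, as (name, colour) pairs.
deck = [
    ("Ace of Spades", "black"),
    ("Ace of Clubs", "black"),
    ("Ace of Diamonds", "red"),
    ("Ace of Hearts", "red"),
]

def prob(event, cards):
    """Probability of `event` when one card is drawn uniformly from `cards`."""
    return Fraction(sum(1 for card in cards if event(card)), len(cards))

def is_ace_of_spades(card):  # hypothesis H
    return card[0] == "Ace of Spades"

def is_black(card):  # evidence E
    return card[1] == "black"

x = prob(is_ace_of_spades, deck)                                  # P(H)
y = prob(is_black, [c for c in deck if is_ace_of_spades(c)])      # P(E given H)
z = prob(is_black, [c for c in deck if not is_ace_of_spades(c)])  # P(E given not H)

by_formula = x * y / (x * y + z * (1 - x))
by_counting = prob(is_ace_of_spades, [c for c in deck if is_black(c)])
print(x, y, z, by_formula, by_counting)  # 1/4 1 1/3 1/2 1/2
```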


Dice Example

Two dice are thrown. The hypothesis is that two sixes will be thrown. The new evidence is that a six is thrown on the first one.

P (H) = x = 1/36

P (EIH) = y = 1 (for a double six, a six must be thrown on the first one).

P (E) = 1/6 (there is a 1 in 6 chance of throwing a six on the first die)

P (HIE) = posterior probability = P(EIH). P(H) /P(E) = 1/36 / 1/6 = 1/6 (there is a 1 in 6 chance of a double six if the first die lands on a six).

Note : P(H).P(EIH) = P(E).P(HIE) = 1/36

Note also: P(E) = P(H).P(EIH) + P(H’).P(EIH’) = 1/36 + 35/36 . 5/35 = 1/36 + 5/36 = 1/6

Similarly, PP = xy/[xy+z(1-x)] = 1/6

Note: z = P(EIH’) = 5/35 because if not a 6,6 (H’), 35 options left and chance of a single six occurs in 5 of them, i.e. 6,1; 6,2; 6,3; 6,4; 6,5
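The same figures fall out of a brute-force enumeration of the 36 outcomes. The sketch below is illustrative; the function and variable names are my own.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes, as (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, space):
    """Probability of `event` over the equally likely outcomes in `space`."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def double_six(o):  # hypothesis H
    return o == (6, 6)

def first_is_six(o):  # evidence E
    return o[0] == 6

x = prob(double_six, outcomes)                                       # 1/36
y = prob(first_is_six, [o for o in outcomes if double_six(o)])       # 1
z = prob(first_is_six, [o for o in outcomes if not double_six(o)])   # 5/35
p_e = prob(first_is_six, outcomes)                                   # 1/6

posterior = x * y / (x * y + z * (1 - x))
print(z, p_e, posterior)  # 1/7 1/6 1/6
```

Note that 5/35 is printed in its reduced form, 1/7.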


Does P(HIE) = P(EIH)?

Is the probability of obtaining six on two dice, if the first comes up six, the same as the probability of the first coming up six if both come up six?

In this example, P (HIE) = 1/6, which is the chance the second die will come up six if the first does.

P (EIH) = 1, since the first die must come up six if both dice are to come up six.

So, P (HIE) is not equal to P (EIH), highlighting again the classic Prosecutor’s Fallacy.

The key contributions of Bayesian analysis to our understanding of the world are fourfold:

  1. It clearly shows that P (HIE) is not the same thing as P (EIH). The conflation of these two expressions is known as the Prosecutor’s Fallacy and has been sufficient in itself to cause countless miscarriages of justice, and to lead to erroneous conclusions more generally about the likelihood that a hypothesis is true in the context of observed evidence.
  2. It tells us what P (HIE) is in fact equal to: P (HIE) = P (EIH). P (H) /P (E), where P (E) = P (H). P (EIH) + P (H’). P(EIH’).
  3. Bayes’ Theorem makes clear the importance not just of new evidence but also of the (prior) probability that the hypothesis was true before the new evidence was observed. This prior probability is generally given far too little weight compared to the new evidence in common intuition about probability. Bayes’ Theorem makes it explicit.
  4. Bayes’ Theorem gives us a way to calculate the updated probability as accurately as allowed by our assessment of the prior probability of the hypothesis being true and of the probability of the evidence arising given the hypothesis being true and being false.


In all these ways, Bayes’ Theorem replaces often faulty intuition and logic with a rigorous application of conditional probability theory, to give us as accurate as possible a representation of how probable a hypothesis is to be true given the available evidence at any given point in time.






Who should get what when the game is interrupted early?

What is the fair division of stakes in a game which is interrupted before its conclusion? This was the problem posed in 1654 by French gambler Chevalier de Mere to philosopher and mathematician, Blaise Pascal, of Pascal’s Triangle and Pascal’s Wager fame, who shared it in now-famous correspondence with mathematician Pierre de Fermat (best known these days perhaps for Fermat’s Last Theorem). It has come to be known as the Problem of Points.

The question had first been formally addressed by Franciscan Friar and Leonardo Da Vinci collaborator, Luca Bartolomeo de Pacioli, father of the double-entry system of bookkeeping. Pacioli’s method was to divide the stakes in proportion to the number of rounds won by each player to that point. There is an obvious problem with this method, however. What happens, for example, if only a single round of many has been played? Should the entire pot be allocated to the winner of that single round? In the mid-1500s, Venetian mathematician and founder of the theory of ballistics, Niccolo Fontana Tartaglia, proposed basing the division on the ratio between the size of the lead and the length of the game. But this method is not without its own problems. For example, it would split the stakes in the same proportion whether one player was ahead by 40-30 or by 99-89, although the latter situation is hugely more advantageous than the former.

The solution adopted by Pascal and Fermat based the division of the stakes not so much on the history of the interrupted game to that point as on the possible ways the game might have continued had it not been interrupted. In this method, a player leading by 6-4 in a game to 10 would have the same chance of winning as a player leading by 16-14 in a game to 20, so that an interruption at either point should lead to the same division of stakes. As such, what is important in the Pascal-Fermat solution is not the number of rounds each player has already won but the number of rounds each player still needs to win.

Take another example. Suppose that two players agree to play a game of coin-tossing repeatedly to win £32, and the winner is the first player to win four times.

If the game is interrupted when one of the players is ahead by two games to one, how should the £32 be divided fairly between the players?

In Fermat’s method, imagine playing another four games. Outcomes of each coin-tossing game are equally likely and are P (won by Player 1) and Q (won by Player 2).

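The possible outcomes of the next four games are as follows. Player 1, who needs two more wins, takes the match in any sequence containing at least two Ps. Player 2, who needs three more wins, takes it in any sequence containing at least three Qs.

Player 1 wins: PPPP, PPPQ, PPQP, PQPP, QPPP, PPQQ, PQPQ, PQQP, QPPQ, QPQP, QQPP

Player 2 wins: PQQQ, QPQQ, QQPQ, QQQP, QQQQ

The probability that Player 1 would have won is 11/16 = 68.75%.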

The probability that Player 2 would have won is 5/16 = 31.25%.
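Fermat's enumeration is easy to reproduce by machine. The sketch below assumes a fair coin and uses the P/Q labels from the text; the function name `fermat_share` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def fermat_share(needs_1, needs_2):
    """Player 1's fair share when Player 1 still needs `needs_1` wins
    and Player 2 still needs `needs_2` wins, with a fair coin."""
    # Playing this many further games always produces exactly one winner.
    remaining = needs_1 + needs_2 - 1
    sequences = list(product("PQ", repeat=remaining))
    # Player 1 wins the match if P turns up at least `needs_1` times.
    wins_1 = sum(1 for s in sequences if s.count("P") >= needs_1)
    return Fraction(wins_1, len(sequences))

# Leading 2-1 in a first-to-four match: Player 1 needs 2 wins, Player 2 needs 3.
share = fermat_share(2, 3)
print(share, 32 * share, 32 * (1 - share))  # 11/16 22 10
```

On these assumptions the £32 stake divides as £22 to Player 1 and £10 to Player 2.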

Pascal simplified the method by devising a principle of smaller steps, which dispenses with the need to consider possible steps after the game had already been won. In doing so, he was able to devise a relatively simple formula which would solve all possible Problems of Points, without needing to go beyond the point at which the game resolves in favour of one or other of the players, based on Pascal’s Triangle, demonstrated below.

1   (sum = 1)

1  1   (sum = 2)

1  2  1   (sum = 4)

1  3  3  1   (sum = 8)

1  4  6  4  1   (sum = 16)

1  5  10  10  5  1   (sum = 32)

1  6  15  20  15  6  1   (sum = 64)

1  7  21  35  35  21  7  1   (sum = 128)

Each of the numbers in Pascal’s Triangle is the sum of the adjacent numbers immediately above it.

If the game is interrupted, as above, at 2-1 after three games in a first-to-four match, at most four more games are needed, so we use the row 1 4 6 4 1. The resolution is (1+4+6)/16 to (4+1)/16, i.e. 11/16 to Player 1 and 5/16 to Player 2.
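The row-of-the-triangle calculation can be written directly with binomial coefficients. This is a sketch under the same fair-coin assumption; `pascal_share` is an illustrative name, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def pascal_share(needs_1, needs_2):
    """Player 1's share, read off the relevant row of Pascal's Triangle."""
    n = needs_1 + needs_2 - 1                  # maximum number of games still needed
    row = [comb(n, k) for k in range(n + 1)]   # e.g. [1, 4, 6, 4, 1] for n = 4
    # Entry k counts the sequences in which Player 1 loses exactly k of the n games.
    # Player 1 takes the match when losing at most needs_2 - 1 of them,
    # which corresponds to the first needs_2 entries of the row.
    return Fraction(sum(row[:needs_2]), sum(row))

print(pascal_share(2, 3))  # 11/16
```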

More generally, Pascal’s method establishes the modern method of expected value when reasoning about probability.

To show this, consider the probability that Player 1 would win if leading 3-2 in a game in which the first player to win four games is the outright winner. If Player 1 wins the next coin toss, he goes ahead 4-2 and wins outright (value = 1). There is a 50% chance of this. There is a 50% chance of player 2 winning the coin toss, however, in which case the game is level (3-3). If the game is level, there is a 50% chance of player 1 winning (and a 50% chance of player 2 winning).

So the expected chance of player 1 winning when leading 3-2 = 50% x 1 + 50% x 50% = 50% + 25% = 75%. Expected chance of player 2 winning = 25%.

Now consider the probability that Player 1 would win if leading 3-1 in a game in which the first player to win four games is the outright winner. If Player 1 wins the next coin toss, he goes ahead 4-1 and wins outright (value = 1). There is a 50% chance of this. There is a 50% chance of player 2 winning the coin toss, however, in which case the game goes to 3-2. We know that the expected chance of player 1 winning if ahead 3-2 is 75% (derived above).

So the expected chance of player 1 winning when leading 3-1 = 50% x 1 + 50% x 75% = 50% + 37.5% = 87.5%. Expected chance of player 2 winning = 12.5%.

Now consider the question that we solved using Fermat’s method, i.e. the probability that Player 1 would win if leading 2-1 in a game in which the first player to win four games is the outright winner. If Player 1 wins the next coin toss, he goes ahead 3-1, and has an expected chance of winning of 87.5% (derived above). There is a 50% chance of this. There is a 50% chance of player 2 winning the coin toss, however, in which case the game goes to 2-2. We know that the expected chance of player 1 winning if tied is 50%.

So the expected chance of player 1 winning when leading 2-1 = 50% x 87.5% + 50% x 50% = 43.75% + 25% = 68.75% (i.e. 11/16). Expected chance of player 2 winning = 31.25% (i.e. 5/16).
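The step-by-step expected value reasoning is naturally recursive: the chance of winning from any score is the average of the chances from the two scores that can follow it. The sketch below assumes a fair coin; the name `win_chance` and its signature are my own.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def win_chance(score_1, score_2, target=4):
    """Chance that Player 1 wins a first-to-`target` match from the given score."""
    if score_1 == target:
        return Fraction(1)   # Player 1 has already won
    if score_2 == target:
        return Fraction(0)   # Player 2 has already won
    half = Fraction(1, 2)
    # Average over the two equally likely results of the next toss.
    return (half * win_chance(score_1 + 1, score_2, target)
            + half * win_chance(score_1, score_2 + 1, target))

print(win_chance(3, 2), win_chance(3, 1), win_chance(2, 1))  # 3/4 7/8 11/16
```

These are the 75%, 87.5% and 68.75% figures worked out by hand.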

So both Fermat’s method and Pascal’s method yield the same solution, by different routes, and will always do so in determining the correct division of stakes in an interrupted game of this nature.



Can you solve the problem that inspired probability theory? (Problem of the Points).



Why do we always find ourselves in the slower lane?

Is the line next to you at the check-in at the airport or the check-out at the supermarket really always quicker than the one you are in? Is the traffic in the neighbouring lane always moving a bit more quickly than your lane? Or does it just seem that way?

One explanation is to appeal to basic human psychology. For example, is it an illusion caused by us being more likely to glance over at the neighbouring lane when we are progressing forward slowly than quickly? Is it a consequence of the fact that we tend to look forwards rather than backwards, so vehicles that are overtaken become forgotten very quickly, whereas those that remain in front continue to torment us? Do we take more notice, or remember for longer the times we are passed than when we pass others? If this is the complete explanation, it seems we should passively accept our lot. On the other hand, perhaps we really are more often than not in the slower lane. If so, there may be a reason. Let me explain using an example.

How big is the smallest fish in the pond? You catch sixty fish, all of which are more than six inches long. Does this evidence add support to a hypothesis that all the fish in the pond are longer than six inches? Only if your net is able to catch fish smaller than six inches. What if the holes in the net allow smaller fish to pass through? This may be described as a selection effect, or an observation bias.

Apply the same principle to your place in the line or the lane.

To understand the effect in this context we need to ask, ‘For a randomly selected person, are the people or vehicles in the next line or lane actually moving faster?’

Well, one obvious reason why we might be in a slower lane is that there are more vehicles in it than in the neighbouring lane. This means that more of our time is spent in the slower lane. In particular, cars travelling at greater speeds are normally more spread out than slower cars, so that over a given stretch of road there are likely to be more cars in the slower lane, which means that more of the average driver’s time is spent in the slower lane or lanes. This is known as an observer selection effect, a key idea in the theory of which is that observers should reason as if they were a random sample from the set of all observers. In other words, when making observations of the speed of cars in the next lane, or the progress of the neighbouring line to the cashier, it is important to consider yourself as a random observer, and think about the implications of this for your observation.

To put it another way, if you are in a line and think of your present observation as a random sample from all the observations made by all relevant observers, then the probability is that your observation will be made from the perspective that most drivers have, which is the viewpoint of the slower moving queue, as that is where more observers are likely to be. It is because most observers are in the slower lane, therefore, that a typical or randomly selected driver will not only seem to be in the slower lane but actually will be in the slower lane. Let’s put it this way. If there are 20 in the slower lane and 10 in the equivalent section of the fast lane, there is a 2/3 chance that you are in the slow lane.
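The 2/3 figure is simply the slow lane's share of all observers. A small sketch, with a hypothetical helper name and the lane counts taken from the example:

```python
from fractions import Fraction

def chance_in_lane(cars_by_lane):
    """Chance that a randomly selected driver is in each lane,
    treating every car on the stretch of road as equally likely to be 'you'."""
    total = sum(cars_by_lane.values())
    return {lane: Fraction(count, total) for lane, count in cars_by_lane.items()}

chances = chance_in_lane({"slow": 20, "fast": 10})
print(chances["slow"], chances["fast"])  # 2/3 1/3
```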

So the next time you think that the other lane is faster, be aware that it very probably is.



Nick Bostrom, Anthropic Bias: Observation Selection Effects in Science and Philosophy, Chapter 1, Routledge, 2002.

The God’s Dice Problem

Imagine a world created by an external Being based on the toss of a fair coin, or a roll of the dice. It’s a thought experiment sometimes called the ‘God’s Dice Problem.’ In the simplest version, Heads creates a version of the world in which one blue-bearded person is created. Let’s call that World A. Tails creates a version of the world in which a blue-bearded and a black-bearded person are created. Let’s call that World B.

You wake up in the dark in one of these worlds, but you don’t know which, and you can’t see what colour your beard is, though you do know the rule that created the world. What likelihood do you now assign to the hypothesis that the coin landed Tails and you have been born into World B?

This depends on what fundamental assumption you make about existence itself. One way of approaching this is to adopt what has been called the ‘Self-Sampling Assumption’. This states that “you should reason as if you are randomly selected from everyone who exists in your reference class.” What do we mean by reference class? As an example, we can reasonably take it that in terms of our shared common existence a blue-bearded person and a black-bearded person belong in the same reference class, whereas a blue-bearded person and a black-coloured ball do not. Looked at another way, we need to ask, “What do you mean by you?” Before the lights come on, you don’t know what colour your beard is. It could be blue or it could be black, unless you mean that it’s part of the “essence” of who you are to have a blue-coloured beard. In other words, unless there is no possible state of the world in which you had a black beard but otherwise would have been you.

Using this assumption, we see ourselves simply as a randomly selected bearded person among the reference class of blue and black bearded people. The coin could have landed Heads, in which case we are in World A, or it could have landed Tails, in which case we are in World B. There is an equal chance that the coin landed Heads or Tails, so we should assign a credence of 1/2 to being in World A and similarly for World B. In World B the probability is 1/2 that we have a blue beard and 1/2 that we have a black beard.

The light is now turned on and we see that we are sporting a blue beard. What is the probability now that the coin landed Tails and we are in World B? Well, the probability we would sport a blue beard conditional on living in World A is 1, i.e. 100%. This is because we know that the one person who lives in World A has a blue beard. The conditional probability of having a blue beard in World B, on the other hand, is 1/2. The other inhabitant has a black beard. So there is twice the chance that we live in World A as in World B conditional on finding out that we have a blue beard, i.e. a 2/3 chance the coin landed Heads and we live in World A.

Let’s now say you make a different assumption about existence itself. Your worldview in this alternative scenario is based on what has been termed the ‘Self-Indication Assumption.’ It can be stated like this. “Given the fact that you exist, you should (other things equal) favour a hypothesis according to which many observers exist over a hypothesis according to which few observers exist.”

According to this assumption, you note that there is one hypothesis (the World B hypothesis) according to which there are two observers (one blue-bearded and one black-bearded) and another hypothesis (the World A hypothesis) in which there is only one observer (who is blue-bearded). Since there are twice as many observers in World B as World A, then according to the Self-Indication Assumption, it is twice as likely (a 2/3 chance) that you live in World B as World A (a 1/3 chance). This is your best guess while the lights are out. When the lights are turned on, you find out that you have a blue beard. The new conditional probability you attach to living in World B is 1/2, as there is an equal chance that as a blue-bearded person you live in World B as World A.

So, under the Self-Sampling Assumption, your initial credence that you lived in World B was 1/2, which fell to 1/3 when you found out you had a blue beard. Under the Self-Indication Assumption, on the other hand, your initial credence of living on World B was 2/3, which fell to 1/2 when the lights came on.
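The four credences just summarised can be reproduced mechanically. The sketch below is one way of encoding the two assumptions, with names of my own choosing: under the Self-Sampling Assumption the prior over worlds is just the coin, while under the Self-Indication Assumption each world is additionally weighted by its number of observers. In both cases, seeing a blue beard is treated as evidence whose likelihood is the fraction of blue-bearded observers in that world.

```python
from fractions import Fraction

# Number of observers of each beard colour in each world, from the set-up.
worlds = {
    "A": {"blue": 1},               # coin landed Heads
    "B": {"blue": 1, "black": 1},   # coin landed Tails
}
coin = {"A": Fraction(1, 2), "B": Fraction(1, 2)}

def normalise(weights):
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}

def prior(assumption):
    """Credence in each world while the lights are still out."""
    if assumption == "SSA":
        return dict(coin)
    if assumption == "SIA":
        # Weight each world by how many observers it contains.
        return normalise({w: coin[w] * sum(worlds[w].values()) for w in worlds})
    raise ValueError(f"unknown assumption: {assumption}")

def after_seeing_blue(assumption):
    """Credence in each world once you see that your beard is blue."""
    p = prior(assumption)
    # Within a world, you are equally likely to be any one of its observers.
    likelihood = {
        w: Fraction(worlds[w].get("blue", 0), sum(worlds[w].values()))
        for w in worlds
    }
    return normalise({w: p[w] * likelihood[w] for w in worlds})

for name in ("SSA", "SIA"):
    print(name, prior(name)["B"], after_seeing_blue(name)["B"])
# SSA 1/2 1/3
# SIA 2/3 1/2
```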

So which is right and what are the wider implications?

Let us first consider the impact of changing the reference class of the ‘companion’ on World B. Instead of this being another bearded person, it is a black ball. In this case, what is the probability you should attribute to living on World B given the Self-Sampling Assumption? While the lights are out, you consider that there is a probability of 1/2 that the coin landed Tails, so the probability that you live on World B is 1/2.

When the lights are turned on, no new relevant information is added as you knew you were blue-bearded. There is one blue-bearded person on World A, therefore, and one on World B. So the chance that you are in World B is unchanged. It is 1/2.

Given the Self-Indication Assumption, the credence you should assign to being on World B given that your companion is a ball instead of a bearded person is now 1/2, as the number of relevant observers inhabiting World B is now one, the same as on World A. When the lights come on, you learn nothing new, and the chance the coin landed Tails and you are on World B stays unchanged at 1/2.

Unlike the Self-Indication Assumption (SIA), the Self-Sampling Assumption is dependent on the choice of reference class. The SIA is not dependent on the choice of reference class, as long as the reference class is large enough to contain all subjectively indistinguishable observers. If the reference class is large, SIA will make it more likely that the hypothesis is true but this is compensated by the much reduced probability that the observer will be any particular observer in the larger reference class.

The choice of underlying assumption has implications elsewhere, most famously in regard to the Sleeping Beauty Problem, which I have addressed from a different angle in a separate blog.

In that problem, Sleeping Beauty is woken either once (on Monday) if a coin lands Heads or twice (on Monday and Tuesday) if it lands Tails. She knows these rules. Either way, she will only be aware of the immediate awakening (whether she is woken once or twice). The question is how Sleeping Beauty should answer if she is asked how likely it is the coin landed Tails when she is woken and asked.

If she adopts the Self-Sampling Assumption, she will give an answer of 1/2. The coin will have landed Tails with a probability of 1/2 and there is no other observer than her. Only if she is told that this is her second awakening will she change her credence that it landed Tails to 1 and that it landed Heads to 0.

If she adopts the Self-Indication Assumption, she has a different worldview in which there are three observation points. In fact, there are two prevalent propositions which have been called the Self-Indication Assumption, the first of which is stated above, i.e. “Given the fact that you exist, you should (other things equal) favour a hypothesis according to which many observers exist over a hypothesis according to which few observers exist.” The other can be stated thus: “All other things equal, an observer should reason as if they are randomly selected from the set of all possible observers.”

According to this assumption, stated either way, there is one hypothesis (the Heads hypothesis) according to which there is one observer opportunity (Monday awakening) and another hypothesis (the Tails hypothesis) in which there are two observer opportunities (the Monday awakening and the Tuesday awakening). Since there are twice as many observation opportunities in the Tails hypothesis according to the Self-Indication Assumption, it is twice as likely (a 2/3 chance) that the coin landed Tails as that it landed Heads (a 1/3 chance).

Looked at another way, if there is a coin toss that on heads will create one observer, while on tails it will create two, then we have three possible observers (observer on heads, first observer on tails and second observer on tails), each existing with equal probability, so the Self-Indication Assumption assigns a probability of 1/3 to each. Alternatively, this could be interpreted as saying there are two possible observers (first observer on either heads or tails, second observer on tails), the first existing with probability one and the second existing with probability 1/2. So the Self-Indication Assumption assigns a 2/3 probability to being the first observer and 1/3 to being the second observer, which is the same as before. Whichever way we prefer to look at it, the Self-Indication Assumption gives a 1/3 probability of heads and 2/3 probability of tails in the Sleeping Beauty Problem.
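The difference between the two answers comes down to what is being counted: runs of the experiment or individual awakenings. The following sketch makes that explicit. The function names are illustrative, and counting per awakening is offered here as one reading of the Self-Indication Assumption rather than a definitive formalisation.

```python
from fractions import Fraction

# Awakenings produced by each result of the coin toss.
awakenings = {"Heads": ["Monday"], "Tails": ["Monday", "Tuesday"]}
coin = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}

def per_experiment():
    """Credence when each run of the experiment counts once (the 1/2 answer)."""
    return dict(coin)

def per_awakening():
    """Credence when each awakening counts once (the 1/3 answer for Heads)."""
    weights = {side: coin[side] * len(days) for side, days in awakenings.items()}
    total = sum(weights.values())
    return {side: weight / total for side, weight in weights.items()}

print(per_experiment()["Tails"], per_awakening()["Tails"])  # 1/2 2/3
```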

Depending on which Assumption we adopt, however, very different implications for our wider view of the world obtain.

One of the most well-known of these is the so-called Doomsday Argument, which I have explored elsewhere in my blog, ‘Is Humanity Doomed?’

The Doomsday Argument goes like this. Imagine for the sake of simplicity that there are only two possibilities: Extinction Soon and Extinction Late. In one, the human race goes extinct very soon, whereas in the other it spreads and multiplies through the Milky Way. In each case, we can write down the number of people who will ever have existed. Suppose that 100 billion people will have existed in the Extinction Soon case, as opposed to 100 trillion people in the Extinction Late case. So now, say that we’re at the point in history where 100 billion people have lived. If we’re in the Extinction Late situation, then the great majority of the people who will ever have lived will be born after us. We’re in the very special position of being in the first 100 billion humans. Conditional on that, the probability of being in the Extinction Soon case is overwhelmingly greater than that of being in the Extinction Late case. Using Bayes’ Theorem, which I explore in a separate blog, to perform the calculation, we can conclude for example that if we view the two cases as equally likely, then after applying the Doomsday reasoning, we’re almost certainly in the Extinction Soon case. For conditioned on being in the Extinction Late case (100 trillion people), we almost certainly would not be in the special position of being amongst the first 100 billion people.
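To put a number on "almost certainly", here is the Bayesian update written out. It is a sketch under the stated simplifications (two cases, equal priors, and self-sampling from everyone who will ever live); the function and variable names are my own.

```python
from fractions import Fraction

BILLION = 10**9
TRILLION = 10**12

# Total number of people who will ever live in each case (figures from the text).
totals = {"Extinction Soon": 100 * BILLION, "Extinction Late": 100 * TRILLION}
prior = {"Extinction Soon": Fraction(1, 2), "Extinction Late": Fraction(1, 2)}
born_so_far = 100 * BILLION

def doomsday_update(prior, totals, born_so_far):
    """Posterior over the cases after learning that we are among
    the first `born_so_far` people ever to live."""
    weights = {}
    for case, total in totals.items():
        # Self-sampling: chance that a randomly chosen person from the whole
        # of history falls among the first `born_so_far`.
        likelihood = Fraction(min(born_so_far, total), total)
        weights[case] = prior[case] * likelihood
    norm = sum(weights.values())
    return {case: weight / norm for case, weight in weights.items()}

posterior = doomsday_update(prior, totals, born_so_far)
print(posterior["Extinction Soon"], posterior["Extinction Late"])  # 1000/1001 1/1001
```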

We can look at it another way. If we view the entire history of the human race from a timeless perspective, then all else being equal we should be somewhere in the middle of that history. That is, the number of people who live after us should not be too much different from the number of people who lived before us. If the population is increasing exponentially, it seems to imply that humanity has a relatively short time left. Of course, you may have special information that indicates that you aren’t likely to be in the middle, but that would simply mitigate the problem, not remove it.

A modern form of the Doomsday Argument holds that its resolution depends on how you resolve the Blue Beard or Sleeping Beauty Problems. If you give ⅓ as your answer to the Blue Beard puzzle once the lights are on (the probability that you are in World B), that corresponds to the Self-Sampling Assumption (SSA). If you make that assumption about how to apply Bayes’s Theorem, then it seems very difficult to escape the early Doom conclusion.

If you want to challenge that conclusion, then you can use the Self-Indication Assumption (SIA). That assumption says that you are more likely to exist in a world with more beings than in one with fewer beings. You would say in the Doomsday Argument that if the “real” case is the Extinction Late case, then while it’s true that you are much less likely to be one of the first 100 billion people, it’s also true that because there are so many more people, you’re much more likely to exist in the first place. If you make both assumptions, then they cancel each other out, taking you back to your prior assessment of the probabilities of Extinction Soon and Extinction Late.
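The cancellation can be seen in numbers. The sketch below takes the figures used earlier (100 billion people in the Extinction Soon case, 100 trillion in the Extinction Late case, equal priors) and applies the two steps in turn; the function names are illustrative assumptions.

```python
from fractions import Fraction

totals = {"Extinction Soon": 100 * 10**9, "Extinction Late": 100 * 10**12}
prior = {case: Fraction(1, 2) for case in totals}
born_so_far = 100 * 10**9

def normalise(weights):
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}

def sia_then_doomsday(prior, totals, born_so_far):
    # Step 1 (Self-Indication): weight each case by how many observers it contains.
    sia = normalise({c: prior[c] * totals[c] for c in totals})
    # Step 2 (Doomsday): multiply by the chance of being among the first
    # `born_so_far` people in that case.
    final = normalise({
        c: sia[c] * Fraction(min(born_so_far, totals[c]), totals[c])
        for c in totals
    })
    return sia, final

sia, final = sia_then_doomsday(prior, totals, born_so_far)
print(sia["Extinction Late"], final["Extinction Late"])  # 1000/1001 1/2
```

The factor of 1,000 in favour of Extinction Late from the first step is exactly undone by the factor of 1,000 against it in the second.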

On this view, the fate of humanity, in probabilistic terms, depends on which Assumption we adopt.

One problem that has been flagged with the SSA is that what applies to the first million people out of a possible trillion people applies just as well in principle to the first two people out of billions. This is known as the Adam and Eve problem. According to the SSA, the chance (without an effectively certain prior knowledge) that they are the first two people, as opposed to two out of the countless billions which (it is assumed) would be produced by their offspring, is so vanishingly small that they could act and cause outcomes as if it is impossible that they are the potential ancestors of billions. For example, they decide they will have a child unless Eve draws the Ace of Spades from a deck of cards. According to the logic of this thought experiment, that makes the card definitely the Ace of Spades. If it wasn’t, they would be two out of billions of people, which has such a small probability as to be effectively precluded by the SSA, i.e. by the assumption that you can reason as if you are simply randomly selected from all humans. In this way, their world would be one in which all sorts of strange coincidences, precognition, psychokinesis and backward causation could occur. There are defences that have been proposed to save the SSA and the debate continues around these.

So what about the Self-Indication Assumption? Here the Presumptuous Philosopher Problem has been well flagged. Here it is. Imagine that scientists have narrowed the possibilities for a final theory of physics down to two equally likely possibilities. The main difference between them is that Theory 2 predicts that the universe contains a billion times more observers than Theory 1 does. There is a plan to build a state of the art particle accelerator to arbitrate between the two theories. Now, philosophers using the SIA come along and say that Theory 2 is almost certainly correct to within a billion-to-one confidence, since conditional on Theory 2 being correct, we’re a billion times more likely to exist in the first place. So we can save a billion pounds on building the particle accelerator. Indeed, even if we did build it, and it produced evidence that was a million times more consistent with Theory 1, we should still go with the view of the philosophers who are sticking to their assertion that Theory 2 is the correct one. Indeed, we should award the Nobel Prize in Physics to them for their “discovery.”
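The philosophers' position can be expressed in odds form: start from even odds, multiply by the billion-fold factor the SIA gives to the theory with more observers, then multiply by the million-fold factor the experiment gives to the other theory. A sketch, with hypothetical names:

```python
from fractions import Fraction

def posterior_odds(prior_odds, *likelihood_ratios):
    """Multiply prior odds by each likelihood ratio in turn."""
    odds = Fraction(prior_odds)
    for ratio in likelihood_ratios:
        odds *= Fraction(ratio)
    return odds

# Odds are quoted in favour of the theory with a billion times more observers.
sia_factor = Fraction(10**9)              # SIA weighting towards more observers
experiment_factor = Fraction(1, 10**6)    # accelerator evidence favours the other theory

odds = posterior_odds(1, sia_factor, experiment_factor)
probability = odds / (1 + odds)
print(odds, probability)  # 1000 1000/1001
```

On these figures the many-observer theory remains a thousand-to-one favourite even after the experiment has come out strongly against it, which is what makes the philosophers presumptuous.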

So we are left with a choice between the Self-Sampling Assumption which leads to the Doomsday Argument, and the Self-Indication Assumption which leads to Presumptuous Philosophers. And we need to choose a side.

For reasons I explore in a separate blog, I identify the answer to the Sleeping Beauty Problem as 1/3, which is consistent with an answer of 2/3 for the Blue Beard Problem. This is all consistent with the Self-Indication Assumption, but not the Self-Sampling Assumption.

The debate continues.



We can address this problem using Bayes’ Theorem.

We are seeking to calculate the probability, P(Heads | Blue), that the coin landed heads, given that you have a blue beard. In the problem as posed, there are two people, and you are not more likely, a priori, to be either the blue-bearded or the black-bearded person. Now the probability, with a fair coin, of throwing heads as opposed to tails is 1/2. Adopting the Self-Sampling Assumption, we sample a person within their world at random.

First, what is the probability that you have a blue beard, P(Blue)?

This is given by: P(Blue | Heads) × P(Heads) + P(Blue | Tails) × P(Tails) = 1 × 1/2 + 1/2 × 1/2 = 3/4

This is because, if the coin lands Heads, P(Blue) = 1, and P(Heads) = 1/2.

If the coin lands Tails, P(Blue) = 1/2, and P(Tails) = 1/2.

By Bayes’ Theorem, P(Tails | Blue) = P(Blue | Tails) × P(Tails) / P(Blue) = (1/2 × 1/2) / (3/4) = 1/3

So the probability that the coin landed Tails (World B), given that you have a blue beard, is 1/3. Correspondingly, P(Heads | Blue) = 2/3.

What assumption needs to be made so that the probability of being in the Tails world, given that you have a blue beard, is 1/2?

You could assume that whenever you exist, you have a blue beard. In that case, P(Blue | Heads) = 1 and P(Blue | Tails) = 1.

Now, P(Blue) = P(Blue | Heads) × P(Heads) + P(Blue | Tails) × P(Tails) = 1 × 1/2 + 1 × 1/2 = 1

Now, by Bayes’ Theorem, P(Tails | Blue) = P(Blue | Tails) × P(Tails) / P(Blue) = (1 × 1/2) / 1 = 1/2

Is there a way, however, to do so without a prior commitment about beard colour?

One approach is to note that there are twice as many people in the Tails world as in the Heads world in the first place. So you could argue that you are a priori twice as likely to exist in a world with twice as many people. This is known as the Self-Indication Assumption. In a world with more people, you are simply more likely to be picked at all. Put another way, your own existence is a piece of evidence that you should condition upon.

Now, P(Blue) = P(Blue | Heads world) × P(Heads world) + P(Blue | Tails world) × P(Tails world) = 1 × 1/3 + 1/2 × 2/3 = 1/3 + 1/3 = 2/3

So using Bayes’ Theorem, P(Tails world | Blue) = P(Blue | Tails world) × P(Tails world) / P(Blue) = (1/2 × 2/3) / (2/3) = 1/2.
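
The two calculations above differ only in the prior given to the Tails world, and a few lines of Python make that easy to check. This is a minimal sketch rather than anything from the original argument: the function name `posterior_tails` and its parameters are my own labels, and exact fractions are used so that the results match the hand calculation.

```python
from fractions import Fraction

def posterior_tails(prior_tails, p_blue_given_heads, p_blue_given_tails):
    """Return P(Tails | Blue) by Bayes' Theorem."""
    prior_heads = 1 - prior_tails
    # Total probability of having a blue beard
    p_blue = p_blue_given_heads * prior_heads + p_blue_given_tails * prior_tails
    return p_blue_given_tails * prior_tails / p_blue

half = Fraction(1, 2)

# Self-Sampling Assumption: fair-coin prior, then sample a person within the world.
ssa = posterior_tails(half, Fraction(1), half)

# Self-Indication Assumption: the Tails world holds twice as many people,
# so it is given twice the prior weight (2/3 against 1/3).
sia = posterior_tails(Fraction(2, 3), Fraction(1), half)

print(ssa)  # 1/3
print(sia)  # 1/2
```

The only thing that changes between the two lines is the first argument, which is the whole of the disagreement between the two assumptions.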




Is Humanity Doomed?

Can we demonstrate, purely from the way that probability works, that the human race is likely to go extinct in the relatively foreseeable future, regardless of what humanity might do to try and prevent it? Yes, according to the so-called Doomsday argument, and this argument derived from basic probability theory has never been refuted.

Here’s how the argument goes. Let’s say you want to estimate how many tanks the enemy has to deploy against you, and you know that the tanks have been manufactured with serial numbers starting at 1 and ascending from there. Now let’s say you identify the serial numbers on five random tanks and they all have serial numbers under 10. Even an intuitive understanding of the workings of probability would lead you to conclude that the number of tanks possessed by the enemy is pretty small. On the other hand, if they are identified as serial numbers 2524, 7866, 5285, 3609 and 8009, you are unlikely to be way out if you estimate that the enemy has somewhere in the region of 10,000 of them.

Let’s say that you only have one serial number to work with, and that it shows the number 18. On the basis of just this information, you would do well to estimate that the total number of enemy tanks is more likely to be 36 than 360, and much more likely to be 36 than 36,000.
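
To put rough numbers on the tank example, here is a minimal sketch in Python. It assumes the observed tank is drawn uniformly at random from all the tanks, and it also includes the standard textbook point estimate for this kind of problem (largest serial seen, plus the average gap, minus one), which is not part of the argument above; the function names are my own.

```python
from fractions import Fraction

def likelihood(serial, total):
    """P(seeing this serial | total tanks), with one tank sampled uniformly."""
    return Fraction(1, total) if 1 <= serial <= total else Fraction(0)

observed = 18
likelihoods = {n: likelihood(observed, n) for n in (36, 360, 36_000)}

# How much better does a total of 36 explain serial number 18?
ratio_36_vs_360 = likelihoods[36] / likelihoods[360]
ratio_36_vs_36000 = likelihoods[36] / likelihoods[36_000]

def estimate_total(serials):
    """Textbook point estimate (an addition, not from the text): m + m/k - 1."""
    m, k = max(serials), len(serials)
    return m + Fraction(m, k) - 1

print(ratio_36_vs_360, ratio_36_vs_36000)  # 10 1000
print(estimate_total([18]))                # 35
```

With a single serial number of 18, the point estimate of 35 sits very close to the figure of 36 used above.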

Similarly, imagine that you are made aware that a selected box of numbered balls contains either ten balls (numbered from 1 to 10) or ten thousand balls (numbered 1 to 10,000), and you are asked to guess which. Before you do so, one is drawn for you. It reveals the number seven. That would be a 1 in 10 chance if the box contains ten balls, but a 1 in 10,000 chance if it contains 10,000 balls. You would be right on the basis of this information to conclude that the box very probably contains ten balls, not ten thousand.
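
The same reasoning can be run through Bayes’ Theorem. The sketch below assumes, purely for illustration, that the two boxes were equally likely before the draw (the example does not state a prior), and the function name is my own.

```python
from fractions import Fraction

def posterior_small_box(drawn, small=10, large=10_000, prior_small=Fraction(1, 2)):
    """P(box holds `small` balls | number drawn), assuming a uniform draw."""
    like_small = Fraction(1, small) if drawn <= small else Fraction(0)
    like_large = Fraction(1, large) if drawn <= large else Fraction(0)
    evidence = like_small * prior_small + like_large * (1 - prior_small)
    return like_small * prior_small / evidence

p = posterior_small_box(7)
print(p)         # 1000/1001
print(float(p))  # roughly 0.999
```

Drawing any number above 10 would, of course, settle the matter the other way with certainty.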

Let’s look at the same argument another way. As a thought experiment, imagine a world made up of 100 pods. In each pod, there is one human. Ninety of the pods are painted black on the exterior and the other ten are white. This is known information, available to you and all the other humans. You are one of these people and you are asked to estimate the likelihood that you are inside a black pod. A reasonable way to go about this is to adopt what philosophers call the Self-Sampling Assumption. It goes like this. “All other things equal, an observer should reason as if they are randomly selected from the set of all existing observers in their reference class (in this case, humans in pods).” Since nine in ten of all people are in the black pods, and since you don’t have any other relevant information, it seems clear that you should estimate the probability that you are in a black pod as 90 per cent. A good way of testing the good sense of this reasoning is to ask what would happen if everyone bet that they were in a black pod. Well, 90 per cent of the wagers would win and ten per cent would lose. In contrast, assume that the people ignore the self-sampling assumption and adopt the assumption that (since they don’t know which) they are equally likely to be in a black as a white pod. In this case, they might as well toss a coin and bet on the outcome. If they do so, only 50 per cent (as opposed to 90 per cent) will win the bet. As demonstrated, it seems clearly rational here to accept the self-sampling assumption.
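
The betting test is simple enough to simulate. The following is an illustrative sketch only: the seed and the number of repetitions are arbitrary choices of mine, and the coin-toss strategy is simulated rather than computed exactly.

```python
import random

pods = ["black"] * 90 + ["white"] * 10

# Strategy 1 (Self-Sampling Assumption): everyone bets "black".
ssa_win_rate = sum(colour == "black" for colour in pods) / len(pods)

# Strategy 2: everyone tosses a fair coin and bets on whatever it shows.
rng = random.Random(0)  # fixed seed so that the run is repeatable
trials = 1_000          # number of times the whole world places its bets
coin_wins = 0
for _ in range(trials):
    for colour in pods:
        guess = rng.choice(["black", "white"])
        coin_wins += guess == colour
coin_win_rate = coin_wins / (trials * len(pods))

print(ssa_win_rate)   # 0.9
print(coin_win_rate)  # close to 0.5
```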

Now let’s make the pod example more similar to the tank and ‘balls in the box’ cases. We keep the hundred pods but this time they are distinguished by numbers from 1 to 100, painted on the exterior of the pods. Then a fair coin is tossed by an external Being. If the coin lands on heads, one person is created in each of the hundred pods. If the coin lands tails, then people are only created in pods 1 to 10. Now, you are in one of the pods and must estimate whether there are ten or a hundred people created in total. Since the number was determined by the toss of a fair coin, and since you don’t know the outcome of the coin toss, and have no access to any other relevant information, it could be argued that you should believe there is a probability of 1/2 that it landed on heads and thus a probability of 1/2 that there are a hundred people. You can, however, use the self-sampling assumption to assess the conditional probability of a number between 1 and 10 being painted on your pod given how the coin landed. For example, conditional upon it landing on heads, the probability that the number on your pod is between 1 and 10 is 1/10, since one person in ten will find themselves in these pods. Conditional on tails, the probability that you are in number 1 through 10 is 1, since everybody created (ten of them) must be in one of those pods.

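Suppose now that you open the door and discover that you are in pod number 6. Again you are asked, how did the coin land? Now you should conclude that the probability that it landed on tails is considerably greater than 1/2, because pod number 6 is ten times more likely to be yours if only ten people were created than if a hundred were.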
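
For those who want the figure, here is the update written out as a short Python sketch. It takes the fair coin and the self-sampling assumption exactly as described above; the variable names are my own.

```python
from fractions import Fraction

prior_heads = Fraction(1, 2)
prior_tails = Fraction(1, 2)

# Under the self-sampling assumption you are equally likely to be any person created.
p_pod6_given_heads = Fraction(1, 100)  # heads: a hundred people, pods 1 to 100
p_pod6_given_tails = Fraction(1, 10)   # tails: ten people, pods 1 to 10

evidence = p_pod6_given_heads * prior_heads + p_pod6_given_tails * prior_tails
posterior_tails = p_pod6_given_tails * prior_tails / evidence

print(posterior_tails)  # 10/11
```

So, on these assumptions, finding yourself in pod 6 moves the probability of tails from 1/2 to 10/11.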

The final step is to transpose this reasoning to our actual situation here on Earth. Let’s assume for simplicity there are just two possibilities. Early extinction: the human race goes extinct in the next century and the total number of humans that will have existed is, say, 200 billion. Late extinction: the human race survives the next century, spreads through the Milky Way and the total number of humans is 200,000 billion. Corresponding to the prior probability of the coin landing heads or tails, we now have some prior probability of early or late extinction, based on current existential threats such as nuclear annihilation. Finally, corresponding to finding you are in pod number 6 we have the fact that you find that your birth rank is about 108 billion (that’s approximately how many humans have lived before you). Just as finding you are in pod 6 increased the probability of the coin having landed tails, so finding you are human number 108 billion (about half way to 200 billion) gives you much more reason, whatever the prior probability of extinction based on other factors, to think that Early Extinction (200 billion humans) is much more probable than Late Extinction (200,000 billion humans).
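
The size of this shift can be sketched in the same way. The two population totals below are those given above; the prior probabilities of early extinction fed into the function are purely illustrative assumptions of mine, and the function name is hypothetical.

```python
from fractions import Fraction

EARLY_TOTAL = 200 * 10**9      # Early extinction: 200 billion humans in total
LATE_TOTAL = 200_000 * 10**9   # Late extinction: 200,000 billion humans in total

def posterior_early(prior_early):
    """P(Early extinction | your birth rank), under the self-sampling assumption.

    A birth rank of about 108 billion is possible under both hypotheses,
    so its likelihood is simply 1 / (total number of humans) in each case.
    """
    like_early = Fraction(1, EARLY_TOTAL)
    like_late = Fraction(1, LATE_TOTAL)
    evidence = like_early * prior_early + like_late * (1 - prior_early)
    return like_early * prior_early / evidence

even_prior = posterior_early(Fraction(1, 2))         # illustrative 50% prior
optimistic_prior = posterior_early(Fraction(1, 100)) # illustrative 1% prior

print(even_prior)        # 1000/1001
print(optimistic_prior)  # 1000/1099
```

Even someone who started out giving early extinction only a 1 per cent chance would, on this reasoning, end up at about 91 per cent.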

Essentially, then, the Doomsday Argument transfers the logic of the laws of probability to the survival of the human race. To date there have been about 110 billion humans on earth, about 7 per cent of whom are alive today. At least these are indicative estimates. On the same basis as the tank and the balls in the box and the pods problems, a reasonable estimate, other things equal, is that we are about half way along the timeline. Projecting demographic trends forward, this makes our best estimate of the termination of the timeline of the human race as we know it to be within this millennium.

That is the Doomsday argument. Refute it if you can. Many have tried, but none has yet clearly succeeded. The best of the attempts relates to the reference class of what being human is. That’s a question perhaps best saved for, and savoured on, another day.

This has been explored in my blog, ‘The God’s Dice Problem.’



Nick Bostrom, Anthropic Bias: Observation Selection Effects in Science and Philosophy. Routledge, 2002




Forecasting Elections and Other Things – Where did it all go wrong?

There are a number of ways that have been used over the years to forecast the outcome of elections. These include betting markets, opinion polls, expert analysis, crystal balls, tea leaves, Tarot cards and astrology! Let’s start by looking at the historical performance of betting markets in forecasting elections.

The recorded history of election betting markets can be traced as far back as 1868 for US presidential elections and 1503 for papal conclaves. In both years, the betting favourite won (Ulysses S. Grant was elected President in 1868; Cardinal Francesco Piccolomini was elected Pope Pius III in 1503). From 1868 up to 2016, no clear favourite for the White House had lost the presidential election other than in 1948, when longshot Harry Truman defeated his Republican rival, Thomas Dewey. The record of the betting markets in predicting the outcome of papal conclaves since 1503 is less complete, however, and a little more chequered. The potential of the betting markets and prediction markets (markets created to provide forecasts) to assimilate collective knowledge and wisdom has increased in recent years as the volume of money wagered and the number of market participants have soared. Betting exchanges (where people offer and take bets directly, person-to-person) now see tens of millions of pounds trading on a single election. An argument made for the value of betting markets in predicting the probable outcome of elections is that the collective wisdom of many people is greater than that of the few. We might also expect that those who know more, and are better able to process the available information, would on average tend to bet more. Moreover, the lower the transaction costs of betting and the lower the cost of accessing and processing information, the more efficient we might expect betting markets to become in translating information into forecasts. In fact, the betting public have not paid tax on their bets in the UK since 2001, and margins have fallen significantly since the advent of person-to-person betting exchanges which cut out the middleman bookmaker. Information costs have also plummeted as we have witnessed the development of the Internet and search engines. Modern betting markets might be expected for these reasons to provide better forecasts than ever.

There is indeed plenty of solid anecdotal evidence about the accuracy of betting markets, especially compared to the opinion polls. The 1985 by-election for the vacant parliamentary seat of Brecon and Radnor in Wales offers a classic example. Mori, the polling organisation, had the Labour candidate leading on the eve of poll by a massive 18 percentage points, while Ladbrokes, the bookmaker, simultaneously quoted the Liberal Alliance candidate as odds-on 4/7 favourite. When the result was declared, there were two winners – the Liberal candidate and the bookmaker.

In the 2000 US presidential election, IG Index, the spread betting company, offered a spread on the day of 265 to 275 electoral college votes about both Bush and Gore. Meanwhile, Rasmussen, the polling company, had Bush leading Gore by 9% in the popular vote. In the event, the electoral college (courtesy of a controversial US Supreme Court judgment) split 271 to 266 in favour of Bush, both within the quoted spreads. Gore also won the popular vote, putting the pollster out by almost 10 percentage points.

In the 2004 US presidential election, the polls were mixed. Fox had Kerry up by 2 per cent, for example, while GW/Battleground had Bush up 4. There was no consensus nationally, much less state by state. Meanwhile, the favourite on the Intrade prediction market in each state went on to win that state.

In 2005, I was asked on to a BBC World Service live radio debate in the immediate run-up to the UK general election, where I swapped forecasts with Sir Robert Worcester, Head of the Mori polling organisation. I predicted a Labour majority of about 60, as I had done a few days earlier in the Economist magazine and on BBC Radio 4 Today, based on the betting at the time. Mori had Labour on a projected majority of over 100 based on their polling. The majority was 66.

In the 2008 US presidential election, the Betfair exchange market’s state-by-state predictions called 49 out of 50 states correctly. Only Indiana was called wrong. While the betting markets always had Obama as firm favourite, the polls had shown different candidates winning at different times in the run-up to the election. On polling day, Obama was as short as 1 to 20 to win on the betting exchanges, but some polls still had it well within the margin of error. He won the popular vote by 7.2%, and the Electoral College by 365 votes to 173.

In the 2012 US presidential election, the RealClearPolitics average of national polls on election day showed Obama and Romney essentially tied. Gallup and Rasmussen had Romney leading, others had Obama narrowly ahead. To be precise, the average of all polls had Obama up 0.7%. Obama won by 4% and by 332 electoral votes to 206.

In the week running up to polling day in the 2014 Scottish referendum, the polls ranged from leads for No to independence of between 1% (Panelbase and TNS BMRB) and, at the very top end, 7% (Survation), to leads for Yes to independence of between 2% (YouGov) and 7% (ICM/Sunday Telegraph). The final polls had No to independence between 2% and 5% ahead. The actual result was No by 10.6%. The result had been reflected in the betting markets throughout, with No to independence always a short odds-on favourite. To give an example of the general bookmaker prices, one client of William Hill staked a total of £900,000 to win £193,000, which works out at an average price of about 1 to 5.
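
For readers less used to betting odds, the prices quoted in these examples can be turned into rough implied probabilities. The sketch below is illustrative only: the function name is my own, and it ignores the bookmaker’s margin, so the figures are an approximation to what the market was saying rather than an exact probability.

```python
from fractions import Fraction

def implied_probability(win, stake):
    """Implied probability of odds of 'win to stake', ignoring any bookmaker margin.

    For example, 4/7 means winning 4 for every 7 staked.
    """
    return Fraction(stake, win + stake)

brecon_favourite = implied_probability(4, 7)             # Ladbrokes, 1985
obama_2008 = implied_probability(1, 20)                  # exchanges, polling day
william_hill_bet = implied_probability(193_000, 900_000) # £900,000 to win £193,000

print(float(brecon_favourite))   # about 0.64
print(float(obama_2008))         # about 0.95
print(float(william_hill_bet))   # about 0.82
print(900_000 / 193_000)         # about 4.66, i.e. a price of roughly 1 to 5
```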

In the 2015 Irish referendum on same-sex marriage, the final polls broke down as a vote share of 70% for Yes to 30% for No. In the spread betting markets, the vote share was being quoted with mid-points of 60% Yes and 40% No. The final result was 62% Yes, 38% No, almost exactly in line with the betting markets.

In the Israeli election of 2015, the final polls showed Netanyahu’s Likud party trailing the main opposition party by 4% (Channel 2, Channel 10, Jerusalem Post), by 2% (Teleseker/Walla) and by 3% (Channel 1). Meanwhile, Israel’s Channel 2 television news on election day featured the betting odds on the online prediction market service, PredictWise. PredictWise had Netanyahu as 80% favourite. The next day, Netanyahu declared that he won “against the odds.” In fact, he did not. He won against the polls.

In the 2015 UK general election, the polling averages throughout the campaign had the Conservatives and Labour neck and neck, within a percentage point or so of each other. Meanwhile, the betting odds always had the Conservatives to win most seats at very short odds-on. To compare at a point in time, three days before polling, the polling average had it tied. Simultaneously, the Conservatives to win most seats was trading on the markets as short as 1 to 6.

If this anecdotal evidence is correct, it is natural to ask why the betting markets outperform the opinion polls in terms of forecasting accuracy. One obvious reason is that there is an asymmetry. People who bet in significant sums on an election outcome will usually have access to the polling evidence, while opinion polls do not take account of information contained in the betting odds (though the opinions expressed might, if voters are influenced by the betting odds). Sophisticated political bettors also take account of how good different pollsters are, what tends to happen to those who are undecided when they actually vote, differential turnout of voters, what might drive the agenda between the dates of the polling surveys and election day itself, and so on. All of this can in principle be captured in the markets.

Pollsters, except perhaps with their final polls (and sometimes even then) tend to claim that they are not producing a forecast, but a snapshot of opinion. This is the classic ‘snapshot defence’ wheeled out by the pollsters when things go badly wrong. In contrast, the betting markets are generating odds about the final result, so can’t use this questionable defence. In any case, polls are used by those trading the markets to improve their forecasts, so they are (or should be) a valuable input. But they are only one input. Those betting in the markets have access to much other information as well including, for example, informed political analysis, statistical modelling, focus groups and on-the-ground information including local canvass returns.

Does Big Data back up the anecdotal evidence? To test the reliability of the anecdotal evidence pointing to the superior forecasting performance of the betting markets over the polls, we collected vast data sets for a paper published in the Journal of Forecasting (‘Forecasting Elections’, 2016, by Vaughan Williams and Reade) of every matched contract placed on two leading betting exchanges and from a dedicated prediction market for US elections, since 2000. This was collected over 900 days before the 2008 election alone, and to indicate the size, a single data set was made up of 411,858 observations from one exchange alone for that year. Data was derived notably from presidential elections at national and state level, Senate elections, House elections, and elections for Governor and Mayor. Democrat and Republican selection primaries were also included. Information was collected on the polling company, the length of time over which the poll was conducted, and the type of poll. The betting was compared over the entire period with the opinion polls published over that period, and also with expert opinion and a statistical model. In this paper, as well as in Vaughan Williams and Reade – ‘Polls and Probabilities: Prediction Markets and Opinion Polls’, we specifically assessed opinion polls, prediction and betting markets, expert opinion and statistical modelling over this vast data set of elections in order to determine which performed better in terms of forecasting outcomes. We considered accuracy, bias and precision over different time horizons before an election.

A very simple measure of accuracy is the percentage of correct forecasts, i.e. how often a forecast correctly predicts the election outcome. We also identified the precision of the forecasts, which relates to the spread of the forecasts. A related but distinctly different concept to accuracy is unbiasedness. An unbiased probability forecast is one that is, on average, equal to the probability that the candidate wins the election. Forecasts that are accurate can also be biased, provided the bias is in the correct direction. If polls are consistently upward biased for candidates that eventually win, then despite being biased they will be very accurate in predicting the outcome, whereas polls that are consistently downward biased for candidates that eventually win will be very inaccurate as well as biased.
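
To make the three ideas concrete, here is a toy sketch in Python. It is emphatically not the method or the data used in the papers: the forecasts are made up for illustration, and the particular formulas (share of correct calls for accuracy, average forecast error for bias, spread of the errors for precision) are simply one plausible reading of the definitions above.

```python
from statistics import mean, pstdev

# Hypothetical forecasts: (probability given to a candidate, 1 if they won, else 0)
forecasts = [(0.80, 1), (0.60, 1), (0.70, 0), (0.30, 0), (0.55, 1)]

def accuracy(data):
    """Share of forecasts that called the winner (probability above 0.5 means 'wins')."""
    hits = sum(1 for p, won in data if (p > 0.5) == (won == 1))
    return hits / len(data)

def bias(data):
    """Average of forecast minus outcome; zero for an unbiased set of forecasts."""
    return mean(p - won for p, won in data)

def spread(data):
    """Standard deviation of the forecast errors; smaller means more precise."""
    return pstdev([p - won for p, won in data])

print(accuracy(forecasts))  # 0.8
print(bias(forecasts))
print(spread(forecasts))
```

In this made-up example four of the five calls are right, and the average error is close to zero even though the individual errors are quite widely spread.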

We found that the betting/prediction market forecasts provided the most accurate and precise forecasts and were similar in terms of bias to opinion polls. We found that betting/prediction market forecasts also tended to improve as the elections approached, while we found evidence of opinion polls tending to perform worse.

In summary, we concluded that betting and prediction markets provide the most accurate and precise forecasts. We noted that forecast horizon matters: whereas betting/prediction market forecasts tend to improve nearer an election, opinion polls tend to perform worse, while expert opinion performs consistently throughout, though not as well as betting markets. There was also a systematic small bias against favourites, so that the most likely outcome is actually usually a little more likely than suggested in the odds. Finally, if the polls and betting markets say different things, it is normally advisable to look to the betting markets.

So let’s turn again to why we might expect the betting markets to beat the polls. Most fundamentally, opinion polls, like all market research, provide a valuable source of information, but they are only one source of information, and some polls have historically been more accurate than others. Traders in the markets consider such things as what tends to happen to ‘undecideds’. Is there a late swing to incumbents or to the ‘status quo’? What is the likely impact of late endorsements by the press, or of potential late announcements? Is there a late, on-the-day ‘tabloid press effect’, especially on emotions, which influences undecideds and drives turnout in favour of the chosen editorial line? What is the likely turnout? What is the impact of differential turnout? Finally, sophisticated bettors take account of the relative accuracy of different polls and look behind the headline results to the detailed breakdown and the methodology used in the poll. Betting markets should aggregate all the available information and analysis.

Moreover, people who know the most, and are best able to process the information, will tend to bet the most, but people who know only a little tend to bet only a little. The more money involved, or the greater the incentives, the more efficient and accurate will the market tend to be. It really is in this sense a case of “follow the money”.

Sometimes it is even possible to follow the money all the way to the future, to capture tomorrow’s news today. A classic example is the Intrade exchange market, ‘Will Saddam Hussein be captured or neutralised by the end of the month?’ Early on 13 December, 2003, the market moved from 20 (per cent chance) to 100. The capture was announced early on 14 December, 2003, and officially took place at 20:30 hours Iraqi time, several hours after the Intrade market moved to 100. I call these, with due deference to Star Trek, ‘Warp speed markets’.

But we need to be cautious. With rare exceptions, betting markets don’t tell us what the future will be. They tell us at best what the probable future will be. They are, in general, not a crystal ball. And we need to be very aware of this. Even so, the overwhelming consensus of evidence prior to the 2015 UK General Election pointed to the success of political betting markets in predicting the outcome of elections.

And then the tide turned.

The 2016 EU referendum in the UK (Brexit), the 2016 US presidential election (Trump) and the 2017 UK General Election (No overall majority) produced results that were a shock to the great majority of pollsters as well as to the betting markets. The turning of the tide could be traced, however, to the Conservative overall majority in 2015, which came as a shock to the markets and pollsters alike. After broadly 150 years of unparalleled success for the betting markets, questions were being asked. The polls were equally unsuccessful, as were most expert analysts and statistical models.

The Meltdown could be summarised in two short words. Brexit and Trump. Both broadly unforeseen by the pollsters, pundits, political scientists or prediction markets. But two big events in need of a big explanation. So where did it all go wrong?  There are various theories to explain why the markets broke down in these recent big votes.

Theory 1: The simple laws of probability. An 80% favourite can be expected to lose one time in five, if the odds are correct. In the long run, according to this explanation, things should balance out. It’s like there are five parallel universes. The UK in four of the parallel universes votes to Remain in the EU, but not in the fifth. Hillary Clinton wins in four of the parallel universes but not in the fifth. In other words, it’s just chance, no more strange than a racehorse starting at 4/1 winning the race. But for that to be a convincing explanation, it would need to assume that the 2015 election, Brexit, Trump and the 2017 election were totally correlated. Even if there is some correlation of outcome, the markets were aware of each of the predictive failures in the previous votes and still favoured the losing outcome by a factor of 4 or 5 to 1, which implies a chance of an upset of 1/5 or 1/6 each time. That means we can multiply the probabilities: 1/5 × 1/5 × 1/5 × 1/5 = 1/625, or 1/6 × 1/6 × 1/6 × 1/6 = 1/1296. Either way, it’s starting to look unlikely.
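
The multiplication is easy to check, with the caveat that it treats the four upsets as fully independent events, which is the simplifying assumption made in the argument above. The function name in this sketch is my own.

```python
from fractions import Fraction

def chance_of_all_upsets(p_upset, n_events):
    """Probability that every one of n independent favourites loses."""
    return p_upset ** n_events

# Four upsets: the 2015 election, Brexit, Trump and the 2017 election.
at_4_to_1 = chance_of_all_upsets(Fraction(1, 5), 4)
at_5_to_1 = chance_of_all_upsets(Fraction(1, 6), 4)

print(at_4_to_1)  # 1/625
print(at_5_to_1)  # 1/1296
```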

Theory 2: A second theory to explain recent surprise results is that something fundamental has changed in the way that information contained in political betting markets is perceived and processed. One interpretation is that the hitherto widespread success of the betting markets in forecasting election outcomes, and the publicity that was given to this, turned them into an accepted measure of the state of a race, creating a perception which was difficult to shift in response to new information. This is a form of ‘anchoring’. To this extent, market prices led opinion rather than simply reflecting it. From this perspective, the prices in the markets became a yardstick of the true probabilities and thus somewhat inflexible in response to the weight of new information.

This leads to the herding hypothesis. Because the prediction markets had by 2015 become so firmly entrenched in conventional wisdom as an accurate forecasting tool, people herded around the forecasts, propelling the implied probabilities of existing forecasts upwards. So a 55% probability of victory, for example, became transformed into something much higher. In consequence, a prediction market implied probability of 70%, say, might be properly adjusted to a true probability of, say, 55%. In principle, it is possible to de-bias (or de-herd) each prediction market probability into a more accurate adjusted probability.

We also need to look at the idea of self-reinforcing feedback loops. City traders look to the betting exchanges and the fixed-odds and spread bookmakers’ odds for evidence of what is the true state of play in each race. That influences the futures markets, which in turn influences perceptions among bettors. The result is a sort of prediction market loop, in which expectations become self-reinforcing. This is a form of ‘groupthink’ in which those trading the futures and prediction markets are taking the position they are simply because others are doing so.

This is further reinforced by the key dividing line which, more than anything, acts as a distinguishing marker between Brexit supporters and Remain supporters, between Trump voters and Hillary Clinton voters: educational level. More than any other factor, it is the ‘University education’ marker that identifies the Remain voter, the Clinton voter. Also, the vast majority of City traders as well as betting exchange traders are University-educated, and tend to mix with people like themselves, which may have reinforced the perception that Trump and Brexit were losing tickets. Indeed, more than ever before, as the volume of information increases, and people’s ability to sort between, navigate and share these information sources increases, there is a growing disjoint between the information being seen and processed by different population silos. This is making it increasingly difficult for those inhabiting these different information universes to make any sense of what is driving the preferences of those in alternative information universes, and therefore to engage with them and form accurate expectations of their likely voting behaviour and likelihood of voting. The divide is increasingly linked to age and educational profile, reducing the diversity of opinion which is conventionally critical in driving the crowd wisdom aspect of prediction markets. It also helps explain the broad cluelessness of the political and political commentating classes in understanding and forecasting these event outcomes. Of course, the pollsters, pundits, political scientists and politicians were broadly speaking just as clueless. So why?

Theory 3: Conventional patterns of voting broke down in 2015 and subsequently, primarily due to unprecedented differential voter turnout patterns across key demographics, which were not correctly modelled in most of the polling and which were missed by political pundits, political scientists, politicians and those trading the betting markets. In particular, there was unprecedented turnout in favour of Brexit and Trump by demographics that usually voted in relatively low numbers, notably the more educationally disadvantaged sections of society. And this may be linked to a breakdown of the conventional political wisdom. This wisdom holds that campaigns don’t matter, that swings of support between parties are broadly similar across the country, that elections can only be won from the centre, and that the so-called ‘Overton window’ must be observed. This idea, conceived by political scientist Joseph Overton, is that for any political issue there’s a range of socially acceptable and broadly tolerated positions (the ‘Overton window’) that’s narrower than the range of possible positions. It’s an idea which in a Brexit/Trump age seems to have gone very much out of the window.

Theory 4: Manipulation. Robin Hanson and Ryan Oprea co-authored a paper titled ‘A Manipulator Can Aid Prediction Market Accuracy’, in a special issue of Economica in 2009 which I co-edited. Manipulation can actually improve prediction markets, they argue, for the simple reason that manipulation offers informed investors a proverbial ‘free lunch.’ In a stock market, a manipulator sells and buys based on reasons other than expectations and so offers other investors a greater than normal return. The more manipulation, therefore, the greater the expected profit from betting. For this reason, investors should soon move to take advantage of any price discrepancies thus created within and between markets, as well as to take advantage of any perceived mispricing relative to fundamentals. Thus the expected value of the trading is a loss for the manipulator and a profit for the investors who exploit the mispricing. Manipulation creates liquidity, which draws in informed investors and provides the incentive to acquire and process further information, which makes the market ever more efficient.

Theory 5: Fake News. There are other theories, which may be linked to the demographic turnout theory, including notably the impact of misinformation (fake news stories), of hacked campaign email accounts, and direct manipulation of social media accounts. In fact, we know when it all started to go wrong. That was 7th May, 2015, when the Conservatives won an unforeseen overall majority in the General Election. That result led to Brexit. That in turn arguably helped propel Trump to power. And it led to the shock 2017 UK election result. Common to all these unexpected outcomes is the existence of a post-truth misinformation age of ‘fake news’ and the potential to exploit our exposure to social media platforms by those with the money, power and motivation to do so. The weaponisation of fake news might explain the breakdown in the forecasting power of the betting markets and pollsters, commencing in 2015, as well as the breakdown of the traditional forecasting methodologies in predicting Brexit and Trump. This has in large part been driven by the power of fake news distribution, and its targeting via social media platforms, to alter traditional demographic turnout patterns by boosting turnout among certain demographics and suppressing it among others. The weaponisation of fake news by the tabloid press is of course nothing new, but it has become increasingly virulent and sophisticated, and its online presence amplifies its reach and influence. It can also help explain on-the-day shifts in turnout patterns.

What it does not explain are some very odd happenings in recent times. Besides Brexit and Trump, Leicester City became 5,000/1 winners of the English Premier League. The makers and cast of La La Land had accepted the Oscar for Best Picture before it was snatched away in front of billions to be handed to Moonlight. This only echoed the exact same thing happening to Miss Colombia when her Miss Universe crown was snatched away after her ceremonial walk to be awarded to Miss Philippines. And did the Atlanta Falcons really lose the Super Bowl after building an unassailable lead? And did the BBC Sports Personality of the Year Award go to someone whose chance of winning was so small he didn’t even turn up to the ceremony, while the 1/10 favourite was beaten by a little-known motorcyclist and didn’t even make the podium? Which leads us to Theory 6.

Theory 6: We live in a simulation. In the words of a New Yorker columnist in February 2017: “Whether we are at the mercy of an omniscient adolescent prankster or suddenly the subjects of a more harrowing experiment than any we have been subject to before … we can now expect nothing remotely normal to take place for a long time to come. They’re fiddling with our knobs, and nobody knows the end.”

So maybe the aliens are in control, in which case all bets are off. Or have we simply been buffeted as never before by media manipulation and fake news? Or is it something else? Whatever the truth, we seem to be at the cusp of a new age. We know not yet which way that will lead us. Hopefully, the choice is still in our hands.


The Day Zero was Banned from Roulette Wheels – How times have changed!

On December 30, 1967, senior detectives from Scotland Yard sent owners of gambling clubs into a proverbial spin. Anyone operating a roulette wheel that contained the number zero would be prosecuted, they warned. From now on the whirl of numbers would all be reds and blacks – starting with the number one. This warning 50 years ago followed a judgment in the House of Lords, the country’s highest court of appeal at the time, that the green zero was illegal under gaming law. According to these so-called “law lords”, this was because the chances must be equally favourable to all players in the game.

The Lords’ problem with the zero was that players betting on the ball landing on an individual number were being offered odds of 35/1 – put £1 on number 7 and if it came up you got £35 back plus your stake. But standard British roulette wheels have 37 numbers including zero, so the odds should have been 36/1. This discrepancy gave the house an edge of 2.7% – the proportion of times the ball would randomly fall into the zero slot. (Note that in the US and South America roulette wheels normally have both a zero and double zero, giving them a house edge of just over 5%.) The British edge on roulette wheels was a small one, such that someone staking £10 on a spin would expect statistically to lose an average of 27 pence. But it’s a vital one. Without an edge on a game the operator would expect only to break even, and that’s before accounting for running costs. The Lords’ decision also looked like the back door to banning every other game with a house edge, such as blackjack and baccarat.
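For readers who like to check the arithmetic, the house edge on a single-number bet can be worked out in a few lines of Python. This is only an illustrative sketch: the function name `house_edge` and its parameters are my own labels, and it assumes every pocket on the wheel is equally likely and that a winning bet pays the stated odds plus the returned stake.

```python
from fractions import Fraction


def house_edge(pockets: int, payout_odds: int) -> Fraction:
    """Expected loss per unit staked on a single-number bet.

    pockets: number of equally likely slots on the wheel (assumption).
    payout_odds: winnings paid per unit staked; the stake is also returned.
    """
    p_win = Fraction(1, pockets)
    # Gain payout_odds units with probability p_win,
    # lose the one-unit stake with probability 1 - p_win.
    expected_return = p_win * payout_odds - (1 - p_win)
    return -expected_return


single_zero = house_edge(37, 35)  # wheel with 0 and 1-36, paying 35/1
double_zero = house_edge(38, 35)  # wheel with 0, 00 and 1-36, paying 35/1
no_zero = house_edge(36, 35)      # wheel with 1-36 only, paying 35/1

print(f"single zero: {float(single_zero):.2%}")
print(f"double zero: {float(double_zero):.2%}")
print(f"no zero:     {float(no_zero):.2%}")
print(f"expected loss on a £10 stake: £{float(10 * single_zero):.2f}")
```

The single-zero edge comes out at exactly 1/37, which is the 2.7% quoted above, and paying the fair odds of 36/1 on the same wheel (`house_edge(37, 36)`) would reduce it to zero, just as removing the zero does.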

It had been illegal in the UK to organise and manage the playing of games of chance since the Gaming Act of 1845. The Betting and Gaming Act 1960 was the most substantive change to gambling regulation since then. As well as permitting the likes of betting shops and pub fruit machines, it opened the door to gambling halls – though only in a very restricted way.

Designed to permit small-stakes play on bridge in members’ clubs, the act legalised gaming clubs so long as they took their money from membership fees and from charges to cover the cost of the gaming facilities. Casinos soon proliferated, however, and by the mid-1960s around a thousand had sprung up. Many introduced French-style roulette, with wheels that included a single zero, since the law had arguably not been clear as to whether the house could have an edge. The one variation thought necessary by some to comply with the legislation was that when the ball landed on zero the house and player split the stake, instead of it being kept by the house. Not only had the law liberalised gambling more than had been envisaged by the government of the day, but many casinos also had apparent ties to organised crime. London gaming quickly became notorious. Film star George Raft, a man once linked to such shady characters as Las Vegas mobster Benjamin “Bugsy” Siegel, was one of the more high-profile names associated with the scene.

When the Lords drew a line in the sand in 1967 by banning zeros in roulette, gaming bodies went into overdrive. One proposal designed to save the zero was to offer odds of 36/1 on individual numbers, and instead levy a playing charge on the players. The government was soon persuaded it needed to legislate again. In 1968 a new Gaming Act introduced a Gaming Board and strict measures to regulate and police gaming in Great Britain. New licensing rules, including a “fit and proper persons” test, pushed out the shady operators.

The one concession to the industry was that gaming clubs and casinos would be permitted to play roulette with a zero. Other games with a house edge, such as baccarat, blackjack and craps, were also explicitly permitted. In an environment of regulated, licensed gaming establishments, the government was saying, a small edge was acceptable as a way of paying for costs and turning a profit. This came on the back of another reform that was vital for developing the industry that we see today. Following the legalisation of betting shops in 1960, the government began taxing their turnover in 1966. It was the first tax on betting since the one introduced in 1926 by then Chancellor of the Exchequer, Winston Churchill, in the days before cash bookmaking was legal and above board. “I am not looking for trouble. I am looking for revenue,” Churchill declared at the time. He didn’t see much of the latter and got a lot of the former: endless enforcement difficulties and opposition from lobby groups and in parliament. The tax was gone by 1930. Yet the 1966 tax stuck, and today the UK gambling landscape is much changed – not only because of the introduction of the National Lottery in 1994 but thanks also in large measure to two key pieces of modernising legislation. The first was the radical overhaul of betting taxation in 2001 and the other was the Gambling Act of 2005, both of which I was closely involved with as an adviser.

Instead of taxing betting turnover, now operators are taxed on their winnings (gross profits). Casinos, betting shops and online operators can advertise on radio and TV; players no longer need to be members of casinos to visit them; and online operators based overseas but active in the UK market must comply with UK licence requirements. Betting exchanges allow people to bet person-to-person, a Gambling Commission regulates betting and gaming, and electronic roulette with a zero is legally available in betting shops and casinos.

The industry as a whole has grown very significantly in size and employs a lot of people, and there is more evidence-based research and focus on the issue of gambling prevalence and problem gambling than ever before. The wheel has certainly turned a long way since that Lords decision in 1967, when the country was still trying to decide what kind of gambling system it wanted. The question that now divides opinion is how far the wheel has turned for the better.



Leighton Vaughan Williams, The Day Zero was banned from British roulette: How times have changed. Article in The Conversation. Link below: