
Repeated Game Strategies – in a nutshell.

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

If there is a set of ‘game’ strategies with the property that no ‘player’ can benefit by changing their strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute what is known as the ‘Nash equilibrium’.

This leads us to the classic ‘Prisoner’s Dilemma’ problem. In this scenario, two prisoners, linked to the same crime, are offered a discount on their prison terms for confessing if the other prisoner continues to deny it, in which case the other prisoner will receive a much stiffer sentence. However, they will both be better off if both deny the crime than if both confess to it. The problem each faces is that they can’t communicate and strike an enforceable deal. The box diagram below shows an example of the Prisoner’s Dilemma in action.

                        Prisoner 2 Confesses              Prisoner 2 Denies
Prisoner 1 Confesses    2 years each                      Freedom for P1; 8 years for P2
Prisoner 1 Denies       8 years for P1; Freedom for P2    1 year each

The Nash Equilibrium is for both to confess, in which case they will both receive 2 years. But this is not the outcome they would have chosen if they could have agreed in advance to a mutually enforceable deal. In that case they would have chosen a scenario where both denied the crime and received 1 year each.

So a Nash equilibrium is a stable state involving interacting participants, in which none can gain by a change of strategy as long as the other participants’ strategies remain unchanged. It is not necessarily the best outcome for the parties involved, but it is the outcome we would most likely predict.
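The definition can be checked mechanically against the box diagram. The following is a minimal Python sketch, not a general solver: it uses the prison terms from the example above, and the names `YEARS`, `MOVES` and `is_nash` are illustrative choices of mine.

```python
from itertools import product

# Years in prison (lower is better) for (Prisoner 1, Prisoner 2),
# taken from the box diagram above.
YEARS = {
    ("Confess", "Confess"): (2, 2),
    ("Confess", "Deny"): (0, 8),
    ("Deny", "Confess"): (8, 0),
    ("Deny", "Deny"): (1, 1),
}
MOVES = ("Confess", "Deny")


def is_nash(move1, move2):
    """True if neither prisoner can shorten their own sentence by deviating alone."""
    years1, years2 = YEARS[(move1, move2)]
    p1_cannot_improve = all(YEARS[(alt, move2)][0] >= years1 for alt in MOVES)
    p2_cannot_improve = all(YEARS[(move1, alt)][1] >= years2 for alt in MOVES)
    return p1_cannot_improve and p2_cannot_improve


equilibria = [pair for pair in product(MOVES, MOVES) if is_nash(*pair)]
print(equilibria)
```

Only the pair in which both confess survives the check. Mutual denial fails it, because either prisoner could walk free by confessing while the other denies.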

The Prisoner’s Dilemma is a one-stage game, however. What happens in games with more than one round, where players can learn from the previous moves of the other players?

Take the case of a 2-round game. The payoff from the game will equal the sum of payoffs from both moves.

The game starts with two players, each of whom is given £100 to place into a pot. They can then secretly choose to honour the deal or to cheat on the deal, by means of giving an envelope to the host containing the card ‘Honour’ or ‘Cheat’.  If they both choose to ‘Honour’ the deal, an additional £100 is added to the pot, yielding each an additional £50. So they end up with £150 each. But if one honours the deal and the other cheats on the deal, the ‘Cheat’ wins the original pot (£200) and the ‘Honour’ player loses all the money in that round.  A third outcome is that both players choose to ‘Cheat’, in which case each keeps the original £100. So in this round, the dominant strategy for each player (assuming no further rounds) is to ‘Cheat’, as this yields a higher payoff if the opponent ‘Honours’ the deal (£200 instead of £150) and a higher payoff if the opponent ‘Cheats’ (£100 instead of zero). The negotiated, mutually enforceable outcome, on the other hand, would be to agree to both ‘Honour’ the deal and go away with £150.
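The dominance argument can be spelled out as a small check. This is a sketch in Python using the one-round winnings described above; the names `WINNINGS` and `strictly_dominates` are my own labels.

```python
MOVES = ("Honour", "Cheat")

# One-round winnings in pounds for (you, opponent), as described above.
WINNINGS = {
    ("Honour", "Honour"): (150, 150),
    ("Honour", "Cheat"): (0, 200),
    ("Cheat", "Honour"): (200, 0),
    ("Cheat", "Cheat"): (100, 100),
}


def strictly_dominates(move, other):
    """True if `move` pays you more than `other` against every opponent move."""
    return all(
        WINNINGS[(move, opp)][0] > WINNINGS[(other, opp)][0] for opp in MOVES
    )


cheat_dominates = strictly_dominates("Cheat", "Honour")
print("Cheat strictly dominates Honour:", cheat_dominates)
```

The check compares £200 with £150 when the opponent honours the deal, and £100 with zero when the opponent cheats, which is exactly the comparison made in the text.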

But how does this change in a 2-round game?

Actually, it makes no difference. In a 2-round game, the second round is the final round, in which you may as well ‘Cheat’, as there are no future rounds in which to realise the benefit of any goodwill earned by honouring the deal. Your opponent knows this, so you can assume that your opponent, who wishes to maximise his total payoff, will be hostile on the second move. He will assume the same about you.

Since you will both ‘Cheat’ on the second and final move, why be friendly on the first move?

So the dominant strategy is to ‘Cheat’ on the first round.

What if there are three rounds? The same applies. You know that your opponent will ‘Cheat’ on the final round and therefore the penultimate round as well. So your dominant strategy is to ‘Cheat’ on the first round, the second round and the final round. The same goes for your opponent. And so on. In any finite, pre-determined number of rounds, the dominant strategy in any round is to ‘Cheat.’

But what if the game involves an indeterminate number of moves? Suppose that after each move, you roll two dice. If you get a double-six, the game ends. Any other combination of numbers, play another round. Keep playing until you get a double-six. Your score for the game is the sum of your payoffs.

This sort of game in fact mirrors many real-world situations. In real life, you often don’t know when the game will end.

What is the best strategy in repeated play? For the game outlined above, we shall denote ‘Honour the deal’ as a ‘Friendly’ move and ‘Cheat’ as a ‘Hostile’ move. But the notion of a Friendly or Hostile approach can take other guises in different games.

There are seven proposed strategies here.

  1. Always Friendly. Be friendly every time
  2. Always Hostile. Be hostile every time
  3. Retaliate. Be Friendly as long as your opponent is Friendly but if your opponent is ever Hostile, you be Hostile from that point on.
  4. Tit for tat. Be Friendly on the first move. Thereafter, do whatever your opponent did on the previous move.
  5. Random. On each move, toss a coin. If Heads, be Friendly. If tails, be Hostile.
  6. Alternate. Be Friendly on even-numbered moves, and Hostile on odd-numbered moves, or vice-versa.
  7. Fraction. Be Friendly on the first move. Thereafter, be Friendly if the fraction of times your opponent has been Friendly until that point is more than a half. Be Hostile if it is less than or equal to a half.

Which of these is the dominant strategy in this game of iterated play? Actually, there is no dominant strategy in an iterated game, but which strategy actually wins if every strategy plays every other strategy?

Against ‘Always Hostile’, the best reply is ‘Always Hostile’, because every time you are Friendly against an ‘Always Hostile’ opponent, you are punished with the ‘sucker’ payoff.

Against ‘Retaliate’, the best reply is ‘Always Friendly’, because the extra payoff you get from a Hostile move is eventually negated by the retaliation.

Thus even the choice of whether to be Friendly or Hostile on the first move depends on the opponent’s strategy.

For every two distinct strategies, A and B, there is a strategy C against which A does better than B, and a strategy D against which B does better than A.

So which strategy wins when every strategy plays every other strategy in a tournament? This has been computer simulated many times. And the winner is Tit for Tat.

It’s true that Tit for Tat can never get a higher score than the particular opponent it is facing, but it wins tournaments where each strategy plays every other strategy. In particular, it does well against Friendly strategies, while it is not exploited by Hostile strategies. So you can trust Tit for Tat. It won’t take advantage of another strategy. Tit for Tat and its opponents both do best when both are Friendly. Look at it this way. There are two reasons for a player to be unilaterally hostile: to take advantage of an opponent, or to avoid being taken advantage of by an opponent. Tit for Tat eliminates both reasons for being Hostile.
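A round-robin of this kind is easy to sketch in Python. The version below rests on several assumptions of mine: the Honour/Cheat winnings from earlier serve as the payoffs, each game ends with probability 1/36 after every move (the double-six rule), ‘Alternate’ starts Hostile on move 1, and ‘Fraction’ is read as Friendly when the opponent has been Friendly more than half the time. The ranking it prints depends on those choices and on the random seed, so it should not be read as reproducing the published tournaments.

```python
import random

# Per-round winnings (you, opponent); "F" = Friendly (Honour), "H" = Hostile (Cheat).
PAYOFF = {("F", "F"): (150, 150), ("F", "H"): (0, 200),
          ("H", "F"): (200, 0), ("H", "H"): (100, 100)}


# Each strategy sees its own history, the opponent's history and a random source.
def always_friendly(mine, theirs, rng):
    return "F"

def always_hostile(mine, theirs, rng):
    return "H"

def retaliate(mine, theirs, rng):
    return "H" if "H" in theirs else "F"

def tit_for_tat(mine, theirs, rng):
    return theirs[-1] if theirs else "F"

def random_move(mine, theirs, rng):
    return rng.choice("FH")

def alternate(mine, theirs, rng):
    # Friendly on even-numbered moves, counting the first move as move 1.
    return "F" if len(mine) % 2 == 1 else "H"

def fraction(mine, theirs, rng):
    if not theirs:
        return "F"
    return "F" if theirs.count("F") / len(theirs) > 0.5 else "H"


STRATEGIES = {
    "Always Friendly": always_friendly, "Always Hostile": always_hostile,
    "Retaliate": retaliate, "Tit for tat": tit_for_tat,
    "Random": random_move, "Alternate": alternate, "Fraction": fraction,
}


def play_match(strat_a, strat_b, rng, end_probability=1 / 36):
    """Play until the 'double six' ends the game; return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    while True:
        move_a = strat_a(hist_a, hist_b, rng)
        move_b = strat_b(hist_b, hist_a, rng)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
        if rng.random() < end_probability:
            return score_a, score_b


def tournament(matches_per_pair=200, seed=0):
    """Every strategy meets every other strategy; return totals, best first."""
    rng = random.Random(seed)
    totals = dict.fromkeys(STRATEGIES, 0)
    names = list(STRATEGIES)
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            for _ in range(matches_per_pair):
                score_a, score_b = play_match(
                    STRATEGIES[name_a], STRATEGIES[name_b], rng)
                totals[name_a] += score_a
                totals[name_b] += score_b
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = tournament()
for name, total in ranking:
    print(f"{name:16s} {total}")
```

One property does hold whatever the seed: in any single match, Tit for Tat never finishes ahead of its opponent. It can only play Hostile against a Friendly move after it has itself been on the receiving end of the reverse.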

What accounts for Tit for Tat’s success, therefore, is its combination of being nice, retaliatory, forgiving and clear.

In other words, success in an evolutionary ‘game’ is correlated with the following characteristics:

Be willing to be nice: cooperate, never be the first to defect.

Don’t be played for a sucker: return defection for defection, cooperation for cooperation.

Don’t be envious: focus on how well you are doing, as opposed to ensuring you are doing better than everyone else.

Be forgiving if someone is willing to change their ways and co-operate with you. Don’t bear grudges for old actions.

Don’t be too clever or too tricky. Clarity is essential for others to cooperate with you.

As Robert Axelrod, who pioneered this area of game theory, put it in his book ‘The Evolution of Cooperation’, Tit for Tat’s “niceness prevents it from getting into unnecessary trouble. Its retaliation discourages the other side from persisting whenever defection is tried. Its forgiveness helps restore mutual cooperation. And its clarity makes it intelligible to the other player, thereby eliciting long-term cooperation.”

How about the bigger picture? Can Tit for Tat perhaps teach us a lesson in how to play the game of life? Yes, in my view it probably can.

Further Reading and Links

Axelrod, Robert (1984), The Evolution of Cooperation, Basic Books

Axelrod, Robert (2006), The Evolution of Cooperation (Revised ed.), Perseus Books Group

Axelrod, R. and Hamilton, W.D. (1981), The Evolution of Cooperation, Science, 211, 1390-96.

How to use game theory to take a penalty – in a nutshell.


It’s 2020 and our mythical El Clasico game between Real Madrid and Barcelona is in the 23rd minute at the Santiago Bernabeu when Lionel Messi is brought down in the penalty box. He is rewarded with a spot kick against the custodian of the Los Blancos net, Keylor Navas.

Messi knows from the team statistician that if he aims straight and the goalkeeper stands still, his chance of scoring is just 30%. But if he aims straight and Navas dives to one corner, his chance of converting the penalty rises to 90%.

On the other hand, if Messi aims at a corner and the goalkeeper stands still, his chance of scoring is a solid 80%, while it falls to 50% if the goalkeeper dives to a corner.

We are here simplifying the choices to two distinct options, for the sake of simplicity and clarity.

Navas also knows from his team statistician that if he dives to one corner and Messi aims straight, his chance of saving is just 10%. But if he dives to one corner and Messi also aims at a corner, his chance of saving the penalty rises to 50%.

On the other hand, if Navas stands still and Messi aims at a corner, his chance of making the save is just 20%, while it rises to 70% if Messi aims straight.

So this is the payoff matrix, so to speak, facing Messi as he weighs up his decision.

                                  Goalkeeper – Stands still    Goalkeeper – Dives to one corner
Lionel Messi – Aims straight      30%                          90%
Lionel Messi – Aims at corner     80%                          50%

So what should he do? Aim straight or at a corner? And what should Navas do? Stand still or dive?

Here is the payoff matrix facing Navas.

                                 Messi – Aims straight    Messi – Aims at a corner
Navas – Stands still             70%                      20%
Navas – Dives to one corner      10%                      50%

Game theory can help here.

Neither player has what is called a dominant strategy in game-theoretic terms, i.e. a strategy that is better than the other, no matter what the opponent does. The optimal strategy will depend on what the opponent’s strategy is.

In such a situation, game theory indicates that both players should mix their strategies, in Messi’s case aiming for the corner with a two-thirds chance, while the goalkeeper should dive with a 5/9 chance.

These figures are derived by finding the ratio where the chance of scoring (or saving) is the same, whichever of the two tactics the other player uses.

 The Proof

Suppose the goalkeeper opts to stand still, then Messi’s chance (if he aims for the corner 2/3 of the time) = 1/3 x 30% + 2/3 x 80% = 10% + 53.3% = 63.3%

If the goalkeeper opts to dive, Messi’s chance = 1/3 x 90% + 2/3 x 50% = 30% + 33.3% = 63.3%

Adopting this mixed strategy (aim for the corner 2/3 of the time and shoot straight 1/3 of the time), the chance of scoring is therefore the same. This is the ideal mixed strategy, according to standard game theory.

From the point of view of Navas, on the other hand, if Messi aims straight, his  chance of saving the penalty kick (if he dives 5/9 of the time) = 5/9 x 10% + 4/9 x 70% = 5.6% + 31.1% = 36.7%

If Messi opts to aim for the corner, Navas’ chance = 5/9 x 50% + 4/9 x 20% = 27.8% + 8.9% = 36.7%

Adopting this mixed strategy (dive for the corner 5/9 of the time and stand still 4/9 of the time), the chance of saving is therefore the same whichever option Messi chooses. This is the ideal mixed strategy, according to standard game theory.

The chances of Messi scoring and Navas making the save in each case add up to 100%, which cross-checks the calculations.
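The same indifference calculation can be done with exact fractions. This is a minimal Python sketch for this 2×2 example only; the percentages are Messi’s scoring chances from his payoff matrix, and the function names are illustrative.

```python
from fractions import Fraction

# Messi's scoring chances (%), keyed by (Messi's choice, goalkeeper's choice).
SCORE = {
    ("straight", "still"): Fraction(30), ("straight", "dive"): Fraction(90),
    ("corner", "still"): Fraction(80), ("corner", "dive"): Fraction(50),
}


def messi_corner_probability():
    """Chance p of aiming at the corner that leaves the goalkeeper indifferent.

    Solves p*80 + (1-p)*30 = p*50 + (1-p)*90 for p.
    """
    a, b = SCORE[("corner", "still")], SCORE[("straight", "still")]
    c, d = SCORE[("corner", "dive")], SCORE[("straight", "dive")]
    return (d - b) / ((a - b) + (d - c))


def navas_dive_probability():
    """Chance q of diving that leaves Messi indifferent.

    Solves q*90 + (1-q)*30 = q*50 + (1-q)*80 for q.
    """
    a, b = SCORE[("corner", "still")], SCORE[("straight", "still")]
    c, d = SCORE[("corner", "dive")], SCORE[("straight", "dive")]
    return (a - b) / ((a - b) + (d - c))


p = messi_corner_probability()
q = navas_dive_probability()
# Scoring chance when Messi mixes with p and the goalkeeper stands still.
scoring_chance = p * SCORE[("corner", "still")] + (1 - p) * SCORE[("straight", "still")]
print(p, q, float(scoring_chance))
```

Working in `Fraction` rather than floating point keeps 2/3 and 5/9 exact, so the scoring chance comes out as exactly 190/3, i.e. the 63.3% quoted above.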

Of course, if the striker or the goalkeeper gives away real new information about what he will do, then each of them can adjust tactics and increase their chance of scoring or saving.

To properly operationalise a mixed strategy requires one extra element, and that is the ability to truly randomise the choices, so that Messi actually does have exactly a 2/3 chance of aiming for the corner, and Navas actually does have a 5/9 chance of diving for the corner. There are different ways of achieving this. One method of achieving a 2/3 ratio is  to roll a die and go for the corner if it comes up 1, 2, 3 or 4, and aim straight if it comes up 5 or 6. Or perhaps not! But you get the idea.
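The die-rolling idea translates directly into code. In this Python sketch the die rule is the one described above. The nine-outcome draw for Navas is my own assumption, since no single die gives a 5/9 chance, and the seed is there only to make the run repeatable.

```python
import random


def messi_aims_at_corner(rng):
    """Roll a fair die: 1 to 4 means aim at the corner, 5 or 6 means aim straight."""
    return rng.randint(1, 6) <= 4


def navas_dives(rng):
    """Draw one of nine equally likely outcomes; five of them mean dive."""
    return rng.randrange(9) < 5


rng = random.Random(2020)  # seeded only so the sketch is repeatable
trials = 60_000
corner_share = sum(messi_aims_at_corner(rng) for _ in range(trials)) / trials
dive_share = sum(navas_dives(rng) for _ in range(trials)) / trials
print(f"Corner share: {corner_share:.3f}, dive share: {dive_share:.3f}")
```

Over many simulated kicks the shares settle close to 2/3 and 5/9, while any individual kick remains unpredictable, which is the point of randomising.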

For the record, Messi aimed at the left corner, Navas guessed correctly and got an outstretched hand to it, pushing it back into play. Leo stepped forward deftly to score the rebound. Cristiano Ronaldo equalised from the spot eight minutes later. And that’s how it ended at the Bernabeu. Real Madrid 1 Barcelona 1. Honours even in El Clasico.


Messi’s strategy

x = chance that Messi should aim at corner

y = chance that Messi should aim straight


80x + 30y (if Navas stands still) = 50x + 90y (if Navas dives)

x + y = 1


30x = 60y

30x = 60 (1-x)

90x = 60

x = 2/3


Navas’ strategy

x = chance that Navas should dive to corner

y  = chance that Navas should stand still


10x + 70y (if Messi aims straight) = 50x + 20y (if Messi aims at corner)

x+y = 1


10x + 70y = 50x + 20y

40x = 50y

40x = 50(1-x)

90x = 50

x = 5/9

y = 4/9

References and Links

Game Theory: Mixed Strategies Explained.

Simplicity and the Search for Truth – Guide Notes.


William of Occam (also spelled William of Ockham) was a 14th-century English philosopher. At the heart of Occam’s philosophy is the principle of simplicity, and Occam’s Razor has come to embody the method of eliminating unnecessary hypotheses. Essentially, Occam’s Razor holds that the theory which explains all (or the most) while assuming the least is the most likely to be correct. This is the principle of parsimony – explain more, assume less. Put more elegantly, it is the principle of ‘pluralitas non est ponenda sine necessitate’ (plurality must never be posited beyond necessity).

Empirical support for the Razor can be drawn from the phenomenon of ‘overfitting.’ In statistics, ‘overfitting’ occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. Critically, a model that has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.

We can also look at it through the lens of what is known as Solomonoff Induction. Whether a detective trying to solve a crime, a physicist trying to discover a new universal law, or an entrepreneur seeking to interpret some latest sales figures, all are involved in collecting information and trying to infer the underlying causes. The problem of induction is this: We have a set of observations (or data), and want to find the underlying causes of those observations, i.e. to find hypotheses that explain our data. We’d like to know which hypothesis is correct, so we can use that knowledge to predict future events. In doing so, we need to create a set of defined steps to arrive at the truth, a so-called algorithm for truth.

In particular, if all of the hypotheses are possible but some are more likely than others, how do you weight the various hypotheses? This is where Occam’s Razor comes in.

Consider, for example, the two 32 character sequences:



The first can be written “ab 16 times”. The second probably cannot be simplified further.
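A general-purpose compressor gives a rough, practical feel for this difference. In the Python sketch below, the regular sequence is “ab” repeated 16 times, as described; the irregular sequence is a hypothetical stand-in generated pseudo-randomly, not the one from the original example. Compressed size is only a crude proxy for how far a sequence can be simplified.

```python
import random
import string
import zlib

regular = "ab" * 16  # the "ab 16 times" sequence

# Hypothetical stand-in for a patternless 32-character sequence.
rng = random.Random(0)
alphabet = string.ascii_lowercase + string.digits
irregular = "".join(rng.choice(alphabet) for _ in range(32))

regular_size = len(zlib.compress(regular.encode("ascii"), 9))
irregular_size = len(zlib.compress(irregular.encode("ascii"), 9))

print(f"regular:   {len(regular)} characters -> {regular_size} bytes compressed")
print(f"irregular: {len(irregular)} characters -> {irregular_size} bytes compressed")
```

At this length the compressor’s fixed overhead matters, so the absolute byte counts mean little. The comparison between the two is what carries the point.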

Now consider the following problem. A computer program outputs the following sequence of numbers: 1, 3, 5, 7. What rule do you think gave rise to the number sequence 1,3,5,7? If we know this, it will help us to predict what the next number in the sequence is likely to be, if there is one. Two hypotheses spring instantly to mind. It could be: 2n-1, where n is the step in the sequence. So the third step, for example, gives 2×3-1 = 5. If this is the correct rule generating the observations, the next step in the sequence will be 9 (5×2-1).

But it’s possible that the rule generating the number sequence is: 2n-1 + (n-1)(n-2)(n-3)(n-4). So the third step, for example, gives 2×3-1 + (3-1)(3-2)(3-3)(3-4) = 5, because the added product is zero for each of the first four steps. In this case, however, the next step in the sequence will be 33.

But doesn’t the first hypothesis seem more likely? Occam’s Razor is the principle behind this intuition. “Among all hypotheses consistent with the observations, the simplest is the most likely.”
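Both candidate rules are easy to tabulate. This short Python sketch uses the two formulas given above; the function names are mine.

```python
def simple_rule(n):
    """First hypothesis: 2n - 1."""
    return 2 * n - 1


def complex_rule(n):
    """Second hypothesis: 2n - 1 + (n-1)(n-2)(n-3)(n-4)."""
    return 2 * n - 1 + (n - 1) * (n - 2) * (n - 3) * (n - 4)


simple_values = [simple_rule(n) for n in range(1, 6)]
complex_values = [complex_rule(n) for n in range(1, 6)]
print(simple_values)
print(complex_values)
```

The two rules agree on the four observed values and part company only at the fifth step, so the data seen so far cannot separate them. Only a preference for the simpler rule can.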

More generally, say we have two different hypotheses about the rule generating the data. How do we decide which is more likely to be true? To start, is there a language in which we can express all problems, all data, all hypotheses? Let’s look at binary data. This is the name for representing information using only the characters ‘0’ and ‘1’. In a sense, binary is the simplest possible alphabet. With these two characters we can encode information. Each 0 or 1 in a binary sequence (e. g. 01001011) can be considered the answer to a yes-or-no question. And in principle, all information can be represented in binary sequences. Indeed, being able to do everything in the language of binary sequences simplifies things greatly, and gives us great power. We can treat everything contained in the data in the same way.

Now that we have a simple way to deal with all types of data, we need to look at the hypotheses, in particular how to assign prior probabilities to the hypotheses. When we encounter new data, we can then use Bayes’ Theorem to update these probabilities.

To be complete, to guarantee we find the real explanation for our data, we have to consider all possible hypotheses. But how could we ever find all possible explanations for our data?

By using the language of binary, we can do so.

Here we look to the concept of Solomonoff induction, in which the assumption we make about our data is that it was generated by some algorithm, i.e. the hypothesis that explains the data is an algorithm. Now we can find all the hypotheses that would predict the data we have observed. Given our data, we find potential hypotheses to explain it by running every hypothesis, one at a time. If the output matches our data, we keep it. Otherwise, we discard it. We now have a methodology, at least in theory, to examine the whole list of hypotheses that might be the true cause behind our observations.

The first thing is to imagine that for each bit of the hypothesis, we toss a coin. Heads will be 0, and tails will be 1. Take as an example, 01001101, so the coin landed heads, tails, heads, heads and so on. Because each toss of the coin has a 50% probability, each bit contributes a factor of ½ to the final probability. Therefore, an algorithm that is one bit longer is half as likely to be the true algorithm. This intuitively fits with Occam’s Razor: a hypothesis that is 8 bits long is much more likely than a hypothesis that is 34 bits long. Why bother with extra bits? We’d need evidence to show that they were necessary. So why not take the shortest hypothesis and call that the truth? Because all of the hypotheses predict the data we have so far, and in the future we might get data to rule out the shortest one. The more data we get, the easier it is likely to become to pare down the number of competing hypotheses which fit the data.
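The coin-tossing weighting can be written down in a couple of lines. This Python sketch covers only the unnormalised weight of ½ per bit described above, not a full treatment of Solomonoff induction; `length_prior` is an illustrative name.

```python
from fractions import Fraction


def length_prior(bits):
    """Unnormalised prior weight of a hypothesis that is `bits` bits long."""
    return Fraction(1, 2 ** bits)


short_weight = length_prior(8)
long_weight = length_prior(34)
ratio = short_weight / long_weight  # how strongly the 8-bit hypothesis is favoured
print(f"8-bit weight:  {short_weight}")
print(f"34-bit weight: {long_weight}")
print(f"ratio: {ratio}")
```

Each extra bit halves the weight, so the 26 extra bits of the longer hypothesis cost it a factor of 2 to the power 26 before any data are considered.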

Turning now to ‘ad hoc’ hypotheses and the Razor. In science and philosophy, an ‘ad hoc hypothesis’ is a hypothesis added to a theory in order to save it from being falsified. Ad hoc hypothesising is compensating for anomalies not anticipated by the theory in its unmodified form. For example, you say that there is a leprechaun in your garden shed. A visitor to the shed sees no leprechaun. This is because he is invisible, you say. He spreads flour on the ground to see the footprints. He floats, you declare. He wants you to ask him to speak. He has no voice, you say. More generally, for each accepted explanation of a phenomenon, there is generally an infinite number of possible, more complex alternatives. Each true explanation may therefore have had many alternatives that were simpler and false, but also approaching an infinite number of alternatives that are more complex and false.

This leads us to the idea of what I term ‘Occam’s Leprechaun.’ Any new and more complex theory can always be possibly true. For example, if an individual claims that leprechauns were responsible for breaking a vase that he is suspected of breaking, the simpler explanation is that he is not telling the truth, but ongoing ad hoc explanations (e.g. “That’s not me on the CCTV, it’s a leprechaun disguised as me”) prevent outright falsification. An endless supply of elaborate competing explanations, called ‘saving hypotheses’, prevents ultimate falsification of the leprechaun hypothesis, but appeal to Occam’s Razor helps steer us towards the probable truth. Another way of looking at this is that simpler theories are more easily falsifiable, and hence possess more empirical content.

All assumptions introduce possibilities for error; if an assumption does not improve the accuracy of a theory, its only effect is to increase the probability that the overall theory is wrong.

It can also be looked at this way. The prior probability that a theory based on n+1 assumptions is true must be less than that of a theory based on n assumptions, unless the additional assumption is a consequence of the previous assumptions. For example, the prior probability that Jack is a train driver AND that he owns a Mini Cooper must be less than the prior probability that Jack is a train driver, unless all train drivers own Mini Coopers, in which case the prior probabilities are identical.

Again, the prior probability that Jack is a train driver and a Mini Cooper owner and a ballet dancer is less than the prior probability that he is just the first two, unless all train drivers are not only Mini Cooper owners but also ballet dancers. In the latter case, the prior probabilities of the n and n+1 assumptions are the same.

From Bayes’ Theorem, we know that reducing the prior probability will reduce the posterior probability, i.e. the probability that a proposition is true after new evidence arises.
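A small numerical sketch shows both steps: adding an assumption cannot raise the prior, and a lower prior gives a lower posterior when the evidence is held fixed. All the probabilities in this Python example are made-up illustrative values, not estimates of anything.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' Theorem for a single hypothesis and a single piece of evidence."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Hypothetical numbers: Jack is a train driver, and (given that) owns a Mini Cooper.
p_driver = 0.01
p_mini_given_driver = 0.05
p_driver_and_mini = p_driver * p_mini_given_driver  # can never exceed p_driver

# Same evidence strength for both theories, different priors.
post_simple = posterior(p_driver, 0.8, 0.2)
post_complex = posterior(p_driver_and_mini, 0.8, 0.2)
print(p_driver_and_mini <= p_driver, post_complex < post_simple)
```

The conjunction only matches the simpler prior when the conditional probability is 1, which is the ‘all train drivers own Mini Coopers’ case.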

Science prefers the simplest explanation that is consistent with the data available at a given time, but even so the simplest explanation may be ruled out as new data become available. This does not invalidate the Razor, which does not state that simpler theories are necessarily more true than more complex theories, but that when more than one theory explains the same data, the simpler should be accorded more probabilistic weight. The theory which explains all (or the most) and assumes the least is most likely. So Occam’s Razor advises us to keep explanations simple. But it is also consistent with multiplying entities necessary to explain a phenomenon. A simpler explanation which fails to explain as much as another more complex explanation is not necessarily the better one. So if leprechauns don’t explain anything they cannot be used as proxies for something else which can explain something.

More generally, we can now unify Epicurus and Occam. From Epicurus’ Principle we need to keep open all hypotheses consistent with the known evidence which are true with a probability of more than zero. From Occam’s Razor we prefer from among all hypotheses that are consistent with the known evidence, the simplest. In terms of a prior distribution over hypotheses, this is the same as giving simpler hypotheses higher a priori probability, and more complex ones lower probability.

From here we can move to the wider problem of induction: inferring something about the unknown by extrapolating a pattern from the known. Specifically, the problem of induction is how we can justify inductive inference. According to Hume’s ‘Enquiry Concerning Human Understanding’ (1748), if we justify induction on the basis that it has worked in the past, then we have to use induction to justify why it will continue to work in the future. This is circular reasoning, and so faulty theory. “Induction is just a mental habit, and necessity is something in the mind and not in the events.” Yet in practice we cannot help but rely on induction. We are working from the idea that it works in practice if not in theory – so far. Induction is thus related to an assumption about the uniformity of nature. Of course, induction can be turned into deduction by adding principles about the world (such as ‘the future resembles the past’, or ‘space-time is homogeneous’).

We can also assign to inductive generalisations probabilities that increase as the generalisations are supported by more and more independent events. This is the Bayesian approach, and it is a response to the perspective pioneered by Karl Popper. From the Popperian perspective, a single observational event may prove hypotheses wrong, but no finite sequence of events can verify them correct. Induction is from this perspective theoretically unjustifiable and becomes in practice the choice of the simplest generalisation that resists falsification. The simpler a hypothesis, the easier it is to falsify. Induction and falsifiability are in practice, from this viewpoint, as good as it gets in science.

Take an inductive inference problem where there is some observed data and a set of hypotheses, one of which may be the true hypothesis generating the data. The task then is to decide which hypothesis, or hypotheses, are the most likely to be responsible for the observations.

A better way of looking at this seems to be to abandon certainties and think probabilistically. Entropy is the tendency of isolated systems to move toward disorder, and a quantification of that disorder. For example, assembling a deck of cards in a defined order requires introducing some energy to the system. If you drop the deck, the cards become disorganised and won’t re-organise themselves automatically. This tendency of all systems towards disorder is the Second Law of Thermodynamics, which implies that time is asymmetrical with respect to the amount of order: as the system advances through time, it will statistically become more disordered. By ‘Order’ and ‘Disorder’ we mean how compressed the information is that is describing the system. So if all your papers are in one neat pile, then the description is “All paper in one neat pile.” If you drop them, the description becomes “One paper to the right, another to the left, one above, one below, etc. etc.” The longer the description, the higher the entropy. According to Occam’s Razor, we want a theory with low entropy, i.e. low disorder, high simplicity. The lower the entropy, the more likely it is that the theory is the true explanation of the data, and hence that theory should be assigned a higher probability.

More generally, whatever theory we develop, say to explain the origin of the universe, or consciousness, or non-material morality, must itself be based on some theory, which is based on some other theory, and so on. At some point we need to rely on some statement which is true but not provable, and so we think may be false, although it is actually true. We can never solve the ultimate problem of induction, but Occam’s Razor combined with Epicurus, Bayes and Popper is as good as it gets if we accept that. So Epicurus, Occam, Bayes and Popper help us pose the right questions, and help us to establish a good framework for thinking about the answers.

At least that applies to the realm of established scientific enquiry and the pursuit of scientific truth. How far it can properly be extended beyond that is a subject of intense and continuing debate.

References and Links

McFadden, Johnjoe. 2021. Life is Simple. London: Basic Books.

Occam’s Razor. Principia Cybernetica Web.

What is Occam’s Razor. UCR Math.

Occam’s Razor. Simple English Wikipedia.

Occam’s Razor. Wikipedia.

An Intuitive Explanation of Solomonoff Induction. LESSWRONG. Alex Altair. July 11, 2012.

The Four Card Task – in a nutshell.


You are presented with four cards, with the face-up side on display, showing either a letter or a number. You are promised that each has a letter on one side and a number on the other.

Red Card displays the letter D

Orange Card displays the letter N

Blue Card displays the number 21

Yellow Card displays the number 16

You are now presented with the following statement: Every card with D on one side has 21 on the other side.


a. What is the minimum number of cards needed to determine whether this statement is true? What are the colours of the cards you need to turn over to determine this?

b. Four cards are placed on a table, each of which has a number on one side and a patch of colour on the other side. The visible faces of the cards display 1, 4, red and yellow. What is the minimum number of cards you need to turn over to test the truth of the proposition that a card with an even number on one face is red on the other side?

c. Four cards are placed on a table, two of which display a number, 16 or 25. The other two cards display a soft drink and an alcoholic drink. The minimum age for drinking alcohol is 18. What is the minimum number of cards you need to turn over to test the truth of the proposition that a card with a number greater than 18 on one side has an alcoholic drink on the other side?
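For readers who want to check their answers, the logic of all three puzzles can be sketched in Python. A rule of the form ‘if P then Q’ can only be broken by a card that has P on one side and not-Q on the other, so the cards worth turning are those showing P and those showing not-Q. The card labels and the `cards_to_turn` helper below are my own illustrative encoding of the puzzles as worded above.

```python
def cards_to_turn(cards, antecedent, consequent):
    """Return labels of the cards that could falsify 'if antecedent then consequent'.

    Each card is (label, side, value). Side "if" means the visible face is the
    kind the rule's condition talks about; side "then" means it is the kind the
    rule's conclusion talks about.
    """
    chosen = []
    for label, side, value in cards:
        if side == "if" and antecedent(value):
            chosen.append(label)  # the hidden face might break the conclusion
        elif side == "then" and not consequent(value):
            chosen.append(label)  # the hidden face might satisfy the condition
    return chosen


letters_and_numbers = [("Red", "if", "D"), ("Orange", "if", "N"),
                       ("Blue", "then", 21), ("Yellow", "then", 16)]
answer_a = cards_to_turn(letters_and_numbers,
                         lambda v: v == "D", lambda v: v == 21)

numbers_and_colours = [("1", "if", 1), ("4", "if", 4),
                       ("red", "then", "red"), ("yellow", "then", "yellow")]
answer_b = cards_to_turn(numbers_and_colours,
                         lambda v: v % 2 == 0, lambda v: v == "red")

numbers_and_drinks = [("16", "if", 16), ("25", "if", 25),
                      ("soft drink", "then", "soft"),
                      ("alcoholic drink", "then", "alcoholic")]
answer_c = cards_to_turn(numbers_and_drinks,
                         lambda v: v > 18, lambda v: v == "alcoholic")

for name, answer in (("a", answer_a), ("b", answer_b), ("c", answer_c)):
    print(name, len(answer), answer)
```

In each case two cards suffice. A card that already shows the conclusion (21, red, the alcoholic drink) tells us nothing, because the rule says nothing about what must be on its other side.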

References and Links

The Famous Four Card Task. Social Psychology Network.

Wason Selection Task. Wikipedia.

Biases at the racetrack – in a nutshell.

The Favourite-Longshot Bias is the well-established tendency in most betting markets for bettors to bet too much on ‘longshots’ (events with long odds, i.e. low probability events) and to relatively under-bet ‘favourites’ (events with short odds, i.e. high probability events). This is strangely counterintuitive as it seems to offer a sure-fire way to make above-average returns in the betting booth. Assume, for example, that Mr. Smith and Mr. Jones both start with £1,000. Now Mr. Smith places a level £10 stake on 100 horses quoted at 2 to 1. Meanwhile, Mr. Jones places a level £10 stake on 100 horses quoted at 20 to 1.

Who is likely to end up with more money at the end? Surely the answer should be the same for both. Otherwise, either Mr. Smith or Mr. Jones would seem to be doing something very wrong. So let’s take a look.

The Ladbrokes Flat Season Pocket Companion for 1990 provides a nicely laid out piece of evidence here for British flat horse racing between 1985 and 1989, but the same sort of pattern applies for any set of years we care to choose, or (with a few rare exceptions) pretty much any sport, anywhere.

In fact, the table conveniently presented in the Companion shows that not one out of 35 favourites sent off at 1/8 or shorter (as short as 1/25) lost between 1985 and 1989. This means a return of between 4% and 12.5% in a couple of minutes, which is an astronomical rate of interest.  The point being made is that broadly speaking the shorter the odds, the better the return. The group of ‘white hot’ favourites (odds between 1/5 and 1/25) won 88 out of 96 races for a 6.5% profit.  The following table looks at other odds groupings.

Odds           Wins     Runs     Profit      Profit (%)
1/5-1/2         249      344     +£1.80        +0.52
4/7-5/4         881     1780    -£82.60        -4.64
6/4-3/1        2187     7774      -£629        -8.09
7/2-6/1        3464    21681     -£2237       -10.32
8/1-20/1       2566    53741    -£19823       -36.89
25/1-100/1      441    43426    -£29424       -67.76
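The percentage column can be checked directly from the other columns. The sketch below is only an illustration: it assumes each run represents a level £1 stake (an assumption inferred from the arithmetic, not stated in the Companion), and the names BANDS and summarise are hypothetical.

```python
# Level-stakes results by odds band, copied from the table above:
# (odds band, wins, runs, profit in pounds).
BANDS = [
    ("1/5-1/2", 249, 344, 1.80),
    ("4/7-5/4", 881, 1780, -82.60),
    ("6/4-3/1", 2187, 7774, -629.0),
    ("7/2-6/1", 3464, 21681, -2237.0),
    ("8/1-20/1", 2566, 53741, -19823.0),
    ("25/1-100/1", 441, 43426, -29424.0),
]


def summarise(bands):
    """Return (band, strike rate %, profit as % of total stakes) per band."""
    rows = []
    for band, wins, runs, profit in bands:
        strike_rate = 100 * wins / runs   # how often horses in the band won
        pct_return = 100 * profit / runs  # assumes a £1 stake on every run
        rows.append((band, round(strike_rate, 2), round(pct_return, 2)))
    return rows


for row in summarise(BANDS):
    print(row)
```

On this assumption the percentage return falls steadily as the odds lengthen, which is the favourite-longshot bias in miniature.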

An interesting argument advanced by Robert Henery in 1985 is that the favourite-longshot bias is a consequence of bettors discounting a fixed fraction of their losses, i.e. they underweight their losses compared to their gains, and this causes them to bias their perceptions of what they have won or lost in favour of longshots. The rationale behind Henery’s hypothesis is that bettors will tend to explain away and therefore discount losses as atypical, or unrelated to the judgment of the bettor. This is consistent with contemporaneous work on the psychology of gambling. These studies demonstrate how gamblers tend to discount their losses, often as ‘near wins’ or the outcome of ‘fluke’ events, while bolstering their wins.

If the Henery Hypothesis is correct as a way of explaining the favourite-longshot bias, the bias can be explained as the natural outcome of bettors’ pre-existing perceptions and preferences. There is little evidence that the market offers opportunities for players to earn consistent profits, but they certainly do much better (lose a lot less) by a blind level-stakes strategy of backing favourites instead of longshots. Intuitively, we would think that people would wise up and switch their money away from the longshots to the favourites. In that case, favourites would become less good value, as their odds would shorten, and longshots would become better value as their odds would lengthen. But it doesn’t happen, despite a host of published papers pointing this out, as well as the Ladbrokes Pocket Companion. People continue to love their longshots, and are happy to pay a price for this love.

Are there other explanations for the persistence of the favourite-longshot bias? One explanation is based on consumer preference for risk. The idea here is that bettors are risk-loving and so prefer the risky series of long runs of losses followed by the odd big win to the less risky strategy of betting on favourites that will win more often albeit pay out less for each win. Such an assumption of risk-love by bettors, however, runs contrary to conventional explanations of financial behaviour which tend to assume people like to avoid risk. It’s also been argued that bettors are actually not risk-lovers but skewness-lovers, which would also explain a preference for backing longshots over favourites.

Another explanation that has been proposed for the existence of the bias is based on the existence of unskilled bettors in the context of high margins and other costs of betting which deter more skilled agents. These unskilled bettors find it more difficult to arbitrate between the true win probabilities of different horses, and so over-bet those offered at longer odds. One test of this hypothesis is to compare the size of the bias in person-to-person betting exchanges (characterised by lower margins) and bookmaker markets (higher margins). The bias was indeed lower in the former, a finding which is at least consistent with this theory.

So far, it should be noted that these are all demand-side explanations, i.e. based on the behaviour of bettors. Another explanation of at least some of the bias is the idea that odds-setters defend themselves against bettors who potentially have superior information to bookmakers by artificially squeezing odds at the longer end of the market. Even so, the favourite-longshot bias continues to exist in so-called ‘pari-mutuel’ markets, in which there are no odds-setters, but instead a pool of all bets which is paid out (minus fixed operator deductions) to winning bets. To the extent that the favourite-longshot bias cannot be fully explained by this odds-squeezing explanation, we can classify the remaining explanations as either preference-based or perception-based. Risk love or skewness love are examples of preference-based explanations. Discounting of losses or other explanations based on a poor assessment of the true probabilities can be categorized as perception-based explanations.

The favourite-longshot bias has even been found in online poker, especially in lower-stake games. In that context, the evidence suggests that it was misperception of probabilities rather than risk-love that offered the best explanation for the bias.

In conclusion, the favourite-longshot bias is a well-established market anomaly in sports betting markets, which can be traced in the published academic literature as far back as 1949. Explanations can broadly be divided into demand-based and supply-based, preference-based and perceptions-based. A significant amount of modern research has been focused on seeking to arbitrate between these competing explanations of the bias by formulating predictions as to how data derived from these markets would behave if one or other explanation was correct. A compromise position, which may or may not be correct, is that all of these explanations have some merit, the relative merit of each depending on the market context.


Let’s look more closely at how the Henery odds transformation works.

If the true probability of a horse losing a race is q, then the true odds against winning are q/(1-q).

For example, if the true probability of a horse losing a race (q) is ¾, the chance that it will win the race is ¼, i.e. 1 – ¾. The odds against it winning are: q/(1-q) = (3/4)/(1 – 3/4) = (3/4)/(1/4) = 3/1.

Henery now applies a transformation whereby the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.

If, for example, f = ¾, and the true chance of a horse losing is ½ (q=1/2), then the bettor will rate subjectively the chance of the horse losing as Q = fq.

So Q = ¾ x ½ = 3/8, i.e. a subjective chance of winning of 5/8.

So the perceived (subjective) odds against winning associated with a true (objective) probability of losing of 50% (Evens, i.e. q=1/2) are 3/5 (a subjective win probability of 62.5%), i.e. odds-on.

This is derived as follows:

Q/(1-Q) = fq/(1-fq) = (3/8)/(1 – 3/8) = (3/8)/(5/8) = 3/5

If the true probability of a horse losing a race is 80%, so that the true odds against winning are 4/1 (q = 0.8), then the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.

If, for example, f = ¾, and the true chance of a horse losing is 4/5 (q=0.8), then the bettor will rate subjectively the chance of the horse losing as Q = fq.

So Q = 3/4 x 4/5 = 12/20, i.e. a subjective chance of winning of 8/20 (2/5).

So the perceived (subjective) odds against winning associated with a true (objective) probability of losing of 80% (4 to 1, i.e. q=0.8) are 6/4 (a subjective win probability of 40%).

This is derived as follows:

Q/(1-Q) = fq/(1-fq) = 12/20 / (1-12/20) = 12/8 = 6/4

To take this to the limit, if the true probability of a horse losing a race is 100%, so that the true odds against winning are ∞ to 1 against (q = 1), then the bettor will again assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.

If, for example, f = ¾, and the true chance of a horse losing is 100% (q=1), then the bettor will rate subjectively the chance of the horse losing as Q = fq.

So Q = 3/4 x 1 = 3/4, i.e. a subjective chance of winning of 1/4.

So the perceived (subjective) odds against winning associated with a true (objective) probability of losing of 100% (∞ to 1, i.e. q=1) are 3/1 (a subjective win probability of 25%).

This is derived as follows:

Q/(1-Q) = fq/(1-fq) = 3/4 / (1/4) = 3/1

Similarly, if the true probability of a horse losing a race is 0%, so that the true odds against winning are 0 to 1 against (q = 0), then the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.

If, for example, f = ¾, and the true chance of a horse losing is 0% (q=0), then the bettor will rate subjectively the chance of the horse losing as Q = fq.

So Q = 3/4 x 0 = 0, i.e. a subjective chance of winning of 1.

So the perceived (subjective) odds against winning associated with a true (objective) probability of losing of 0% (0 to 1, i.e. q=0) are also 0/1.

This is derived as follows:

Q/(1-Q) = fq/(1-fq) = 0 / 1 = 0/1

This can all be summarised in a table.

Objective odds (against)    Subjective odds (against)
Evens                       3/5
4/1                         6/4
Infinity to 1               3/1
0/1                         0/1
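The four worked examples can be reproduced with a few lines of code. This is a minimal sketch of the transformation Q = fq as described above; the function name subjective_odds_against is hypothetical, and exact fractions are used so that the results match the hand calculations (3/2 is the same price as 6/4).

```python
from fractions import Fraction


def subjective_odds_against(q, f):
    """Odds against winning implied by the perceived losing chance Q = f*q.

    q is the true probability of losing and f is the fraction of losses
    that the bettor does not discount. Fails if f*q equals 1.
    """
    big_q = Fraction(f) * Fraction(q)  # subjective probability of losing
    return big_q / (1 - big_q)         # odds against = Q / (1 - Q)


f = Fraction(3, 4)
for q in (Fraction(1, 2), Fraction(4, 5), Fraction(1), Fraction(0)):
    print("q =", q, "-> subjective odds against =", subjective_odds_against(q, f))
```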

We can now use these stylised examples to establish the bias.

In particular, the implication of the Henery odds transformation is that, for a given f of ¾, 3/5 is perceived as fair odds for a horse with a 1 in 2 chance of winning.

In fact, £100 wagered at 3/5 yields £160 (3/5 x £100, plus stake returned) half of the time (true odds = evens), i.e. an expected return of £80.

£100 wagered at 6/4 yields £250 (6/4 x £100, plus the stake back) one fifth of the time (true odds = 4/1), i.e. an expected return of £50.

£100 wagered at 3/1 would yield £400 (3/1 x £100, plus the stake back), but it does so none of the time (true odds = Infinity to 1), i.e. an expected return of £0.

It can be shown that the higher the odds the lower is the expected rate of return on the stake, although the relationship between the subjective and objective probabilities remains at a fixed fraction throughout.
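One way to see this is to compute the expected return on a £100 stake when the horse is priced at its subjective odds but wins with its true probability. The sketch below uses a hypothetical helper, expected_return, and the same f = ¾ as the examples above; algebraically the expected return per £1 staked is (1-q)/(1-fq), which falls as q rises.

```python
from fractions import Fraction


def expected_return(q, f, stake=100):
    """Expected amount returned on `stake` at the Henery subjective odds.

    q is the true probability of losing; f is the undiscounted fraction of
    losses. The price offered is Q/(1-Q) with Q = f*q, while the horse
    actually wins with probability 1 - q.
    """
    big_q = f * q
    odds_against = big_q / (1 - big_q)
    payout_if_win = stake * (1 + odds_against)  # winnings plus stake back
    return (1 - q) * payout_if_win


f = Fraction(3, 4)
for q in (Fraction(1, 2), Fraction(4, 5), Fraction(1)):
    print("q =", q, "-> expected return on £100 =", expected_return(q, f))
```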

Now on to the over-round.

The same simple assumption about bettors’ behaviour can explain the observed relationship between the over-round (sum of win probabilities minus 1) and the number of runners in a race, n.

If each horse is priced according to its true win probability, then over-round = 0. So in a six horse race, where each has a 1 in 6 chance, each would be priced at 5 to 1, so none of the lose probability is shaded by the bookmaker. Here the over-round = (6 x 1/6) – 1 = 0.

If only a fixed fraction of losses, f, is counted by bettors, the subjective probability of losing on any horse is f(qi), where qi is the objective probability of losing for horse i, and the odds will reflect this bias, i.e. they will be shorter than the true probabilities would imply. The subjective win probabilities in this case are now 1-f(qi), and the sum of these minus 1 gives the over-round.

Where there is no discounting of the odds, the over-round (OR) = 0, i.e. the sum of the n true win probabilities minus 1. Assume now that f = ¾, i.e. ¾ of losses are counted by the bettor.

If there is discounting, then the odds will reflect this, and the more runners the bigger will be the over-round.

So in a race with 5 runners, q is 4/5, but fq = 3/4 x 4/5 = 12/20, so subjective win probability = 1-fq = 8/20, not 1/5. So OR = (5 x 8/20) – 1 = 1.

With 6 runners, fq = ¾ x 5/6 = 15/24, so subjective win probability = 1 – fq = 9/24. OR = (6 x 9/24) – 1 = (54/24) – 1 = 30/24 = 1¼.

With 7 runners, fq = ¾ x 6/7 = 18/28, so subjective win probability = 1 – fq = 10/28. OR = (7 x 10/28) – 1 = (70/28) – 1 = 42/28 = 1½.

If there is no discounting, then the subjective win probability equals the actual win probability, so in a 5-horse race, for example, each horse has a win probability of 1/5. Here, OR = (5 x 1/5) – 1 = 0. In a 6-horse race, with no discounting, subjective probability = 1/6. OR = (6 x 1/6) – 1 = 0.

Hence, the over-round is linearly related to the number of runners, assuming that bettors discount a fixed fraction of losses (the ‘Henery Hypothesis’).
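The linear relationship can be checked numerically. The sketch below assumes, as in the examples above, a race of n equally matched runners; the function name over_round is hypothetical. Under that assumption the expression simplifies to (n – 1)(1 – f), so each extra runner adds 1 – f to the over-round.

```python
from fractions import Fraction


def over_round(n, f):
    """Over-round in a race of n equally matched runners.

    Each horse truly loses with probability (n-1)/n; bettors perceive the
    losing chance as f times that, so each subjective win probability is
    1 - f*(n-1)/n. The over-round is the sum of these minus 1.
    """
    q = Fraction(n - 1, n)
    subjective_win = 1 - f * q
    return n * subjective_win - 1


f = Fraction(3, 4)
for n in (5, 6, 7):
    print(n, "runners -> over-round =", over_round(n, f))
```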


Calculate the subjective odds (against) in this table assuming that f, the fixed fraction of losses undiscounted by the bettor, is a half.

Objective odds (against)       Subjective odds (against)



Infinity to 1


References and Links

Henery, R.J. (1985). On the average probability of losing bets on horses with given starting price odds. Journal of the Royal Statistical Society. Series A (General). 148, 4. 342-349.

Vaughan Williams, L. and Paton, D. (1997). Why is there a favourite-longshot bias in British racetrack betting markets? Economic Journal, 107, 150-158.

The Kelly Criterion – in a nutshell.

How much should we bet when we believe the odds are in our favour? The answer to this question was first formalised in 1956, by daredevil pilot, recreational gunslinger and physicist John L. Kelly, Jr. at Bell Labs. The so-called Kelly Criterion is a formula employed to determine the optimal size of a series of bets when we have the advantage, in other words when the odds favour us. It takes account of the size of our edge over the market as well as the adverse impact of volatility. In other words, even when we have the edge, we can still go bankrupt along the way if we stake too much on any individual wager or series of wagers.

Essentially, the Kelly strategy is to wager a proportion of our capital which is equivalent to our advantage at the available odds. So if we are being offered even money, and we back heads, and we are certain that the coin will come down heads, we have a 100% advantage. So the recommended wager is the total of our capital. If there is a 60% chance of heads, and a 40% chance of tails, our advantage is now 20%, and we are advised to stake accordingly. This is a simplified representation of the literature on Kelly, Half-Kelly, and other derivatives of same, but the bottom line is clear. It is just as important to know how much to stake as it is to gauge when we have the advantage. But it’s not easy unless we can accurately identify that advantage.

Put more technically, the Kelly criterion is the fraction of capital to wager to maximise compounded growth of capital. The problem it seeks to address is that even when there is an edge, beyond some threshold larger bets will result in lower compounded return because of the adverse impact of volatility. The Kelly criterion defines the threshold, and indicates the fraction that should be wagered to maximise compounded return over the long run (F), which is given by:

F = Pw – (Pl/W)


F = Kelly criterion fraction of capital to bet

W = Amount won per amount wagered (i.e. win size divided by lose size)

Pw = Probability of winning

Pl = Probability of losing

When win size and loss size are equal, W = 1, and the formula reduces to:

F = Pw – Pl

For example, if a trader loses £1,000 on losing trades and gains £1,000 on winning trades, and 60 per cent of all trades are winning trades, the Kelly criterion indicates an optimal  trade size equal to 20 per cent (0.60-0.40 = 0.20). As another example, if a trader wins £2,000 on winning trades and loses £1,000 on losing trades, and the probability of winning and losing are both equal to 50 per cent, the Kelly criterion indicates an optimal trade size equal to 25 per cent of capital: 0.50- (0.50/2) = 0.25.
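The two worked examples can be reproduced with a direct translation of the formula. This is a minimal sketch; the function name kelly_fraction is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def kelly_fraction(p_win, w):
    """Kelly fraction F = Pw - Pl / W.

    p_win is the probability of winning and w is the amount won per unit
    wagered (win size divided by loss size).
    """
    p_lose = 1 - p_win
    return p_win - p_lose / w


# Equal win and loss sizes, 60 per cent winners: stake 1/5 of capital.
print(kelly_fraction(Fraction(3, 5), 1))
# Wins twice the size of losses, 50 per cent winners: stake 1/4 of capital.
print(kelly_fraction(Fraction(1, 2), 2))
```

A negative result means there is no edge at the odds, in which case the formula advises not betting at all.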

In other words, Kelly argues that, in the long run, we should wager a percentage of our bankroll equal to the expected profit divided by the amount we would receive if we win.

Proportional over-betting is more harmful than under-betting. For example, betting half the Kelly criterion will reduce compounded return by 25 per cent, while betting double the Kelly criterion will eliminate 100 per cent of the gain. Betting more than double the Kelly criterion will result in an expected negative compounded return, regardless of the edge on any individual bet. The Kelly criterion implicitly assumes that there is no minimum bet size. This assumption prevents the possibility of total loss. If there is a minimum trade size, as is the case in most practical investment and trading situations, then ruin is possible if the amount falls below the minimum possible bet size.
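These effects can be illustrated by computing the expected logarithmic growth of capital per bet for different staking fractions. The sketch below assumes the earlier even-money example with 60 per cent winners (full Kelly = 20 per cent); the function name growth_rate is hypothetical. The 25 per cent and 100 per cent figures in the text are approximations, so the numbers printed here come out close to them rather than exactly equal.

```python
import math


def growth_rate(fraction, p_win, w=1.0):
    """Expected log growth of capital per bet when staking `fraction`.

    A win multiplies capital by (1 + fraction * w); a loss multiplies it
    by (1 - fraction). Requires 0 <= fraction < 1.
    """
    p_lose = 1.0 - p_win
    return p_win * math.log(1 + fraction * w) + p_lose * math.log(1 - fraction)


p_win = 0.6
full_kelly = p_win - (1 - p_win)  # 0.2 at even money
for label, multiple in (("half", 0.5), ("full", 1.0), ("double", 2.0), ("2.5x", 2.5)):
    stake = multiple * full_kelly
    print(label, "Kelly: growth per bet =", round(growth_rate(stake, p_win), 5))
```

In this example half Kelly keeps roughly three quarters of the full Kelly growth rate, double Kelly leaves growth close to zero, and anything larger is clearly negative.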

So should we bet the full amount recommended by the Kelly criterion? In fact, betting the full amount recommended by the Kelly formula may be unwise for a number of reasons. Notably, accurate estimation of the advantage of the bets is critical; if we overestimate the advantage by more than a factor of two, Kelly betting will cause a negative rate of capital growth, and this is easily done. So, full Kelly betting may be a rough ride, and a fractional Kelly betting strategy might be substituted, i.e. a strategy wherein we bet some fraction of the recommended Kelly bet, such as a half or a third.

Ironically, John Kelly himself died in 1965, never having used his own criterion to make money.

So that’s the Kelly criterion. In a nutshell, the advice is only to bet when you believe you have the edge, and to do so using a stake size related to the size of the edge. Mathematically, it means betting a fraction of your capital equal to the size of your advantage. So, if you have a 20% edge at the odds, bet 20% of your capital. In the real world, however, we need to allow for errors that can creep in, like uncertainty as to the true edge, if any, that we have at the odds. So, unless we’re happy to risk a very bumpy ride, and we have total confidence in our judgment, a preferred strategy may be to stake a defined fraction of that amount, known as a fractional Kelly strategy.


If a trader is offered even money on a heads/tails bet, and knows that the chance of heads is 70%, the Kelly criterion indicates an optimal trade size equal to x per cent of capital. Calculate x.


References and Links

The Kelly Criterion. LessWrong. 15 October, 2018.

Kelly Criterion. Wikipedia.


What happened when the Butch challenged the Brain to a game of dice?

This is a true story about New York gambling-house operator, The Butch, who made his fortune booking dice games. In 1952 he was famously challenged by a bigtime gambler known as The Brain to a simple wager. The bet was an even-money proposition that the Butch could throw a double-six in 21 rolls of two dice. We can assume symmetry – the dice were not loaded or biased in any way. All faces were equally likely to come up. So the probability of any number appearing on a given roll of either one of the dice is 1/6.

On the face of it, the edge seems to be with Butch. After all, there are 36 possible combinations that could come up when throwing two dice, from 1-1, 1-2, 1-3, to 6-4, 6-5, 6-6. Intuition might suggest, therefore, that 18 throws should give you a 50-50 chance of throwing any one of these combinations, including a double-six. In 21 throws, the chance of a double-six should, therefore, be more than 50-50. On this basis, the Butch accepted the even money bet at $1,000 a roll. After twelve hours of rolling, the Brain was $49,000 up, at which point the Butch called it a day, sensing that something was wrong with his strategy.

The Brain had in fact profited from a classic probability puzzle known as the Chevalier’s Dice problem, which can be traced to the 17th-century French gambler and bon vivant, Antoine Gombaud, better known as the Chevalier de Méré. The Chevalier would agree even money odds that in four rolls of a single die he would get at least one six. His logic seemed impeccable. The Chevalier reasoned that since the chance that a 6 will come up in any one roll of the die is 1 in 6, then the chance of getting a 6 in four rolls is 4/6, or 2/3, which is a good bet at even money. If the probability was a half, he would break even at even money. For example, in 300 games, at 1 French franc a game, he would stake 300 francs and expect to win 150 times, each win paying him 1 franc with his 1 franc stake returned (a total of 300 francs). With a probability of 2/3, he would expect to win 200 times, yielding a good profit.

In fact, it is straightforward to show that this reasoning is faulty, for if it were correct, then we would calculate the chance of a 6 in five rolls of the die as 5/6, and therefore the chance of a 6 in six rolls of the die would be 6/6 = 100%, and in 7 rolls, 7/6!!! Something is therefore clearly wrong here.

Still, even though his reasoning was faulty, he continued to make a profit by playing the game at even money. To see why, we need to calculate the true probability of getting a 6 in four rolls of the die. The key idea here is that the number that comes up on each roll is independent of any other rolls, i.e. dice have no memory. Since each event is independent, we can (according to the laws of probability) multiply the probabilities.

So the probability of a 6 followed by a 6, followed by a 6, followed by a 6, is: 1/6 x 1/6 x 1/6 x 1/6 = 1/1296.

So what is the chance of getting at least one six in four rolls of the die?

Since the probability of getting a 6 in any one roll of the die = 1/6, the probability of NOT getting a 6 in any one roll of the die = 5/6.

So the chance of NOT getting a 6 in four rolls of the die is:

5/6 x 5/6 x 5/6 x 5/6 = 625/1296

So the chance of getting at least one 6 is 1 minus this, i.e. 1 – (625/1296) = 671/1296 = 0.5177, which is greater than 0.5.

So, the odds are still in favour of the Chevalier, since he is agreeing even money odds on an event with a probability of 51.77%.

This was all very well as long as it lasted, but eventually the Chevalier decided to branch out and invent a new, slightly modified game. In the new game, he asked for even money odds that a pair of dice, when rolled 24 times, will come up with a double-6 at least once. His reasoning was the same as before, and quite similar to the reasoning employed by the Butch. If the chance of a 6 on one roll of the die is 1/6, then the chance of a double-6 when two dice are thrown = 1/6 x 1/6 (as they are independent events) = 1/36.

So, reasoned the Chevalier, the chance of at least one double-6 in 24 throws is: 24/36 = 2/3.

So this is a very profitable game for the Chevalier. Or is it? No it isn’t, and this time Monsieur Gombaud paid for his faulty reasoning. He started losing. In desperation, he consulted the mathematician and philosopher, Blaise Pascal. Pascal derived the correct probabilities as follows:

The probability of a double-6 in one throw of a pair of dice = 1/6 x 1/6 = 1/36.

So the probability of NO double-6 in one throw of a pair of dice = 35/36.

So, the probability of no double-6 in 24 throws of a pair of dice = 35/36 x 35/36 … 24 times = 35/36 to the power of 24, i.e. (35/36)^24 = 0.5086.

So probability of at least one double-6 is 1 minus this, i.e. 1 – 0.5086 = 0.4914, i.e. less than 0.5. Under the terms of the new game, the Chevalier was betting at even money on a game which he lost more often than he won. It was an error that the Butch was to repeat almost 300 years later!

What if the Chevalier had changed the game to give himself 25 throws?

Now, the probability of throwing at least one double-6 in 25 throws of a pair of dice is:

1 – (35/36)^25 = 0.5055.

These odds, at even money, are in favour of the Chevalier, but this probability is still lower than the probability of obtaining one ‘6’ in four throws of a single die.

In the single-die game, the Chevalier has a house edge of 51.77% – 48.23% = 3.54%.

In the ‘pair of dice’ game (24 throws), the Chevalier’s edge =

49.14% – 50.86% = -1.72%

In the ‘pair of dice’ game (25 throws), the Chevalier’s edge =

50.55% – 49.45% = 1.1%
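All of these figures follow from the same "one minus the chance of never succeeding" calculation. The sketch below is an illustration with hypothetical function names (at_least_once, even_money_edge); it also covers the 21-roll bet that the Butch accepted.

```python
def at_least_once(p_single, trials):
    """Probability of at least one success in `trials` independent tries."""
    return 1 - (1 - p_single) ** trials


def even_money_edge(p_win):
    """Edge at even money: chance of winning minus chance of losing."""
    return p_win - (1 - p_win)


games = {
    "one six in 4 rolls of a die": at_least_once(1 / 6, 4),
    "double-six in 21 rolls (the Butch)": at_least_once(1 / 36, 21),
    "double-six in 24 rolls": at_least_once(1 / 36, 24),
    "double-six in 25 rolls": at_least_once(1 / 36, 25),
}
for name, p in games.items():
    print(f"{name}: p = {p:.4f}, edge = {100 * even_money_edge(p):+.2f}%")
```

The 21-roll game comes out below 50 per cent, which is why the Butch was on the wrong side of an even-money bet.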

A better game for the Chevalier would have been to offer even money that he could get ten heads in a row at least once in 1024 attempts, each attempt consisting of ten tosses of a coin. The derivation of this probability is similar in method to the dice problem.

First, we need to determine the probability of 10 heads in 10 tosses of a fair coin.

The probability is: ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½

Probability = (1/2)^10 = 1/1024, i.e. odds of 1023/1.

Based on this, what is the probability of at least one run of 10 heads in 1024 such attempts? Is it 0.5? No, because although you can expect ONE run of 10 heads on average, you could obtain zero, 2, 3, 4, etc.

So what is the probability of NO RUN of 10 heads in 1024 attempts?

This is: (1 – 1/1024)^1024

The probability of NO RUNS OF TEN HEADS = (1023/1024)^1024 = 37%

So probability of AT LEAST one run of 10 heads = 63%.

Now assume you have already made 234 of the 1024 attempts without a run of 10 heads. What is your chance now of getting 10 heads?

Probability of NO RUNS OF TEN HEADS in the remaining 790 attempts = (1023/1024)^790 = 46%

So probability of at least one success = 54%.
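The 37% figure is no accident. Treating the game as 1024 independent ten-toss attempts, each with a 1 in 1024 chance of success, the chance of no success is (1 – 1/n)^n with n = 1024, which is very close to 1/e. The sketch below checks this and the 790-attempt case; the names P_TEN_HEADS and p_no_success are hypothetical.

```python
import math

P_TEN_HEADS = 0.5 ** 10  # chance of ten heads in one ten-toss attempt


def p_no_success(attempts):
    """Chance that none of `attempts` independent attempts gives ten heads."""
    return (1 - P_TEN_HEADS) ** attempts


print("no success in 1024 attempts:", round(p_no_success(1024), 4))
print("1/e for comparison:         ", round(math.exp(-1), 4))
print("at least one in 1024:       ", round(1 - p_no_success(1024), 2))
print("at least one in 790:        ", round(1 - p_no_success(790), 2))
```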

The Chevalier could have played either of these games and expected to come out ahead. But the game would have taken a long time. He preferred the shorter game, which produced the longer loss.

Until he was put right by Monsieur Pascal.

Most importantly, though, the Chevalier’s question led to a correspondence, most of which has survived, which led to the foundations of modern probability theory.

Out of this correspondence emerged quite a few jewels, one of which has become known as the ‘Gambler’s Ruin’ problem.

This is an idea set in the form of a problem by Pascal for Fermat, subsequently published by Christiaan Huygens (‘On reasoning in games of chance’, 1657) and formally solved by Jacobus Bernoulli (‘Ars Conjectandi’, 1713).

One way of stating the problem is as follows. If you play any gambling game long enough, will you eventually go bankrupt, even if the odds are in your favour, if your opponent has unlimited funds?

For example say that you and your opponent toss a coin, where the loser pays the winner £1. The game continues until either you or your opponent has all the money. Suppose you have £10 to start and your opponent has £20. What are the probabilities that a) you and b) your opponent, will end up with all the money?

The answer is that the player who starts with more money has more chance of ending up with all of it. The formula is:

P1 = n1 / (n1 + n2)

P2 = n2   / (n1 + n2)

Where n1 is the amount of money that player 1 starts with, and n2 is the amount of money that player 2 starts with, and P1 and P2 are the probabilities that player 1 or player 2, your opponent, wins.

In this case, you start with £10 of the £30 total, and so have a 10/ (10+20) = 10/30 = 1/3 chance of winning the £30; your opponent has a 2/3 chance of winning the £30. But even if you do win this game, and you play the game again and again, against different opponents, or the same one who has borrowed more money, eventually you will lose your entire bankroll. This is true even if the odds are in your favour. Eventually you will meet a long-enough bad streak to bankrupt you. In other words, infinite capital will overcome any finite odds against it. This is one version of the ‘Gambler’s Ruin’ problem, and many gamblers over the years have been ruined because of their unawareness of it.
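The formula for the fair coin game can be checked against a simple simulation. The sketch below is illustrative only: it assumes a fair coin and £1 stakes as in the example, and the function names, number of simulated games and random seed are arbitrary choices rather than part of the original problem.

```python
import random
from fractions import Fraction


def win_probability(n1, n2):
    """Chance that the player starting with n1 ends up with all the money
    in a fair coin game with unit stakes: n1 / (n1 + n2)."""
    return Fraction(n1, n1 + n2)


def simulate(n1, n2, games=2000, seed=0):
    """Estimate the same probability by playing `games` games to the end."""
    rng = random.Random(seed)
    total = n1 + n2
    wins = 0
    for _ in range(games):
        bankroll = n1
        while 0 < bankroll < total:
            bankroll += 1 if rng.random() < 0.5 else -1  # win or lose £1
        wins += bankroll == total
    return wins / games


print("formula:   ", win_probability(10, 20))
print("simulation:", simulate(10, 20))
```

The simulated frequency should land close to 1/3, with some sampling noise.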


  1. What is the probability of throwing at least one double-six in 26 throws of a pair of dice?
  2. You and your opponent toss a coin, where the loser pays the winner £10. The game continues until either you or your opponent has all the money. Suppose you have £100 to start and your opponent has £400. What are the probabilities that a) you and b) your opponent, will end up with all the money?

References and Links

DeMere’s Paradox. ProofWiki.

One gambling problem that launched modern probability theory. Introductory Statistics.

deMere’s Problem. WolframMathWorld.

Gambler’s Ruin. WolframMathWorld.

The Problem of Existence – Guide Notes.

It shouldn’t be possible for us to exist. But we do. That’s counterintuitive. Take, for example, the ‘Cosmological Constant.’ What it represents is a sort of unobserved ‘energy’ in the vacuum of space which possesses density and pressure, which prevents a static universe from collapsing in upon itself. We know how much unobserved energy there is because we know how it affects the Universe’s expansion. But how much should there be? The easiest way to picture this is to visualise ‘empty space’ as containing ‘virtual’ particles that continually form and then disappear. This ‘empty space’, it turns out, ‘weighs’ 10 to the power of 93 grams per cubic centimetre. Yet the actual figure differs from that predicted by a factor of 10 to the power of 120. The ‘vacuum energy density’ as predicted is simply 10 to the power of 120 times too big. That’s a 1 with 120 zeros after it. So there is something cancelling out all this energy, to make it 10 to the power of 120 smaller in practice than it should be in theory. In other words, the various components of vacuum energy are arranged so that they essentially cancel out.

Now this is very fortuitous. If the cancellation figure was one power of ten different, 10 to the power of 119, then galaxies could not form, as matter would not be able to condense, so no stars, no planets, no life. So we are faced with the fact that the positive and negative contributions to the cosmological constant cancel to 120 digit accuracy, yet fail to cancel beginning at the 121st digit. In fact, the cosmological constant must be zero to within one part in roughly 10 to the power of 120 (and yet be nonzero), or else the universe either would have dispersed too fast for stars and galaxies to have formed, or else would have collapsed upon itself long ago. How likely is this by chance? Essentially, it is the equivalent of tossing a coin and needing to get heads 400 times in a row and achieving it.
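The coin-toss comparison is a matter of simple arithmetic: a chance of one in 10 to the power of 120 corresponds to a run of heads of length 120 x log2(10). The short check below is only a sketch of that conversion, with a hypothetical variable name.

```python
import math

# Number of consecutive heads whose probability is 1 in 10**120:
# (1/2)**k = 10**-120  =>  k = 120 * log2(10)
heads_needed = 120 * math.log2(10)
print(round(heads_needed, 1))

# Cross-check with exact integers: 10**120 lies between 2**398 and 2**399.
print(2 ** 398 < 10 ** 120 < 2 ** 399)
```

So roughly 400 heads in a row is the right order of magnitude for the comparison.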

Now, that’s just one constant that needs to be just right for galaxies and stars and planets and life to exist. There are quite a few, independent of this, which have to be equally just right, most notably the strength of gravity and of the strong nuclear force relative to electromagnetism and the observed strength of the weak nuclear force. Others include the difference between the masses of the two lightest quarks and the mass of the electron relative to the quark masses, the value of the global cosmic energy density in the very early universe, and the relative amplitude of density fluctuations in the early universe. If any of these constants had been slightly different, stars and galaxies could not have formed.

There is also the symmetry/asymmetry paradox. When symmetry is required of the Universe, for example in a perfect balance of positive and negative charge, conservation of electric charge is critically ensured. If there were an equal number of protons and antiprotons, of matter and antimatter, produced by the Big Bang, they would have annihilated each other, leaving a Universe empty of its atomic building blocks. Fortuitously for the existence of a live Universe, protons actually outnumbered antiprotons by a factor of just one in one billion. If the perfect symmetry of the charge and almost vanishingly tiny asymmetry of matter and antimatter were reversed, if protons and antiprotons had not differed in number by that one part in a billion, there would be no galaxies, no stars, no planets, no life, no consciousness, no question for us to consider.

In summary, then, if the conditions in the Big Bang which started our Universe had been even a tiniest of a tiniest of a tiny bit different, with regard to a number of independent physical constants, our galaxies, stars and planets would not have been able to exist, let alone lead to the existence of living, thinking, feeling things. So why are they so right?

Let us first tackle those who say that if they hadn’t been right we would not have been able to even ask the question. This sounds a clever point but in fact it is not. For example it would be absolutely bewildering how I could have survived a fall out of an aeroplane from 39,000 feet onto tarmac without a parachute, but it would still be a question very much in need of an answer. To say that I couldn’t have posed the question if I hadn’t survived the fall is no answer at all.

Others propose the argument that since there must be some initial conditions, the conditions which made the Universe and life within it possible were just as likely to prevail as any others, so there is no puzzle to be explained.

But this is like saying that there are two people, Jack and Jill, who are arguing over whether Jill can control whether a fair coin lands heads or tails. Jack challenges Jill to toss the coin 400 times. He says he will be convinced of Jill’s amazing skill if she can toss heads followed by tails 200 times in a row, and she proceeds to do so. Jack could now argue that a head was just as likely as a tail on every single toss of the coin, so this sequence of heads and tails was, in retrospect, just as likely as any other outcome. But clearly that would be a very poor explanation of the pattern that just occurred. That particular pattern was clearly not produced by coincidence. Yet it’s the same argument as saying that initial conditions just right to produce the Universe and life were just as likely as any of the billions of other patterns of initial conditions that would not have done so. There may be a reason for the pattern that was produced, but it needs a more profound explanation than proposing that it was just coincidence.
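To put rough numbers on the Jack and Jill story, here is a minimal sketch. It assumes, purely for illustration, that a genuinely skilled Jill would produce the announced sequence with certainty, and that Jack starts out giving her only a one in a billion chance of having such a skill. The function names are my own.

```python
from fractions import Fraction


def sequence_probability(n_tosses: int) -> Fraction:
    """Probability of any one specific sequence of n fair-coin tosses."""
    return Fraction(1, 2) ** n_tosses


def posterior_skill(prior_skill: Fraction, n_tosses: int) -> Fraction:
    """Posterior probability that Jill controls the coin, given that she
    produced the pre-announced sequence.

    Assumption (illustrative): a skilled Jill produces it with probability 1.
    """
    p_by_chance = sequence_probability(n_tosses)
    evidence_if_skilled = prior_skill * 1
    evidence_if_lucky = (1 - prior_skill) * p_by_chance
    return evidence_if_skilled / (evidence_if_skilled + evidence_if_lucky)


if __name__ == "__main__":
    print(float(sequence_probability(400)))               # chance of the exact sequence
    print(float(posterior_skill(Fraction(1, 10**9), 400)))  # Jack's belief afterwards
```

The chance of the exact sequence is of the order of 10 to the power of minus 121, so even a deeply sceptical Jack should end up all but certain that something other than luck is at work.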

A second example. There is one lottery draw, devised by an alien civilisation. The lottery balls, numbered from 1 to 59, are to be drawn, and the only way that we will escape destruction, we are told, is if the first 59 balls out of the drum emerge as 1 to 59 in sequence. The numbers duly come out in that exact sequence. Now that outcome is no less likely than any other particular sequence, so if it came out that way a sceptic could claim that we were just lucky. That would clearly be nonsensical. A much more reasonable and sensible conclusion, of course, is that the aliens had rigged the draw to allow us to survive!

So the fact that the initial conditions are so fine-tuned deserves an explanation, and a very good one at that. It cannot be simply dismissed as a coincidence or a non-question.

An explanation that has been proposed that does deserve serious scrutiny is that there have been many Big Bangs, with many different initial conditions. Assuming that there were billions upon billions of these, eventually one will produce initial conditions that are right for the Universe to at least have a shot at existing.

In this theory, we are essentially proposing a process statistically along the lines of aliens drawing lottery balls over and over again, countless times, until the numbers come out in the sequence 1 to 59.
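The scale of the lottery analogy can be sketched as follows. The only assumptions are that each draw is an independent, uniformly random ordering of the 59 balls, and the helper name is my own.

```python
import math

N_BALLS = 59

# Number of distinct orderings of 59 balls, and the chance of one named ordering.
orderings = math.factorial(N_BALLS)
p_single_draw = 1 / orderings

# With independent repeated draws, the expected number of draws until the
# first success is 1/p (the mean of a geometric distribution).
expected_draws = orderings


def prob_at_least_once(p: float, n_draws: float) -> float:
    """P(at least one success in n independent draws) = 1 - (1 - p)^n.

    Uses log1p/expm1 so that a tiny p is not lost to rounding.
    """
    return -math.expm1(n_draws * math.log1p(-p))


if __name__ == "__main__":
    print(f"orderings: {orderings:.3e}")
    print(f"chance in one draw: {p_single_draw:.3e}")
    print(f"chance within {expected_draws:.3e} draws: "
          f"{prob_at_least_once(p_single_draw, float(expected_draws)):.3f}")
```

There are roughly 10 to the power of 80 possible orderings, so a single draw is hopeless, but even after that many repeated draws the chance of having seen the sequence at least once is only about 63 per cent. The many-Big-Bangs explanation needs a correspondingly vast number of tries.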

On this basis, a viable Universe could arise out of re-generating the initial conditions at the Big Bang until one of the lottery numbers eventually comes up. Is this a simpler explanation of why our Universe and life exist than an explanation based on a primal cause, and in any case does simplicity matter as a criterion of truth? To take the second question first, it is usually accepted in the realm of scientific enquiry that simplicity does matter. A simpler explanation of known facts is usually accepted as superior to a more complex one.

Of course, the simplest state of affairs would be a situation in which nothing had ever existed. This would also be the least arbitrary, and certainly the easiest to understand. Indeed, if nothing had ever existed, there would have been nothing to be explained. Most critically, it would solve the mystery of how things could exist without their existence having some cause. In particular, while it is not possible to propose a causal explanation of why the whole Universe or Universes exist, if nothing had ever existed, that state of affairs would not have needed to be caused. This is not helpful to us, though, as we know that in fact at least one Universe does exist.

Take the opposite extreme, where every possible Universe exists, underpinned by every possible set of initial conditions. In such a state of affairs, most of these might be subject to different fundamental laws, governed by different equations, composed of different elemental matter. There is no reason in principle, on this version of reality, to believe that each different type of Universe should not exist over and over again, up to an infinite number of times, so even our own type of Universe could exist billions of billions of times, or more, so that in the limit everything that could happen has happened and will happen, over and over again. This may be a true depiction of reality, but it or anything anywhere remotely near it, seems a very unconvincing one. In any case, our sole source of understanding about the make-up of a Universe is a study of our own Universe. On what basis, therefore, can we scientifically propose that the other speculative Universes are governed by totally different equations and fundamental physical laws? They may be, but that is a heroic assumption.

Perhaps the laws are the same, but the constants that determine the relative masses of the elementary particles, the relative strength of the physical forces, and many other fundamentals, differ. If so, what is the law governing how these constants vary from Universe to Universe, and where do these fundamental laws come from? From nothing? It has been argued that absolutely no evidence exists that any Universe exists other than our own, and that the only reason these unseen Universes are proposed is to explain the otherwise baffling problem of how our Universe and life within it can exist. That may well be so, but we can park that for now as it is still at least possible that they do exist.

So let’s step away from requiring any evidence, and move on to at least admitting the possibility that there are a lot of universes, but not every conceivable universe. One version of this is that the other Universes have the same fundamental laws, subject to the same fundamental equations, and composed of the same elemental matter as ours, but differ in the initial conditions and the constants. But this leaves us with the question as to why there should be only just so many universes, and no more. A hundred, a thousand, a hundred thousand, whatever number we choose requires an explanation of why just that number. This is again very puzzling. If we didn’t know better, our best ‘a priori’ guess is that there would be no universes, no life. We happen to know that’s wrong, so that leaves our Universe; or else a limitless number of universes where anything that could happen has or will, over and over again; or else a limited number of universes, which begs the question, why just that number?

Is it because certain special features have to obtain in the initial conditions before a Universe can be born, and that these are limited in number? Let us assume this is so. This only begs the question of why these limited features cannot occur more than a limited number of times. If they could, there is no reason to believe the number of universes containing these special features would be less than limitless in number. So, on this view, our Universe exists because it contains the special features which allow a Universe to exist. But if so, we are back with the problem arising in the conception of all possible worlds, but in this case it is only our own type of Universe (i.e. obeying the equations and laws that underpin this Universe) that could exist limitless times. Again, this may be a true depiction of reality, but it seems a very unconvincing one.

The alternative is to adopt an assumption that there is some limiting parameter to the whole process of creating Universes, along the lines of some version of string theory which claims that there is a limit of 10 to the power of 500 solutions (admittedly a dizzyingly big number) to the equations that make up the so-called ‘landscape’ of reality. That sort of limiting assumption, however realistic or unrealistic it might be, would seem to offer at least a lifeline to allow us to cling onto some semblance of common sense.

Before summarising where we have got to, a quick aside on the ‘Great Filter’ idea, relating to the question of how life of any form could arise out of inanimate matter, and ultimately lead to human consciousness. Observable civilisations don’t seem to happen much from what we know now, and possibly only once. Indeed, even in a universe that manages to exist, getting from inanimate matter to conscious humans seems to require a series of steps of apparently astonishing improbability. The Filter refers to the causal path from simple inanimate matter to a visible civilisation. The underpinning logic is that almost everything that starts along this path is blocked along the way, which might be by means of one extremely hard step, or many very, very hard steps. Indeed, it’s commonly supposed that it has only once ever happened here on earth. Just exactly once, traceable so far to LUCA (our Last Universal Common Ancestor). If so, it may be why the universe out there seems for the most part to be quite dead. The biggest filter, so the argument goes, is that the origin of life from inanimate matter is itself very, very, very hard. It’s a sort of Resurrection but an order of magnitude harder because the ‘dead stuff’ had never been alive, and nor had anything else! And that’s just the first giant leap along the way. This is a big problem of its own but that’s for another day, so let’s leave that aside and go back a step, to the origin of the universe. Before we do so, let us, as I suggested before our short detour, summarise very quickly.

Here goes. If we didn’t know better, our best guess, the simplest description of all possible realities, is that nothing exists. But we do know better, because we are alive and conscious, and considering the question. But our Universe is far, far, far too fine-tuned, by a factor of billions of billions, to exist by chance if it is the only Universe. So there must be more, if our Universe is caused by the roll of the die, a lot more. But how many more? If there is some mechanism for generating experimental universe upon universe, why should there be a limit to this process, and if there is not, that means that there will be limitless universes, including limitless identical universes, in which in principle everything possible has happened, and will happen, over and over again.

Even if we accept there is some limiter, we have to ask what causes this limiter to exist, and even if we don’t accept there is a limiter, we still need to ask what governs the equations representing the initial conditions to be as they are, to create one Universe or many. What puts life into the equations and makes a universe or universes at all? And why should the mechanism generating life into these equations have infused them with the physical laws that allow the production of any universe at all?

Some have speculated that we can create a universe or universes out of nothing, that a particle and an anti-particle, for example, could in theory spontaneously be generated out of what is described as a ‘quantum vacuum’. According to this theoretical conjecture, the Universe ‘tunnelled’ into existence out of nothing.

This would be a helpful handle for proposing some rational explanation of the origin of the Universe and of space-time if a ‘quantum vacuum’ was in fact nothingness. But that’s the problem with this theoretical foray into the quantum world. In fact, a quantum vacuum is not empty or nothing in any real sense at all. It has a complex mathematical structure, it is saturated with energy fields and virtual-particle activity. In other words, it is a thing with structure and things happening in it. As such, the equations that would form the quantum basis for generating particles, anti-particles, fluctuations, a Universe, actually exist, possess structure. They are not nothingness, not a void.

To be more specific, according to relativistic quantum field theories, particles can be understood as specific arrangements of quantum fields. So one particular arrangement could correspond to there being 28 particles, another 240, another to no particles at all, and another to an infinite number. The arrangement which corresponds to no particles is known as a ‘vacuum’ state. But these relativistic quantum field theoretical vacuum states are indeed particular arrangements of elementary physical stuff, no less so than our planet or solar system. The only case in which there would be no physical stuff would be if the quantum fields ceased to exist. But that’s the thing. They do exist. There is no something from nothing. And this something, and the equations which infuse it, has somehow had the shape and form to give rise to protons, neutrons, planets, galaxies and us.

So the question is what gives life to this structure, because without that structure, no amount of ‘quantum fiddling’ can create anything. No amount of something can be produced out of nothing. Yes, even empty space is something with structure and potential. More basically, how and why should such a thing as a ‘quantum vacuum’ even have existed, begun to exist, let alone be infused with the potential to create a Universe and conscious life out of non-conscious somethingness?

It is certainly a puzzle, and arguably one without an intuitive solution.


If the conditions in the Big Bang which started our Universe had been even a tiniest of a tiniest of a tiny bit different, with regard to a number of independent physical constants, the galaxies, stars and planets would not have been able to exist. But if we didn’t exist, we couldn’t have asked the question as to why they were so right. In any case, since there must be some initial conditions, the conditions which gave rise to the Universe and life, however fortuitous, were just as likely to prevail as any others. So there is, for both reasons, no puzzle to be explained. Is this a convincing rebuttal of the ‘Fine-Tuned’ universe problem? Why? Why not?

Reading and Links

Derek Parfit, ‘Why anything? Why this? Part 1’. London Review of Books, 20, 2, 22 January 1998, pp. 24-27.

Derek Parfit, ‘Why anything? Why this? Part 2’. London Review of Books, 20, 3, 5 February 1998, pp. 22-25.

John Piippo, Giving Up on Derek Parfit, July 22, 2012

A universe made for me? Physics, fine-tuning and life

John Horgan, ‘Science will never explain why there’s something rather than nothing’, Scientific American, April 23, 2012.

David Bailey, What is the cosmological constant paradox, and what is its significance? 1 January 2017.

Fine Tuning of the Universe

The Great Filter – are we almost past it?

Dragon Debris?

Fine Tuning in Cosmology. Chapter 2. In: Bostrom, N. Anthropic Bias: Observation Selection Effects in Science and Philosophy. 2002.

Last Common Universal Ancestor (LUCA)

David Albert, ‘On the Origin of Everything’, Sunday Book Review, The New York Times, March 23, 2012.

Are we living inside the Matrix? Guide Notes.

Do we live in a simulation, created by an advanced civilisation, in which we are part of some sophisticated virtual reality experience? For this to be a possibility we can make the obvious assumption that sufficiently advanced civilisations will possess the requisite computing and programming power to create what philosopher Nick Bostrom termed ‘ancestor simulations’. These simulations would be complex enough for the minds that are simulated to be conscious and able to experience the type of experiences that we do. The creators of these simulations could exist at any stage in the development of the universe, even billions of years into the future.

The argument around simulation goes like this. One of the following three statements must be correct.

  1. That civilisations at our level of development always or almost always disappear before becoming technologically advanced enough to create these simulations.
  2. That the proportion of these technologically advanced civilisations that wish to create these simulations is zero or almost zero.
  3. That we are almost sure to be living in such a simulation.

To see this, let’s examine each proposition in turn.

  1. Suppose that the first is not true. In that case, a significant proportion of civilisations at our stage of technology go on to become technologically advanced enough to create these simulations.
  2. Suppose that the second is not true. In this case, a significant proportion of these civilisations run such simulations.
  3. If both of the above propositions are not true, then there will be countless simulated minds indistinguishable to all intents and purposes from ours, as there is potentially no limit to the number of simulations these civilisations could create. The number of such simulated minds would almost certainly be overwhelmingly greater than the number of minds that created them. Consequently, we would be quite safe in assuming that we are almost certainly inside a simulation created by some form of advanced civilisation.

For the first proposition to be untrue, civilisations must be able to go through the phase of being able to wipe themselves out, either deliberately or by accident, carelessness or neglect, and never or almost never do so. This might perhaps seem unlikely based on our experience of this world, but becomes more likely if we consider all other possible worlds.

For the second proposition to be untrue, we would have to assume that virtually all civilisations that were able to create these simulations would decide not to do so. This again is possible, but would seem unlikely.

If we consider both propositions, and we think it is unlikely that no civilisations survive long enough to achieve what Bostrom calls ‘technological maturity’, and that it is unlikely that hardly any would create ‘ancestor simulations’ if they could, then anyone considering the question is left with a stark conclusion. They really are living in a simulation.

To summarise. An advanced ‘technologically mature’ civilisation would have the capability of creating simulated minds. Based on this, at least one of three propositions must be true.

  1. The proportion of civilisations at our level of development that go on to become these advanced civilisations is zero or close to zero.
  2. The proportion of these advanced civilisations that wish to run these simulations is close to zero.
  3. The proportion of those consciously considering the question who are living in a simulation is close to one.

If the first of these propositions is true, we will almost certainly not survive to become ‘technologically mature.’ If the second proposition is true, virtually no advanced civilisations are interested in using their power to create such simulations. If the third proposition is true, then conscious beings considering the question are almost certainly living in a simulation.
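The trade-off between the three propositions can be sketched numerically. This follows the fraction used in Bostrom’s 2003 paper, under the simplifying assumption that each ancestor simulation contains about as many minds as one unsimulated history. The function and parameter names are my own, and the input values are purely illustrative.

```python
def simulated_fraction(f_mature: float, f_interested: float,
                       sims_per_civ: float) -> float:
    """Fraction of all minds like ours that are simulated.

    f_mature      -- proportion of civilisations at our level that reach
                     technological maturity (proposition 1 says ~0)
    f_interested  -- proportion of mature civilisations that run ancestor
                     simulations (proposition 2 says ~0)
    sims_per_civ  -- average number of ancestor simulations run by each
                     interested civilisation (assumed, illustrative)
    """
    x = f_mature * f_interested * sims_per_civ
    return x / (x + 1.0)


if __name__ == "__main__":
    # Even pessimistic fractions are swamped by a large number of simulations.
    print(simulated_fraction(0.01, 0.01, 1e9))
    # If no civilisation survives to maturity, nobody is simulated.
    print(simulated_fraction(0.0, 1.0, 1e9))
```

Unless one of the first two proportions is very close to zero, the sheer number of simulations drives the result towards one, which is the third proposition.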

Through the veil of our ignorance, it might seem sensible to assign equal credence to all three, and to conclude that unless we are currently living in a simulation, descendants of this civilisation will almost certainly never be in a position to run these simulations.

Strangely indeed, the probability that we are living in a simulation increases as we draw closer to the point at which we are able and willing to create such simulations ourselves. At the point that we would be ready to create our own simulations, we would paradoxically be at the very point when we were almost sure that we ourselves were simulations. Only by refraining from doing so could we in a certain sense make it less likely that we were simulated, as it would show that at least one civilisation that was able to create simulations refrained from doing so. Once we took the plunge, we would know that we were almost certainly only doing so as simulated beings. And yet there must have been someone or something that created the first simulation. Could that be us, we would be asking ourselves? In our simulated hearts and minds, we would already know the answer!


With reference to Bostrom’s ‘simulation’ reasoning, generate an estimate as to the probability that we are living in a simulated world.

References and Links

The Simulation Argument.

Do we live in a computer simulation? Nick Bostrom. New Scientist. 00Month 2006. 8-9.

Are you living in a computer simulation? Bostrom, N. Philosophical Quarterly (2003). 53, 211. 243-255.

Hempel’s Paradox – in a nutshell.

You spot a pink flamingo and wonder to yourself whether all flamingos are pink. What would it take to confirm or disprove the hypothesis? The nice thing about this sort of hypothesis is that it’s testable and potentially falsifiable. All it takes is to find a flamingo that is not pink, and I can conclude that not all flamingos are pink. Just one observation can change my flamingo world view. It doesn’t matter how many pink flamingos you witness, however, no number can prove the hypothesis short of the number of flamingos that potentially exist. Still, the more you see that are pink, the more probable it becomes that all flamingos are actually pink. How probable you consider that is at any given time is related to how probable you thought it was before you saw the latest one. While considering this, you see someone wearing blue tennis shoes. Does this make it more likely that all flamingos are pink? This is one example of a broader paradox first formally identified by Carl Gustav Hempel, sometimes known as Hempel’s paradox or else the Raven Paradox.

The Raven paradox arises from asking whether observing a green apple makes it more likely that all ravens are black, assuming that you don’t know the answer. It would intuitively seem not. Why should seeing a green apple tell you anything about the colour of ravens? The way to answer this is to re-state ‘All ravens are black’ as ‘Everything that is not black is not a raven.’ In fact, these two statements are logically equivalent. To see this, assume there are just two ravens and two tennis shoes (one right-foot, one left-foot) in the whole world. Now you identify the colour of each of these objects. You observe that both tennis shoes are blue and the other two objects are black. So you announce that everything that is not black (each of the tennis shoes) is not a raven. This is identical to saying that all ravens are black. The logic universalises to any number of objects and colours. Assume now we see just one of the tennis shoes and it turns out to be blue. You can now announce that one possible thing that is not black is not a raven. If you see the other tennis shoe and it is blue, that means that there are now two things that are not black that are not a raven. Each time you see something, it is possible that you would not be able to say this – i.e. you would say instead that you have seen something not black and it is a raven. It is like being dealt a playing card from a deck of four which contains only blue or black cards. You are dealt a black card, and it shows a raven. You know that at least one of the other cards is a raven, and it could be a black card or a blue card. You receive a blue card. Now, before you turn it over, what is the chance it is a raven? You don’t know, but whatever it is, the chance that only black cards show ravens improves if you turn the blue card over and it shows a tennis shoe. Each time you turn a blue card over it could show a raven. Each time that it doesn’t makes it more likely that none of the blue cards shows a raven. 
Substitute all non-ravens for tennis shoes and all colours other than black for the blue cards, and the result universalises. Every time you see an object that is not black and is not a raven, it makes it just that tiny, tiny bit more likely that everything that is not black is not a raven, i.e. that all ravens are black. How much more likely? This depends on how observable non-black ravens would be if they exist. If there is no chance that they would be seen even if they exist, because non-black ravens never emerge from the nest, say, it is much more difficult to falsify the proposition that all ravens are black. So when you observe a blue tennis shoe it offers less evidence for the ‘all ravens are black’ hypothesis than when it is just possible that the blue thing you saw would have been a raven and not a tennis shoe. More generally, the more likely a non-black raven is to be observed if it exists, the more evidence observation of a non-black object offers for the hypothesis that all ravens are black.
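The dependence on observability can be sketched as a single Bayesian update. The model here is my own simplification: if all ravens are black, a non-black object is certain not to be a raven; if some ravens are not black, there is a probability `p_raven_if_exists` that the non-black object you happened to look at would have turned out to be a raven. That probability stands in for how observable non-black ravens are.

```python
def update_all_black(prior: float, p_raven_if_exists: float) -> float:
    """Posterior P(all ravens are black) after observing a non-black
    object that turns out not to be a raven.

    prior              -- P(all ravens are black) before the observation
    p_raven_if_exists  -- chance the observed non-black object would have
                          been a raven, if non-black ravens exist
                          (illustrative stand-in for observability)
    """
    likelihood_if_true = 1.0
    likelihood_if_false = 1.0 - p_raven_if_exists
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1.0 - prior) * likelihood_if_false)


if __name__ == "__main__":
    for p in (0.0, 1e-9, 1e-3, 0.1):
        print(p, update_all_black(0.5, p))
```

With `p_raven_if_exists` set to zero (non-black ravens never leave the nest) the blue tennis shoe tells you nothing at all, and the larger that probability, the more the same shoe moves your belief.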

So to summarise, we want to test the hypothesis that all ravens are black. We could go out, find some ravens, and see if they are black. On the other hand, we could simply take the logically equivalent contrapositive of the hypothesis, i.e. that all non-black things are non-ravens. This suggests that we can conduct meaningful research on the colour of ravens from our home or office without observing a single raven, but by simply looking at random objects, noting that they are not black, and checking if they are ravens. As we proceed, we collect data that increasingly lends support to the hypothesis that all non-black things are non-ravens, i.e. that all ravens are black. Is there a problem with this approach?

There is no logical flaw in the approach, but the reality is that there are many more non-black things than there are ravens, so if there was a pair (raven, non-black), then we would be much more likely to find it by randomly sampling a raven than by sampling a non-black thing. Therefore, if we sample ravens and fail to find a non-black raven, then we’re much more confident in the truth of our hypothesis that “all ravens are black,” simply because the hypothesis had a much higher chance of being falsified by sampling ravens than by sampling random non-black things.
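A small calculation shows the size of the difference. The numbers are invented for illustration: a world with 10,000 ravens and a billion non-black things, exactly one of which is a non-black raven, and 100 objects sampled without replacement from one pool or the other.

```python
from fractions import Fraction
from math import comb


def p_find_counterexample(pool_size: int, samples: int,
                          counterexamples: int = 1) -> Fraction:
    """Chance that `samples` draws without replacement from a pool of
    `pool_size` objects include at least one of the counterexamples."""
    p_miss_all = Fraction(comb(pool_size - counterexamples, samples),
                          comb(pool_size, samples))
    return 1 - p_miss_all


if __name__ == "__main__":
    n_ravens = 10_000        # assumed number of ravens
    n_non_black = 10**9      # assumed number of non-black things
    by_ravens = p_find_counterexample(n_ravens, 100)
    by_non_black = p_find_counterexample(n_non_black, 100)
    print(float(by_ravens), float(by_non_black))
    print(float(by_ravens / by_non_black))  # how much better raven-sampling is
```

On these assumed numbers, a hundred ravens give a one in a hundred chance of exposing the hypothesis as false, while a hundred non-black things give one in ten million, which is why a clean sample of ravens is so much stronger evidence.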

The same goes for pink flamingos. So we have a paradox traceable to Hempel. How might we resolve it? I suggest we can do this by appeal to a ‘Possibility Theorem’ which I advance here.

Let’s do this by taking the propositions in the thought experiment in turn. Proposition 1: All flamingos are pink. Proposition 2 (logically equivalent to Proposition 1): Everything that is not pink is not a flamingo. Proposition 3 (advanced here as the Possibility Theorem): If something might or might not exist, but is unobservable, it is more likely to exist than something which can be observed, with any positive probability, but is not observed. If something might or might not exist, it is more likely to exist if it is less likely to be observed than something else which is more likely to be observed, and is not observed. So when I see two blue tennis shoes, I am ever so slightly more confident that all flamingos are pink than before I saw them, and especially so if any non-pink flamingos that might be out there would be easy to spot. And I’d still be wrong, but for all the right reasons, until I saw an orange or white flamingo, and then I’d be right, and sure.


Does seeing a blue tennis shoe make it more or less likely that all flamingos are pink, or neither?

References and Links

Hempel’s Ravens Paradox. PRIME.

Raven Paradox. Wikipedia.