The Favourite-Longshot Bias is the well-established tendency in most betting markets for bettors to over-bet ‘longshots’ (events with long odds, i.e. low probability events) and to relatively under-bet ‘favourites’ (events with short odds, i.e. high probability events).
Assume, for example, that Mr. Miller and Mr. Stiller both start with £1,000.
Now Mr. Miller places a level £10 stake on 100 horses quoted at 2 to 1.
Mr. Stiller places a level £10 stake on 100 horses quoted at 20 to 1.
Who is likely to end up with more money at the end?
My Ladbrokes Flat Season Pocket Companion for 1990 provides a nicely laid out piece of evidence here for British flat horse racing between 1985 and 1989. The table conveniently presented in the Companion shows that not one out of 35 favourites sent off at 1/8 or shorter (as short as 1/25) lost between 1985 and 1989. This means a return of between 4% and 12.5% in a couple of minutes, which is an astronomical rate of interest. The point being made is that broadly speaking the shorter the odds, the better the return. The group of ‘white hot’ favourites (odds between 1/5 and 1/25) won 88 out of 96 races for a 6.5% profit. The following table looks at other odds groupings.
Odds          Wins    Runs     Profit      %
1/5-1/2       249     344      +£1.80      +0.52
4/7-5/4       881     1780     -£82.60     -4.64
6/4-3/1       2187    7774     -£629       -8.09
7/2-6/1       3464    21681    -£2237      -10.32
8/1-20/1      2566    53741    -£19823     -36.89
25/1-100/1    441     43426    -£29424     -67.76
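The percentage column can be checked from the other columns. The short Python sketch below assumes (the extract does not say so explicitly) that the profit figures are to a level £1 stake on every runner, so that the percentage return is simply profit divided by runs. The variable and function names are illustrative only.

```python
# Rows transcribed from the table above: (odds band, wins, runs, profit in £).
# Assumption: profit is measured to a level £1 stake on every runner.
rows = [
    ("1/5-1/2", 249, 344, 1.80),
    ("4/7-5/4", 881, 1780, -82.60),
    ("6/4-3/1", 2187, 7774, -629.0),
    ("7/2-6/1", 3464, 21681, -2237.0),
    ("8/1-20/1", 2566, 53741, -19823.0),
    ("25/1-100/1", 441, 43426, -29424.0),
]


def return_pct(profit, runs):
    """Profit as a percentage of total stakes (one unit staked per run)."""
    return 100.0 * profit / runs


returns = [return_pct(profit, runs) for _, _, runs, profit in rows]

for (band, wins, runs, _), pct in zip(rows, returns):
    strike_rate = 100.0 * wins / runs  # share of runners in the band that won
    print(f"{band:>11}  strike rate {strike_rate:5.1f}%  return {pct:+.2f}%")
```

Under that assumption the computed returns agree with the final column, and they fall steadily as the odds lengthen.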
An interesting argument advanced by the Strathclyde-based statistician Dr. Robert Henery in 1985 is that the favourite-longshot bias is a consequence of bettors discounting a fixed fraction of their losses, i.e. they underweight their losses compared to their gains.
This argument also explains an observed link between the sum of bookmakers’ prices and the number of runners in a race. The prices being summed here are the win probabilities implied by the odds. If, for example, odds of 3/1 (against) are offered about each of the five horses in a race, the implied probability of winning for each horse is ¼ and the sum of prices is 5/4.
In this context, an ‘over-round’ is defined as the excess of the sum of prices over 1, in this case ¼.
The rationale behind Henery’s hypothesis is that bettors will tend to explain away and therefore discount losses as atypical, or unrelated to the judgment of the bettor.
This is consistent with contemporaneous work on the psychology of gambling, such as Gilovich in 1983 and Gilovich and Douglas in 1986.
These studies demonstrate how gamblers tend to discount their losses, often as ‘near wins’ or the outcome of ‘fluke’ events, while bolstering their wins.
Let’s look more closely at how the Henery odds transformation works.
If the true probability of a horse losing a race is q, then the true odds against winning are q/(1-q).
For example, if the true probability of a horse losing a race (q) is ¾, the chance that it will win the race is ¼, i.e. 1- ¾. The odds against it winning are: q/(1-q) = 3/4/(1-3/4) = 3/4/(1/4) = 3/1.
Henery now applies a transformation whereby the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is ½ (q=1/2), then the bettor will rate subjectively the chance of the horse losing as Q = fq.
So Q = ¾ x ½ = 3/8, i.e. a subjective chance of winning of 5/8.
So the perceived (subjective) odds against winning associated with a true (objective) probability of losing of 50% (Evens, i.e. q=1/2) are 3/5 (a subjective win probability of 62.5%), i.e. odds-on.
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = 3/8/(1-3/8) = 3/8/(5/8) = 3/5
If the true probability of a horse losing a race is 80%, so that the true odds against winning are 4/1 (q = 0.8), then the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is 4/5 (q=0.8), then the bettor will rate subjectively the chance of the horse losing as Q = fq.
So Q = 3/4 x 4/5 = 12/20, i.e. a subjective chance of winning of 8/20 (2/5).
So the perceived (subjective) odds against winning associated with a true (objective) probability of losing of 80% (4 to 1, i.e. q=0.8) are 6/4 (a subjective win probability of 40%).
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = 12/20 / (1-12/20) = 12/8 = 6/4
To take this to the limit, if the true probability of a horse losing a race is 100%, so that the true odds against winning are ∞ to 1 against (q = 1), then the bettor will again assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is 100% (q=1), then the bettor will rate subjectively the chance of the horse losing as Q = fq.
So Q = 3/4 x 1 = 3/4, i.e. a subjective chance of winning of 1/4.
So the perceived (subjective) odds against winning associated with a true (objective) probability of losing of 100% (∞ to 1, i.e. q=1) are 3/1 (a subjective win probability of 25%).
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = 3/4 / (1/4) = 3/1
Similarly, if the true probability of a horse losing a race is 0%, so that the true odds against winning are 0 to 1 against (q = 0), then the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is 0% (q=0), then the bettor will rate subjectively the chance of the horse losing as Q = fq.
So Q = 3/4 x 0 = 0, i.e. a subjective chance of winning of 1.
So the perceived (subjective) odds against winning associated with a true (objective) probability of losing of 0% (0 to 1, i.e. q=0) are also 0/1.
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = 0 / 1 = 0/1
This can all be summarised in a table.
Objective odds (against)    Subjective odds (against)
Evens                       3/5
4/1                         6/4
Infinity to 1               3/1
0/1                         0/1
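The rows of this table can be reproduced with a few lines of Python. This is only a sketch of the transformation as described above (Q = fq, subjective odds = Q/(1-Q)); the function name is made up for illustration, and exact fractions are used so that the results can be compared directly with the table.

```python
from fractions import Fraction


def subjective_odds_against(q, f):
    """Odds against winning as perceived by a bettor who counts only a
    fraction f of the true losing probability q: Q / (1 - Q), with Q = f * q."""
    Q = f * q
    return Q / (1 - Q)


f = Fraction(3, 4)  # fraction of losses counted, as in the worked examples

# True losing probabilities for Evens, 4/1, "infinity to 1" and 0/1.
for q in (Fraction(1, 2), Fraction(4, 5), Fraction(1), Fraction(0)):
    print(f"q = {q}: subjective odds against = {subjective_odds_against(q, f)}")
```

Note that Fraction reduces to lowest terms, so the 6/4 in the table is printed as 3/2.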
We can now use these stylised examples to establish the bias.
In particular, the implication of the Henery odds transformation is that, for a given f of ¾, 3/5 is perceived as fair odds for a horse with a 1 in 2 chance of winning.
In fact, £100 wagered at 3/5 yields £160 (3/5 x £100, plus stake returned) half of the time (true odds = evens), i.e. an expected return of £80.
£100 wagered at 6/4 yields £250 (6/4 x £100, plus the stake back) one fifth of the time (true odds = 4/1), i.e. an expected return of £50.
£100 wagered at 3/1 would yield £400 (3/1 x £100, plus the stake back), but does so none of the time (true odds = Infinity to 1), i.e. an expected return of £0.
It can be shown that the higher the odds the lower is the expected rate of return on the stake, although the relationship between the subjective and objective probabilities remains at a fixed fraction throughout.
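A minimal sketch of that claim, assuming the odds on offer are exactly the bettor's subjective odds Q/(1-Q) while the horse actually wins with probability 1-q. The function name is hypothetical; the arithmetic is the same as in the three stylised bets above, expressed per £1 staked.

```python
from fractions import Fraction


def expected_return_per_pound(q, f):
    """Expected amount returned per £1 staked when the odds offered are the
    subjective odds Q/(1-Q), Q = f*q, but the true win probability is 1 - q."""
    Q = f * q
    odds_against = Q / (1 - Q)
    payout_if_win = 1 + odds_against  # winnings plus the returned stake
    return (1 - q) * payout_if_win


f = Fraction(3, 4)
# True losing probabilities, from shortest to longest true odds.
qs = [Fraction(1, 2), Fraction(2, 3), Fraction(4, 5), Fraction(9, 10), Fraction(1)]
for q in qs:
    print(f"q = {q}: expected return per £1 = {expected_return_per_pound(q, f)}")
```

The expression simplifies to (1-q)/(1-fq), which falls as q rises whenever f is less than 1, so the expected return declines as the true odds lengthen.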
Now on to the over-round.
The same simple assumption about bettors’ behaviour can explain the observed relationship between the over-round (sum of win probabilities minus 1) and the number of runners in a race, n.
If each horse is priced according to its true win probability, then over-round = 0. So in a six horse race, where each has a 1 in 6 chance, each would be priced at 5 to 1, so none of the lose probability is shaded by the bookmaker. Here the over-round = (6 x 1/6) – 1 = 0.
If only a fixed fraction of losses, f, is counted by bettors, the subjective probability of losing on any horse is f(qi), where qi is the objective probability of losing for horse i, and the odds will reflect this bias, i.e. they will be shorter than the true probabilities would imply. The subjective win probabilities in this case are now 1-f(qi), and the sum of these minus 1 gives the over-round.
Where there is no discounting of the odds, the over-round (OR) = 0, i.e. the sum of the n true win probabilities minus 1. Assume now that f = ¾, i.e. ¾ of losses are counted by the bettor.
If there is discounting, then the odds will reflect this, and the more runners the bigger will be the over-round.
So in a race with 5 equally matched runners, q is 4/5, but fq = 3/4 x 4/5 = 12/20, so subjective win probability = 1-fq = 8/20, not 1/5. So OR = (5 x 8/20) – 1 = 2 – 1 = 1.
With 6 runners, fq = ¾ x 5/6 = 15/24, so subjective win probability = 1 – fq = 9/24. OR = (6 x 9/24) – 1 = (54/24) – 1 = 30/24 = 1¼.
With 7 runners, fq = ¾ x 6/7 = 18/28, so subjective win probability = 1 – fq = 10/28. OR = (7 x 10/28) – 1 = (70/28) – 1 = 42/28 = 1½.
If there is no discounting, then the subjective win probability equals the actual win probability, so an example in a 5-horse race is that each has a win probability of 1/5. Here, OR = (5 x 1/5) – 1 = 0. In a 6-horse race, with no discounting, subjective probability = 1/6. OR = (6 x 1/6) – 1 = 0.
Hence, the over-round is linearly related to the number of runners, assuming that bettors discount a fixed fraction of losses (the ‘Henery Hypothesis’).
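The linear relationship can be seen by computing the over-round for a range of field sizes. The sketch below follows the worked examples in assuming n equally matched runners, so that each has a true losing probability of (n-1)/n; the function name is illustrative.

```python
from fractions import Fraction


def over_round(n, f):
    """Over-round in a race of n equally matched runners when bettors
    count only a fraction f of each horse's true losing probability."""
    q = Fraction(n - 1, n)       # true probability that a given horse loses
    subjective_win = 1 - f * q   # 1 - fq
    return n * subjective_win - 1


f = Fraction(3, 4)
for n in range(2, 11):
    print(f"{n:2d} runners: over-round = {over_round(n, f)}")
```

Algebraically, n(1 - f(n-1)/n) - 1 = (n-1)(1-f), so with f = ¾ each additional runner adds ¼ to the over-round, and with f = 1 (no discounting) the over-round is zero for any n.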
If the Henery Hypothesis is correct as a way of explaining the favourite-longshot bias, the bias can be explained as the natural outcome of bettors’ pre-existing perceptions and preferences.
This is quite consistent with a market efficiently processing the information available to it.
Are there other explanations for the favourite-longshot bias, and the observed link between over-round and runners, which do not rely on the Henery Hypothesis? Any coherent theory of the favourite-longshot bias should be able to explain both observed regularities. That is a topic for another time.
How large should a randomly chosen group of people be, to make it more likely than not that at least two of them share a birthday?
For convenience, assume that all dates in the calendar are equally likely as birthdays, and ignore the Leap Year special of February 29th.
The first thing to look at is the likelihood that two randomly chosen people would share the same birthday.
Let’s call them Fred and Felicity. Say Felicity’s birthday is May 1st. What is the chance that Fred shares this birthday with Felicity? Well, there are 365 days in the year, and only one of these is May 1st, and we are assuming that all dates in the calendar are equally likely as birthdays.
So, the probability that Fred’s birthday is May 1st is 1/365, and the chance he shares a birthday with Felicity is 1/365.
So what is the probability that Fred’s birthday is not May 1st? It is 364/365. This is the probability that Fred doesn’t share a birthday with Felicity.
More generally, for any randomly chosen group of two people, the probability that the second person has a different birthday to the first is 364/365.
With 3 people, the chance that all three are different is the chance that the first two are different (364/365) multiplied by the chance that the third birthday is different (363/365).
So, the probability that 3 people have different birthdays = 364/365 x 363/365
This can be written as (364)_{2} / 365^{2}, where (364)_{2} is shorthand for the product 364 x 363.
Similarly, probability that 5 people have different birthdays = (364)_{4} / 365^{4}
= 364 x 363 x 362 x 361 / 365^{4}
So far, the chance of no matches is very high. But by the tenth person the probability of no matches is:
(364/365) x (363/365) x (362/365) x (361/365) x (360/365) x (359/365) x (358/365) x (357/365) x (356/365) = 0.8831
More generally, for n people, probability they all have different birthdays =
(364)_{n-1} / 365^{n-1}
For 23 people, probability of all different birthdays = (364)_{22} / 365^{22} = 0.4927
For 22 people, probability of all different birthdays = (364)_{21} / 365^{21} = 0.5243
So, in a group of 23 people, there is a (1-0.4927) = 0.5073 chance that at least two of the group share a birthday.
So how large should a randomly chosen group of people be, to make it more likely than not that at least two of them share a birthday? The answer is 23.
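The calculation is easy to reproduce. The following Python sketch makes the same assumptions as the text (365 equally likely birthdays, no February 29th); the function names are invented for this illustration.

```python
def prob_all_different(n, days=365):
    """Probability that n people all have different birthdays,
    assuming every one of `days` dates is equally likely."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days  # the (k+1)-th person avoids k dates already taken
    return p


def smallest_group(threshold=0.5, days=365):
    """Smallest group size for which a shared birthday is more likely
    than `threshold`."""
    n = 1
    while 1 - prob_all_different(n, days) <= threshold:
        n += 1
    return n


for n in (10, 22, 23):
    print(n, round(prob_all_different(n), 4))
print("smallest group:", smallest_group())
```

The first factor in the loop is 365/365 = 1, so the product is the same as the (364)_{n-1} / 365^{n-1} expression used above.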
The intuition behind this is quite straightforward if we recognise just how many pairs of people there are in a group of 23 people, any pair of which could share a birthday.
In a group of 23 people, there are, according to the standard formula, ^{23}C_{2} (called 23 Choose 2) pairs of people.
Generally, the number of ways k things can be chosen from n is:
^{n}C_{k} = n! / ((n-k)! k!)
Thus, ^{23}C_{2} = 23! / (21! 2!) = 23 x 22 / 2 = 253
So, in a group of 23 people, there are 253 pairs of people to choose from.
Therefore, a group of 23 people generates 253 chances, each of size 1/365, of having at least two people in the group sharing the same birthday.
These chances have some overlap: if A and B have a common birthday, and A and C have a common birthday, then inevitably so do B and C. So the probability of at least two people sharing a birthday in a group of 23 is less than 253/365 (69.3%). It is, as shown previously, 50.73%.
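As a quick check of the pair count and of the upper bound quoted above, here is a small sketch using Python's standard-library math.comb (available from Python 3.8). The variable names are illustrative.

```python
from math import comb

group_size = 23
pairs = comb(group_size, 2)   # number of distinct pairs, "23 Choose 2"
upper_bound = pairs / 365     # adding the 253 chances overstates the true figure

print(pairs)
print(round(upper_bound, 3))
```

Simply adding the 253 chances counts overlapping cases more than once, which is why the bound sits well above the 50.73% obtained from the exact calculation.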
To conclude, the next time you see two football teams line up, include the referee. It is now more likely than not that two of those on the pitch share the same birthday. Strange, but true!
Let’s suppose Bill and Ben each toss separate coins. Let A represent the variable “Bill’s coin toss outcome”, and B represent the variable “Ben’s coin toss outcome”. Both A and B have two possible values (Heads and Tails). It would be uncontroversial to assume that A and B are independent. Evidence about B will not change our belief in A. In other words, the fact that Ben’s coin lands heads does not affect the likelihood that Bill will throw heads. What happens to Bill’s coin and Ben’s coin are unrelated. They are independent.
Now suppose both Bill and Ben toss the same coin. Again let A represent the variable “Bill’s coin toss outcome”, and B represent the variable “Ben’s coin toss outcome”. Assume also that there is a possibility that the coin is biased towards heads but we do not know this for certain. In this case A and B are not independent. Observing that Ben’s coin has landed heads might cause us to increase our belief that Bill will throw a Heads.
In the second example, the variables A and B are both dependent on a separate variable C, “the coin is biased towards Heads” (which has the values True or False). Although in this case A and B are not independent, it turns out that once we know for certain the value of C then any evidence about B cannot change our belief about A.
In such a case we say that A and B are conditionally independent given C.
In many real life situations variables which are believed to be independent are actually only independent conditional on some other variable. Let’s take an example. Suppose that Ted and Ned live on opposite sides of the city and come to work by completely different means. Let’s say Ted arrives by train while Ned drives to work. Let A represent the variable “Ted late” (which has values true or false) and similarly let B represent the variable “Ned late”. At first glance, it might seem that A and B are independent. However, even if Ted and Ned lived and worked in different countries there may be factors (such as an international fuel shortage) which could affect both Ted and Ned. In that case, A and B are not independent. Again, it doesn’t seem reasonable to exclude the possibility that both Ted and Ned may be affected by a rail strike (C). Clearly the likelihood that Ted will arrive late to work will increase if the rail strike takes place; but the likelihood that Ned will arrive late to work might also increase, indirectly, because of the additional traffic on the roads caused by the rail strike. ‘Ted to be late’ and ‘Ned to be late’ are in this case conditionally independent GIVEN the rail strike.
Two events, A and B, are defined to be conditionally independent, given some other event, C, if the probability of both A occurring and B occurring, given C, is equal to the probability of A occurring given C multiplied by the probability of B occurring given C.
The notation used for this is: P(A ∩ B | C) = P(A | C) . P(B | C)
In the example we have just considered, the probability that Ted and Ned are both late to work given the rail strike equals the probability that Ted is late given the strike multiplied by the probability that Ned is late given the strike.
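The shared-coin example from earlier can be checked numerically against this definition. The numbers below are purely hypothetical (a 50% prior that the coin is biased, and an 80% chance of heads if it is); the sketch builds the joint distribution and then reads the conditional probabilities off it.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_BIASED = 0.5                       # prior probability that the coin is biased (C)
P_HEADS = {True: 0.8, False: 0.5}    # P(heads | C is True) and P(heads | C is False)

# joint[(c, a, b)] = P(C=c, Bill throws heads=a, Ben throws heads=b),
# built so that the two tosses are independent once C is known.
joint = {}
for c, a, b in product([True, False], repeat=3):
    pc = P_BIASED if c else 1 - P_BIASED
    pa = P_HEADS[c] if a else 1 - P_HEADS[c]
    pb = P_HEADS[c] if b else 1 - P_HEADS[c]
    joint[(c, a, b)] = pc * pa * pb


def prob(pred):
    """Total probability of all outcomes (c, a, b) satisfying pred."""
    return sum(p for key, p in joint.items() if pred(*key))


p_a = prob(lambda c, a, b: a)
p_a_given_b = prob(lambda c, a, b: a and b) / prob(lambda c, a, b: b)
p_a_given_c = prob(lambda c, a, b: a and c) / prob(lambda c, a, b: c)
p_a_given_b_and_c = prob(lambda c, a, b: a and b and c) / prob(lambda c, a, b: b and c)

print("P(A)        =", round(p_a, 4))
print("P(A | B)    =", round(p_a_given_b, 4))
print("P(A | C)    =", round(p_a_given_c, 4))
print("P(A | B, C) =", round(p_a_given_b_and_c, 4))
```

With these made-up numbers, seeing Ben throw heads raises the probability that Bill throws heads, but once C is known Ben's result adds nothing: P(A | B, C) equals P(A | C).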
This takes us to a new question.
Does conditional independence, given C, imply unconditional independence?
Say, for example, Jack is playing Jill at snooker. Jack and Jill know nothing about each other’s ability at snooker.
Now suppose Jill wins her first 5 games. This provides evidence for her to assess the strength of her opponent, Jack, and vice-versa.
But the games may be conditionally independent (Jill is equally likely to win the fifth game as the second given Jack and Jill’s relative skill at snooker).
Even so, they are not independent (that would mean that winning the first five games tells you nothing about the likelihood of winning the sixth).
So the answer to the latest question is No. Conditional independence does not imply unconditional independence.
Finally, does unconditional independence imply conditional independence?
To answer this, let’s imagine an event with multiple causes.
Let A be the event that the fire alarm goes off.
Now suppose this could be caused by a genuine fire (F) or someone making popcorn (P), which sets off a false alarm.
Now let’s suppose that the probability of a fire is completely independent of the probability of someone making popcorn. But also that the probability the alarm is indicating a real fire is 100 per cent if nobody is making popcorn.
So the probability of a fire and the probability of making popcorn are independent of each other, yet the probability it’s a genuine fire if the alarm goes off is conditionally dependent on whether someone is making popcorn (you can be sure it’s a genuine fire if nobody is making popcorn).
So, does unconditional independence imply conditional independence? The answer is No.
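The alarm example can be made concrete with a small enumeration. The probabilities of fire and of popcorn below are hypothetical, and the sketch assumes the alarm sounds if and only if there is a fire or someone is making popcorn, which is a simplification of the story above.

```python
from itertools import product

# Hypothetical probabilities, for illustration only.
P_FIRE = 0.01
P_POPCORN = 0.10

# joint[(fire, popcorn, alarm)]: fire and popcorn are independent by
# construction. Assumption: the alarm sounds exactly when there is a
# fire or popcorn is being made.
joint = {}
for fire, popcorn in product([True, False], repeat=2):
    pf = P_FIRE if fire else 1 - P_FIRE
    pp = P_POPCORN if popcorn else 1 - P_POPCORN
    alarm = fire or popcorn
    joint[(fire, popcorn, alarm)] = pf * pp


def prob(pred):
    """Total probability of all outcomes (fire, popcorn, alarm) satisfying pred."""
    return sum(p for key, p in joint.items() if pred(*key))


p_fire_given_alarm = prob(lambda f, p, a: f and a) / prob(lambda f, p, a: a)
p_fire_given_alarm_no_popcorn = (
    prob(lambda f, p, a: f and a and not p) / prob(lambda f, p, a: a and not p)
)
p_fire_given_alarm_popcorn = (
    prob(lambda f, p, a: f and a and p) / prob(lambda f, p, a: a and p)
)

print("P(fire | alarm)             =", round(p_fire_given_alarm, 4))
print("P(fire | alarm, no popcorn) =", round(p_fire_given_alarm_no_popcorn, 4))
print("P(fire | alarm, popcorn)    =", round(p_fire_given_alarm_popcorn, 4))
```

Given that the alarm has sounded, learning whether anyone is making popcorn changes the probability of a fire, even though fire and popcorn are independent before the alarm is taken into account.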
So, in summary, events may be independent or they may be conditionally independent. Conditional independence does not, however, imply unconditional independence, and unconditional independence does not imply conditional independence.
One of the most celebrated pieces of correspondence in the history of probability and gambling, and one of which I am particularly fond, involves an exchange of letters between the greatest diarist of all time, Samuel Pepys, and the greatest scientist of all time, Sir Isaac Newton.
The six letters exchanged between Pepys in London and Newton in Cambridge related to a problem posed to Newton by Pepys about gambling odds. The interchange took place between November 22 and December 23, 1693. The ostensible reason for Mr. Pepys’ interest was to encourage the thirst for truth of his young friend, Mr. Smith. Whether Sir Isaac believed that tale or not we shall never know. The real reason, however, was later revealed in a letter written to a confidante by Pepys indicating that he himself was about to stake 10 pounds, a considerable sum in 1693, on such a bet. Now we’re talking!
The first letter to Newton introduced Mr. Smith as a fellow with a “general reputation…in this towne (inferiour to none, but superiour to most) for his maistery [of]…Arithmetick”.
What emerged has come down to us as the aptly named Newton-Pepys problem.
Essentially, the question came down to this:
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one ‘6’ appears
B. 12 fair dice are tossed independently and at least two ‘6’s appear.
C. 18 fair dice are tossed independently and at least three ‘6’s appear.
Pepys was convinced that C. had the highest probability and asked Newton to confirm this.
Newton chose A as the highest probability, then B, then C, and produced his calculations for Pepys, who wouldn’t accept them.
So who was right? Newton or Pepys?
Well, let’s see.
The first problem is the easiest to solve.
What is the probability of A?
Probability that one toss of a die produces a ‘6’ = 1/6
So probability that one toss of a die does not produce a ‘6’ = 5/6
So probability that six independent tosses of a die produce no ‘6’ = (5/6)^{6}
So probability of AT LEAST one ‘6’ in 6 tosses = 1 – (5/6)^{6} = 0.6651
So far, so good.
The probability of problem B and probability of problem C are more difficult to calculate and involve use of the binomial distribution, though Newton derived the answers from first principles, by his method of ‘Progressions’.
Both methods give the same answer, but using the more modern binomial distribution is easier.
So let’s do it, introducing along the way the idea of so-called ‘Bernoulli trials’.
The nice thing about a Bernoulli trial is that it has only two possible outcomes.
Each outcome can be framed as a ‘yes’ or ‘no’ question (success or failure).
Let probability of success = p.
Let probability of failure = 1-p.
Each trial is independent of the others and the probability of the two outcomes remains constant for every trial.
An example is tossing a coin. Will it land heads?
Another example is rolling a die. Will it come up ‘6’?
Yes = success (S); No = failure (F).
Let probability of success, P (S) = p; probability of failure, P (F) = 1-p.
So the question: How many Bernoulli trials are needed to get to the first success?
This is straightforward, as the only way to need exactly five trials, for example, is to begin with four failures, i.e. FFFFS.
Probability of this = (1-p) (1-p) (1-p) (1-p) p = (1-p)^{4} p
Similarly, the only way to need exactly six trials is to begin with five failures, i.e. FFFFFS.
Probability of this = (1-p) (1-p) (1-p) (1-p) (1-p) p = (1-p)^{5} p
More generally, the probability that success starts on trial number n =
(1-p)^{n-1} p
This is a geometric distribution. This distribution deals with the number of trials required for a single success.
But what is the chance that the first success takes AT LEAST some number of trials, say 12 trials?
One method is to add the probability of needing exactly 12 trials to the probability of needing exactly 13 trials, exactly 14 trials, exactly 15 trials, and so on without end.
Easier method: The only time you will need at least 12 trials is when the first 11 trials are all failures, i.e. (1-p)^{11}
In a sequence of Bernoulli trials, the probability that the first success takes at least n trials is (1-p)^{n-1}
Let’s take a couple of examples.
Probability that the first success (heads on coin toss) takes at least three trials (tosses of the coin)= (1-0.5)^{2} = 0.25
Probability that the first success (heads on coin toss) takes at least four trials (tosses of the coin)= (1-0.5)^{3} = 0.125
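Both routes to the "at least n trials" probability can be compared in code. This is a minimal sketch with made-up function names; the long sum is truncated at a large number of terms, which is an approximation of the infinite sum described above.

```python
def prob_first_success_on(n, p):
    """Geometric distribution: first success occurs exactly on trial n."""
    return (1 - p) ** (n - 1) * p


def prob_first_success_at_least(n, p):
    """First success takes at least n trials: the first n-1 trials all fail."""
    return (1 - p) ** (n - 1)


def prob_at_least_by_summing(n, p, terms=2000):
    """The slower method: add up P(exactly n), P(exactly n+1), ...
    (truncated after `terms` terms, so only approximate)."""
    return sum(prob_first_success_on(k, p) for k in range(n, n + terms))


print(prob_first_success_at_least(3, 0.5))    # coin example: at least three tosses
print(prob_first_success_at_least(4, 0.5))    # coin example: at least four tosses
print(prob_first_success_at_least(12, 1 / 6))
print(prob_at_least_by_summing(12, 1 / 6))
```

The shortcut and the truncated sum agree to many decimal places, because the terms left out of the sum are vanishingly small.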
But so far we have only learned how to calculate the probability of one success in so many trials.
What if we want to know the probability of two, or three, or however many successes?
To take an example, what is the probability of exactly two ‘6’s in five throws of the die?
To determine this, we need to calculate the number of ways two ‘6’s can occur in five throws of the die, and multiply that by the probability of each of these ways occurring.
So, probability = number of ways something can occur multiplied by probability of each way occurring.
How many ways can we throw two ‘6’s in five throws of the die?
Where S = Success in throwing a ‘6’, F = Fail in throwing a ‘6’, we have:
SSFFF; SFSFF; SFFSF; SFFFS; FSSFF; FSFSF; FSFFS; FFSSF; FFSFS; FFFSS
So there are 10 ways of throwing two ‘6’s in five throws of the die.
More formally, we are seeking to calculate how many ways 2 things can be chosen from 5. This is known as ‘5 Choose 2’, written as:
^{5}C_{2} = 10
More generally, the number of ways k things can be chosen from n is:
^{n}C_{k} = n! / ((n-k)! k!)
n! (known as n factorial) = n (n-1) (n-2) … 1
k! (known as k factorial) = k (k-1) (k-2) … 1
Thus, ^{5}C_{2} = 5! / (3! 2!) = 5x4x3x2x1 / (3x2x1x2x1) = 5×4/(2×1) = 20/2 = 10
So what is the probability of throwing exactly two ‘6’s in five throws of the die, in each of these ten cases? p is the probability of success. 1-p is the probability of failure.
In each case, the probability = p.p.(1-p).(1-p).(1-p)
= p^{2} (1-p)^{3}
Since there are ^{5}C_{2} such sequences, the probability of exactly 2 ‘6’s =
10 p^{2} (1-p)^{3}
Generally, in a fixed sequence of n Bernoulli trials, the probability of exactly r successes is:
^{n}C_{r} x p^{r} x (1-p)^{n-r}
This is the binomial distribution. Note that it requires that the probability of success on each trial be constant. It also requires only two possible outcomes.
So, for example, what is the chance of exactly 3 heads when a fair coin is tossed 5 times?
^{5}C_{3} x (1/2)^{3} x (1/2)^{2} = 10/32 = 5/16
And what is the chance of exactly 2 sixes when a fair die is rolled five times?
^{5}C_{2} x (1/6)^{2} x (5/6)^{3} = 10 x 1/36 x 125/216 = 1250/7776 = 0.1608
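The binomial formula translates almost directly into code. The sketch below uses math.comb from the Python standard library (Python 3.8 or later) and exact fractions, so the two worked examples can be checked without rounding; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent Bernoulli
    trials, each with success probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


three_heads = binomial_pmf(3, 5, Fraction(1, 2))   # 3 heads in 5 coin tosses
two_sixes = binomial_pmf(2, 5, Fraction(1, 6))     # 2 sixes in 5 rolls of a die

print(three_heads)
print(two_sixes, "=", round(float(two_sixes), 4))
```

Fraction prints results in lowest terms, so 1250/7776 appears as 625/3888. Summing the function over r = 0 to n gives 1, as it should for a probability distribution.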
So let’s now use the binomial distribution to solve the Newton-Pepys problem.
- What is the probability of obtaining at least one six with 6 dice?
- What is the probability of obtaining at least two sixes with 12 dice?
- What is the probability of obtaining at least three sixes with 18 dice?
First, what is the probability of no sixes with 6 dice?
P(x sixes with n dice) = ^{n}C_{x} . (1/6)^{x} . (5/6)^{n-x}, x = 0, 1, 2, …, n
Where x is the number of successes.
So, probability of no successes (no sixes) with 6 dice =
^{6}C_{0} . (1/6)^{0} . (5/6)^{6-0} = [6! / ((6-0)! 0!)] x 1 x (5/6)^{6} = (6!/6!) x 1 x (5/6)^{6} = (5/6)^{6}
Note that: 0! = 1
Here’s the proof: n! = n. (n-1)!
At n=1, 1! = 1. (1-1)!
So 1 = 0!
So, where x is the number of sixes, probability of at least one six is equal to ‘1’ minus the probability of no sixes, which can be written as:
P(x≥1) = 1 – P(x=0) = 1 – (5/6)^{6} = 0.665 (to three decimal places).
i.e. probability of at least one six = 1 minus the probability of no sixes.
That is a formal solution to Part 1 of the Newton-Pepys Problem.
Now on to Part 2.
Probability of at least two sixes with 12 dice is equal to ‘1’ minus the probability of no sixes minus the probability of exactly one six.
This can be written as:
P (x≥2) = 1 – P(x=0) – P(x=1)
P(x=0) in 12 throws of the dice = (5/6)^{12}
P(x=1) in 12 throws of the dice = ^{12}C_{1} . (1/6)^{1} . (5/6)^{11}
^{n}C_{k} = n! / ((n-k)! k!)
So ^{12}C_{1} = 12! / ((12-1)! 1!) = 12! / (11! 1!) = 12
So, P(x≥2) = 1 – (5/6)^{12} – 12 . (1/6) . (5/6)^{11}
= 1 – 0.112156654 – 2 . (0.134587985) = 0.887843346 – 0.26917597
= 0.618667376 = 0.619 (to 3 decimal places)
This is a formal solution to Part 2 of the Newton-Pepys Problem.
Now on to Part 3.
Probability of at least three sixes with 18 dice is equal to ‘1’ minus the probability of no sixes minus the probability of exactly one six minus the probability of exactly two sixes.
This can be written as:
P (x≥3) = 1 – P(x=0) – P(x=1) – P(x=2)
P(x=0) in 18 throws of the dice = (5/6)^{18}
P(x=1) in 18 throws of the dice = ^{18}C_{1} . (1/6)^{1} . (5/6)^{17}
^{n}C_{k} = n! / ((n-k)! k!)
So ^{18}C_{1} = 18! / ((18-1)! 1!) = 18
So P(x=1) = 18 . (1/6)^{1} . (5/6)^{17}
P(x=2) = ^{18}C_{2} . (1/6)^{2} . (5/6)^{16}
^{18}C_{2} = 18! / ((18-2)! 2!) = 18! / (16! 2!) = 18 . (17/2) = 153
So P(x=2) = 18 . (17/2) . (1/6)^{2} . (5/6)^{16}
So P(x≥3) = 1 – P(x=0) – P(x=1) – P(x=2)
P(x=0) = (5/6)^{18} = 0.0375610365
P(x=1) = 18 . (1/6) . (0.0450732438) = 0.135219731
P(x=2) = 18 . (17/2) . (1/36) . (0.0540878926) = 0.229873544
So P(x≥3) = 1 – 0.0375610365 – 0.135219731 – 0.229873544 = 0.597345689 = 0.597 (to 3 decimal places)
This is a formal solution to Part 3 of the Newton-Pepys Problem.
So, to re-state the Newton-Pepys problem.
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one ‘6’ appears.
B. 12 fair dice are tossed independently and at least two ‘6’s appear.
C. 18 fair dice are tossed independently and at least three ‘6’s appear.
Pepys was convinced that C. had the highest probability and asked Newton to confirm this.
Newton chose A, then B, then C, and produced his calculations for Pepys, who wouldn’t accept them.
So who was right? Newton or Pepys?
According to our calculations, what is the probability of A? 0.665
What is the probability of B? 0.619
What is the probability of C? 0.597
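These three figures can be reproduced in a few lines. The sketch below uses the binomial distribution as in the derivation above, with math.comb from the standard library; the function and variable names are illustrative, not part of the original correspondence.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials,
    computed as 1 minus the probability of 0, 1, ..., k-1 successes."""
    fewer = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k))
    return 1 - fewer


# Proposition label -> (minimum number of sixes, number of dice)
propositions = {"A": (1, 6), "B": (2, 12), "C": (3, 18)}
results = {label: prob_at_least(k, n, 1 / 6) for label, (k, n) in propositions.items()}

for label, value in results.items():
    print(label, round(value, 3))
```

The ordering A > B > C comes out the same way as in the hand calculation.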
So Sir Isaac’s solution was right. Samuel Pepys was wrong, a wrong compounded by refusing to accept Newton’s solution. How much he lost gambling on his misjudgement is mired in the mists of history. The Newton-Pepys Problem is not, and continues to tease our brains to this very day.
Further Reading and Links
http://datagenetics.com/blog/february12014/index.html
Zeno of Elea was a Greek philosopher of the 5th century BC, best known for his paradoxes of motion, described by Aristotle in his ‘Physics’. Of these perhaps the best known is his paradox of the tortoise and Achilles, in its various forms. In a modern version, the antelope starts 100 metres ahead of the cheetah and moves at half the speed of the cheetah. Will the cheetah ever catch the antelope, assuming they don’t slow down?
Zeno’s paradox relies on the fact that when the cheetah reaches the starting position of the antelope, the antelope will have travelled 50 metres further. When the cheetah arrives at that point, the antelope will have travelled a further 25 metres, and so on. Zeno argued that this was an infinite process, and so does not have a final, finite step. So how can the cheetah ever catch the antelope?
There is a mathematical solution to the paradox, which goes like this:
Let S be the distance the cheetah runs and let 1 = 100 metres.
So S = 1 + ½ + ¼ + 1/8 + 1/16 + 1/32 + …
½ S = ½ + ¼ + 1/8 + 1/16 + 1/32 + …
Therefore, S – ½ S = 1, i.e. ½ S = 1
Therefore, S = 2
So the cheetah catches the antelope in 200 metres.
So an infinite process, with no final step, has a finite conclusion.
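The way the partial sums close in on 200 metres can be seen numerically. This is only an illustration of the series, using the 100-metre head start and half-speed antelope from the example; the function name is made up.

```python
def distance_after(steps, head_start=100.0, ratio=0.5):
    """Distance run by the cheetah after `steps` of Zeno's stages:
    head_start + head_start*ratio + head_start*ratio**2 + ..."""
    total, leg = 0.0, head_start
    for _ in range(steps):
        total += leg
        leg *= ratio  # each stage is half the length of the previous one
    return total


for steps in (1, 2, 5, 10, 20, 50):
    print(steps, distance_after(steps))

limit = 100.0 / (1 - 0.5)  # closed form for the sum of the geometric series
print("limit:", limit)
```

Every partial sum falls short of 200 metres, and the shortfall halves with each stage, which is the numerical counterpart of the algebra above.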
That’s the mathematical solution, but does that solve the intuitive paradox? How can an infinite process, with no final step, come to an end? I understand the mathematical solution, but somehow it is as unsatisfying as the wrapper of a chocolate bar. To me, the real chocolate remains untouched. Such paradoxes I refer to as ‘chocolate paradoxes.’ What they have in common is that they can be solved mathematically without really being solved at all.
For those who might differ with me, the Thomson’s Lamp thought experiment offers a related challenge. Devised by philosopher James F. Thomson in 1954, it goes like this. Think of a lamp with a switch. You flick the switch to turn the light on. At the end of one minute exactly you flick it off. At the end of a further half minute, you turn it on again. At the end of a further quarter minute you turn it off. And so on. The time between each turning on and off the lamp is always half the duration of the time before. Assume you have the superpower to do each turning on and turning off instantaneously.
Adding these up gives: 1 minute plus half a minute plus a quarter of a minute ….
1 + ½ + ¼ + 1/8 + 1/16 + 1/32 + … = 2.
In other words, all of these infinitely many time intervals add up to exactly two minutes.
So here’s the question. At the end of two minutes, is the lamp on or off?
And here’s a second question. Say the lamp starts out being off and you turn it on after one minute, then off after a further half minute and so on. Does this make any difference to your answer?
Thomson claimed there was no solution, and that the problem led to a contradiction.
“It seems impossible to answer this question. It cannot be on, because I did not ever turn it on without at once turning it off. It cannot be off, because I did in the first place turn it on, and thereafter I never turned it off without at once turning it on. But the lamp must be either on or off. This is a contradiction.”
While considering the relationship between the infinite and the finite, consider in conclusion the following.
Can a number of infinite length be represented by a line of finite length? Solution below.
Spoiler Alert (Solution)
The square root of 2 is an irrational number, with no finite solution. In other words, it goes on for ever. 1.4142135623730950488……………………….. for ever…..
So can a line with a finite length exactly equal to this infinitely long number be drawn?
Draw a right-angled triangle, with vertical side (a) and horizontal side (b) each of length 1.
Then, the length of the hypotenuse of the triangle, c, can be derived from the lengths of the adjacent (a) and opposite (b) sides, using Pythagoras’ Theorem.
a^{2} + b^{2} = c^{2}
So, 1^{2} + 1^{2} = c^{2}
So c^{2} = 2
c = √2
This is a line of finite length, representing a number of infinite length. So the answer to the question is yes. Strange? Indeed. Another of those tantalising ‘chocolate paradoxes.’
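The same construction can be checked numerically. The sketch below is only an illustration: it computes the hypotenuse of the 1-by-1 triangle and then asks Python's `decimal` module for more digits of the square root of 2. The precision of 30 digits is an arbitrary choice.

```python
import math
from decimal import Decimal, getcontext

# The hypotenuse of a right-angled triangle with both shorter sides equal to 1.
c = math.hypot(1, 1)
print(c)  # a floating-point approximation to the square root of 2

# More digits on request: the expansion never ends, but the line has one fixed length.
getcontext().prec = 30
digits = str(Decimal(2).sqrt())
print(digits)
```

Raising the precision produces more digits, but the length being described does not change.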
Further Reading and Links
http://numberphile.com/videos/zeno_paradox.html
Thomson, James F. (1954), ‘Tasks and Super-Tasks’, Analysis, 15 (1), 1-13.
The famed correspondence between two titans of 17^{th} century French intellectual thought, Blaise Pascal (Pascal’s Wager) and Pierre de Fermat (Fermat’s Last Theorem), was to mark the foundation of modern probability theory. But it was sparked off by a question posed to Pascal by a legendary French gambler of the time, Antoine Gombaud, better known as the Chevalier de Méré.
The question related to a new dice game the Chevalier had invented. According to the rules of the game, he asked for even money odds that a pair of dice, when rolled 24 times, will come up with a double-6 at least once. His reasoning seemed impeccable. If the chance of a 6 on one roll of the die = 1/6, then the chance of a double-6 when two dice are thrown = 1/6 x 1/6 (as they are independent events) = 1/36.
So, he reasoned, the chance of at least one double-6 in 24 throws is: 24/36 = 2/3. So this should be a profitable game for the Chevalier. When it didn’t turn out that way, he asked the great philosopher and mathematician, Blaise Pascal to look into it, as you do.
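A quick way to see that this shortcut cannot be right is to push it further. The sketch below (the function name is purely illustrative) applies the same add-the-chances reasoning to more throws. It would make a double-6 certain in 36 throws and give a ‘probability’ above 1 in 37, which is impossible.

```python
from fractions import Fraction

def chevalier_estimate(n_throws):
    """The Chevalier's shortcut: add 1/36 once for every throw."""
    return Fraction(n_throws, 36)

for n in (24, 36, 37):
    # At 36 throws the shortcut claims certainty; beyond that it exceeds 1.
    print(n, chevalier_estimate(n))
```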
Pascal derived the correct probabilities as follows:
Probability of a double-6 in one throw of a pair of dice = 1/6 x 1/6 = 1/36.
So probability of NO double-6 in one throw of a pair of dice = 35/36.
So, probability of no double-6 in 24 throws of a pair of dice = 35/36 x 35/36 … 24 times = 35/36 to the power of 24, i.e. (35/36)^{24} = 0.5086.
So, probability of at least one double-6 = 1 – 0.5086 = 0.4914
So the Chevalier was betting at even money on a game which he lost (albeit marginally) more often than he won, which is why he was losing over time.
What if he changed the game to give himself 25 throws?
Now, the probability of throwing at least one double-6 in 25 throws of a pair of dice is:
1 – (35/36)^{25} = 0.5055.
These odds, at even money, are in favour of the Chevalier, but this probability is still lower than the probability of obtaining at least one ‘6’ in four throws of a single die, which is 1 – (5/6)^{4} = 0.5177.
In the single-die game, the Chevalier has a house edge of 51.77% – 48.23% = 3.54%.
In the ‘pair of dice’ game (24 throws), the Chevalier’s edge =
49.14% – 50.86% = -1.72%
In the ‘pair of dice’ game (25 throws), the Chevalier’s edge =
50.55% – 49.45% = 1.1%
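The three games can be compared side by side with a few lines of Python. This is a minimal sketch of the complement rule used above; the function names `at_least_once` and `even_money_edge` are illustrative, and the edge is taken to be the chance of winning minus the chance of losing on an even-money bet.

```python
def at_least_once(p_single, n_trials):
    """Probability that an event of chance p_single occurs at least once in n independent trials."""
    return 1 - (1 - p_single) ** n_trials

def even_money_edge(p_win):
    """Edge on an even-money bet: chance of winning minus chance of losing."""
    return p_win - (1 - p_win)

games = {
    "one six in 4 throws of a die": at_least_once(1 / 6, 4),
    "double-6 in 24 throws of a pair": at_least_once(1 / 36, 24),
    "double-6 in 25 throws of a pair": at_least_once(1 / 36, 25),
}
for name, p in games.items():
    print(f"{name}: p = {p:.4f}, edge = {even_money_edge(p):+.2%}")
```

The unrounded edge for the single-die game comes out fractionally above the 3.54% quoted, because the text subtracts percentages that have already been rounded.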
A better game for the Chevalier would have been to offer even money that he could get at least one run of ten heads in a row in 1024 attempts, where each attempt (loosely called a ‘toss’ below) is a fresh set of ten tosses of a coin. The derivation of this probability is similar in method to the dice problem.
First, we need to determine the probability of 10 heads in 10 tosses of a fair coin.
The probability is: ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½
Probability = (1/2)^{10} = 1/1024, i.e. odds of 1023/1 against.
Based on this, what is the probability of at least one run of 10 heads in 1024 tosses of the coin? Is it 0.5? No, because although you can expect ONE run of 10 heads on average, you could obtain zero, 2, 3, 4, etc.
So what is the probability of NO RUN of 10 heads in 1024 tosses of the coin?
This is: (1-1/1024)^{1024}
The probability of NO RUNS OF TEN HEADS = (1023/1024)^{1024} = 37%
So probability of AT LEAST one run of 10 heads = 63%.
Now assume you have tossed the coin already 234 times out of 1024, without a run of 10 heads, what is your chance now of getting 10 heads?
Probability of NO RUNS OF TEN HEADS in remaining 790 tosses = (1023/1024)^{790} = 46%
So probability of at least one success = 54%.
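These figures can be reproduced with the sketch below, which follows the calculation above in treating each attempt at ten heads as independent, with a 1/1024 chance of success. The names `P_TEN_HEADS` and `at_least_one_success` are illustrative assumptions.

```python
import math

P_TEN_HEADS = 0.5 ** 10  # chance of ten heads in one attempt of ten tosses

def at_least_one_success(n_attempts, p=P_TEN_HEADS):
    """Chance of at least one success in n independent attempts."""
    return 1 - (1 - p) ** n_attempts

full_game = at_least_one_success(1024)
remaining = at_least_one_success(1024 - 234)
print(f"{full_game:.4f} {remaining:.4f}")
# With n attempts at a 1-in-n chance, the failure probability is close to 1/e.
print(f"{1 - 1 / math.e:.4f}")
```

The last line shows why the answer lands near 63%: with n attempts at a 1-in-n chance, the probability of no success at all approaches 1/e, about 37%, as n grows.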
The Chevalier could have played either of these games and expected to come out ahead. But the game would have taken a long time. He preferred the shorter game, which produced the longer loss.
Until he was put right by Monsieur Pascal.
Most importantly, though, the Chevalier’s question led to a correspondence, most of which has survived, which led to the foundations of modern probability theory.
I will examine just one of the conclusions of this correspondence today, and it relates to the infamous ‘Gambler’s Ruin’ problem.
This is an idea set in the form of a problem by Pascal for Fermat, subsequently published by Christiaan Huygens (‘On reasoning in games of chance’, 1657) and formally solved by Jacobus Bernoulli (‘Ars Conjectandi’, 1713).
One way of stating the problem is as follows. If you play any gambling game long enough, will you eventually go bankrupt, even if the odds are in your favour, if your opponent has unlimited funds?
Example: You and your opponent toss a coin, where the loser pays the winner £1. The game continues until either you or your opponent has all the money. Suppose you have £10 to start and your opponent has £20. What are the probabilities that a) you and b) your opponent, will end up with all the money?
The answer is that the player who starts with more money has more chance of ending up with all of it. The formula is:
P_{1} = n_{1} / (n_{1} + n_{2})
P_{2} = n_{2} / (n_{1} + n_{2})
Where n_{1} is the amount of money that player 1 starts with, n_{2} is the amount of money that player 2 (your opponent) starts with, and P_{1} and P_{2} are the probabilities that player 1 and player 2 respectively end up with all the money.
In this case, you start with £10 of the £30 total, and so have a 10/(10+20) = 10/30 = 1/3 chance of winning the £30; your opponent has a 2/3 chance of winning the £30. But even if you do win this game, and you play the game again and again, against different opponents, or the same one who has borrowed more money, eventually you will lose your entire bankroll in a fair game like this one. Even if the odds are in your favour, ruin is not ruled out: a long-enough bad streak can still bankrupt you, and the smaller your bankroll relative to your stakes, the greater that risk.
In other words, in a fair game a large enough pool of capital will eventually overcome any finite bankroll pitted against it. This is the ‘Gambler’s Ruin’ problem, and many gamblers over the years have been ruined because of their unawareness of it.
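The £10 versus £20 example can be checked by simulation. The sketch below is illustrative only: it assumes a fair coin and £1 stakes as in the example, and the function names, the seed and the number of simulated games are arbitrary choices.

```python
import random

def win_probability(n1, n2):
    """Chance that player 1 ends up with everything in a fair £1-a-toss game."""
    return n1 / (n1 + n2)

def simulate_game(n1, n2, rng):
    """Play one fair coin-tossing game to the end; True if player 1 wins it all."""
    total = n1 + n2
    stake = n1
    while 0 < stake < total:
        stake += 1 if rng.random() < 0.5 else -1
    return stake == total

rng = random.Random(2024)  # fixed seed, chosen arbitrarily, for repeatability
n_games = 2000
wins = sum(simulate_game(10, 20, rng) for _ in range(n_games))
print(win_probability(10, 20), wins / n_games)
```

The simulated share of games won by the poorer player should sit close to the 1/3 given by the formula, with some sampling noise.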
So how can we avoid falling victim to the problem of ‘Gambler’s Ruin?’
‘Never bet more than you can afford to lose’.
‘When the Fun Stops, Stop!’
Now that’s a start.
Further Reading and Links
Letters between Fermat and Pascal on Probability: https://www.york.ac.uk/depts/maths/histstat/pascal.pdf
‘Superforecasting’ is a term popularised from insights gained as part of a fascinating idea known as the ‘Good Judgment Project’, which consists of running tournaments where entrants compete to forecast the outcome of national and international events.
The key conclusion of this project is that an identifiable group of those taking part (so-called ‘Superforecasters’) were able to consistently and significantly out-predict their peers. To the extent that this ‘superforecasting’ is real, and it seems to be, it provides support for the belief that markets can not only be beaten but beaten systematically.
So what is special about these ‘Superforecasters’? A key distinguishing feature of these wizards of prediction is that they tend to update their estimates much more frequently than regular forecasters, and they do so in smaller increments. Moreover, they tend to break big intractable problems down into smaller tractable ones.
They are also much better than regular forecasters at avoiding the trap of underweighting new information or overweighting it. In particular, they are good at evaluating probabilities dispassionately using a so-called Bayesian approach, i.e. establishing a prior (or baseline) probability that an event will occur, and then constantly updating that probability as new information emerges, incrementally updating in proportion to the weight of the new evidence.
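The updating step described here can be sketched with Bayes’ rule. Everything in the example below is hypothetical: the function name, the starting estimate of 30% and the three pieces of evidence are invented purely to show the mechanics, not drawn from the Good Judgment Project.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of an event after seeing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator

# Hypothetical numbers, for illustration only.
estimate = 0.30  # prior (baseline) probability
for p_if_true, p_if_false in [(0.6, 0.5), (0.7, 0.4), (0.5, 0.5)]:
    estimate = bayes_update(estimate, p_if_true, p_if_false)
    print(round(estimate, 3))
```

Evidence that is only slightly more likely if the event is true moves the estimate a little, stronger evidence moves it more, and evidence that is equally likely either way leaves it unchanged.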
In adopting this approach, the Superforecasters are echoing the response of legendary economist, John Maynard Keynes, to a criticism made to his face that he had changed his position on monetary policy.
“When my information changes, I alter my conclusions. What do you do, Sir?”
In this, Keynes was one of the great ‘Superforecasters.’ Keynes went on to earn a fortune betting in the currency and commodity markets.
Superforecasters in the field of sports betting can benefit in particular from betting in-running, while the event is taking place. Their evaluations are also likely to be data-driven, and are updated as frequently as possible, taking into account variables some of which may not even exist pre-match.
They will be aware of players who tend to struggle to close the deal, whether in golf, tennis, snooker, or whatever, and who may be value ‘lays’ when trading in-running at short prices. Or shaky starters, like batsmen whose average belies their likely performance once they get into double figures. This information is only valuable, however, if the market doesn’t already incorporate it. So they gain an edge by access to and dispassionate analysis of large data sets. Moreover, they are very aware that patterns spotted, and conclusions derived, from small data sets can be dangerous, and potentially very hazardous to the accumulation of wealth.
Superforecasters also tend to use ‘Triage’. This is the process of determining the most important things from amongst a large number that require attention. Risk expert and hedge fund manager Aaron Brown offers an example: when he first got interested in basketball in the 1970s, there were data analysts who tried to analyse the game from scratch. He considered that a hard proposition compared to asking which team was likely to attract more betting interest. As Los Angeles was a rich and high-betting city, and the LA Lakers a glamorous team, he figured it wasn’t hard to guess that the betting public would disproportionately favour the Lakers and that therefore the spread would be slanted against them. ‘Bet against the Lakers at home’ became his strategy, and he observes that it took a lot less effort than simulating basketball games.
Could such a simple strategy work today, tweaked or otherwise? And in what circumstances would you apply it? That’s a more nuanced issue, but Superforecasters (who are normally very keen on big data sets) would be alert to it.
Aaron Brown sees trading contracts on the future as striking the right balance between under- and over-confidence, between prudence and decisiveness. The hard part about this, he observes, is that confidence is negatively correlated to accuracy. Even experienced risk takers bet more when they’re wrong than when they’re right, he says, and the most confident people are generally the least reliable.
The solution, he maintains, is to keep careful, objective records, preferably by a third party.
That’s right – even experienced risk takers bet more when they’re wrong than when they’re right. If true, this is a critical insight.
So how might a Superforecaster go about constructing a sports forecasting model?
Let’s say he wants to construct a model to forecast the outcome of a football match or a golf tournament. In the former, he might focus on assessing the likely team line-up before its announcement, and draw on his hopefully extensive data set to eke out an edge from that. The football market is very liquid and likely to be quite efficient to known information, so any forecasting edge in terms of estimating future information, like team shape, can be critical. The same might apply to rugby, cricket, and other team games.
In terms of golf, he could include statistics on the average length of drive of the players, their tee to green percentages, their putting performance, the weather, the type of course, and so on. But where is the edge over the market?
He could try to develop a better model than others, including using new, state-of-the-art econometric techniques. In trying to improve the model, he could also seek to identify additional explanatory variables.
He might also turn to the field of ‘prospect theory’, a body of work pioneered by Daniel Kahneman and Amos Tversky. This states that people behave and make decisions according to the frame of reference rather than just the final outcome. Humans, according to prospect theory, do not think or behave totally rationally, and this could be built into the model.
In particular, a key plank of prospect theory is ‘loss aversion’, the idea that people treat losses more harshly than equivalent gains, and that they view these losses and gains with regard to a sometimes artificial frame of reference.
An excellent seminal paper on this effect in golf (by Devin Pope and Maurice Schweitzer, in the American Economic Review) is a good example of the way in which study of the economic literature can improve sports modelling. The key contribution of the Pope and Schweitzer paper is that it shows how prospect theory can play a role even in the behaviour of highly experienced and well-incentivised professionals. In particular, they demonstrate, using a database of millions of putts, that professional golfers are significantly more likely to make a putt for par than a putt for birdie, even when all other factors, such as distance to the pin and break, are allowed for. But why? And how does prospect theory explain it?
To find the explanation, they examine a number of possible explanations, and reject them one by one until they determine the true explanation. They find it is because golfers see par as the ‘reference’ score, and so a missed par is viewed (subconsciously or otherwise) by these very human golfers as a significantly greater loss than a missed birdie. They react irrationally in consequence, and cannot help themselves from doing so even when made aware of it. The researchers show that equivalent birdie putts tend to come up slightly too short relative to par putts. This is valuable information for Superforecasters, or even the casual bettor. It is also valuable information for a sports psychologist. If only someone could stand close to a professional golfer every time they stand over a birdie putt and whisper in their ear ‘This is for Par’, it would over time make a significant difference to their performance and pay.
So Superforecasters will improve their model by increments, taking into account factors which more conventional thinkers might not even consider, and will apply due weight to updating their forecasts as new information emerges.
In conclusion, how might we sum up the difference between a Superforecaster and an ordinary mortal? Watch them as they view the final holes of the Masters golf tournament. What’s the chance of Sergio Garcia sinking that 10-footer? The ordinary mortal will just see the putt, the distance to the hole and the potential break of the ball on the green. The Superforecaster is going one step further, and also asking whether the 10-footer is for par or birdie. It really does make a difference, and it’s why she is watching from the members’ area at the Augusta National Golf Club. She has earned her place there, and she knew it before anyone else.
Further Reading and Links
D.G. Pope and M.E. Schweitzer, 2011, Is Tiger Woods Loss-Averse? Persistent Bias in the Face of Experience, Competition and High Stakes, American Economic Review, 101(1), 129-157.
Philip Tetlock and Dan Gardner, Superforecasting: The Art and Science of Prediction, 2016, London: Random House.