Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.
A viscountess, a radio DJ, a reality star, a vlogger, a comedian, several sportspeople and an assortment of actors and presenters. These, more or less, are the celebrities lined up to compete in the 2019 season of Strictly Come Dancing.
Outside their day jobs, few people know much about them yet. Over the 13 weeks or so of shows up until Christmas, viewers will at least learn how well the contestants can dance. But how much will their success in the competition have to do with their foxtrot, and to what extent will it be, literally, the luck of the draw that sees the victors lift the trophy in December?
A seminal study published in 2010 looked at public voting at the end of episodes of the various Idol television pop singing contests and found that singers who were later on in the bill got a disproportionately higher share of the public vote than those who had preceded them.
This was explained as a “recency effect” – meaning that those performing later are more recent in the memory of people who were judging or voting. Interestingly, a different study, of wine tasting, suggested that there is also a significant “primacy effect” which favours the wines that people taste first (as well, to some extent, as last).
A little bias is in order
What would happen if the evaluation of each performance was carried out immediately after each performance instead of at the end – surely this would eliminate the benefit of going last as there would be equal recency in each case? The problem in implementing this is that the public need to see all the performers before they can choose which of them deserves their vote.
You might think the solution is to award a vote to each performer immediately after each performance – by complementing the public vote with the scores of a panel of expert judges. And, of course, Strictly Come Dancing (or Dancing with the Stars if you are in the US) does just this. So there should be no “recency effect” in the expert voting – because the next performer does not take to the stage until the previous performer has been scored.
We might expect in this case that the later performers taking to the dance floor should have no advantage over earlier performing contestants in the expert evaluations – and, in particular, there should be no “last dance” advantage.
We decided to test this out using a large data set of every performance ever danced on the UK and US versions of the show – going right back to the debut show in 2004. Our findings, published in Economics Letters, proved not only surprising, but almost a bit shocking.
Last shall be first
Contrary to expectations, we found the same sequence order bias by the expert panel judges – who voted after each act – as by the general public, voting after all performances had concluded.
We applied a range of statistical tests to allow for the difference in quality of the various performers and as a result we were able to exclude quality as a reason for getting high marks. This worked for all but the opening spot of the night, which we found was generally filled by one of the better performers.
So the findings matched the Idol study in demonstrating that the last dance slot should be most coveted, but that the first to perform also scored better than expected. This resembles a J-curve where there are sequence order effects such that the first and later performing contestants disproportionately gained higher expert panel scores.
Although we believe the production team’s choice of opening performance may play a role in this, our best explanation of the key sequence biases is as a type of “grade inflation” in the expert panel’s scoring. In particular, we interpret the “order” effect as deriving from studio audience pressure – a little like the published evidence of unconscious bias exhibited by referees in response to spectator pressure. The influence on the judges of increasing studio acclaim and euphoria as the contest progresses to a conclusion is likely to be further exacerbated by the proximity of the judges to the audience.
When the votes from the general public augment the expert panel scores – as is the case in Strictly Come Dancing – the biases observed in the expert panel scores are amplified. All of which means that, based on past series, the best place to perform is last and second is the least successful place to perform.
The implications of this are worrying if they spill over into the real world. Is there an advantage in going last (or first) into the interview room for a job – even if the applicants are evaluated between interviews? The same effects could have implications in so many situations, such as sitting down in a dentist’s chair or doctor’s surgery, appearing in front of a magistrate or having your examination script marked by someone with a huge pile of work to get through.
One study, reported in the New York Times in 2011, found that experienced parole judges granted freedom about 65% of the time to the first prisoner to appear before them on a given day, and the first after lunch – but to almost nobody by the end of a morning session.
So our research confirms what has long been suspected – that the order in which performers (and quite possibly interviewees) appear can make a big difference. So it’s now time to look more carefully at the potential dangers this can pose more generally for people’s daily lives, and what we can do to best address the problem.
Exercise
You arrive at someone’s home and are ushered into the garden. You know that a train passes the end of the garden every half an hour on average but the trains are actually scheduled so that half pass by with an interval of a quarter of an hour and half with an interval of 45 minutes. Given that you have no clue when the last train passed by and the scheduled interval between that train and the next, how long can you expect to wait for the next train?
Solution to Exercise
The mean interval between trains is 30 minutes, so the average expected wait would seem to be 15 minutes if you arrive at a random time.
But you are three times as likely to arrive during a 45-minute interval as during a 15-minute interval, so there is three times the chance of an average wait of 22.5 minutes (halfway along the 45-minute interval) as of 7.5 minutes (halfway along the 15-minute interval).
So your expected wait is (3 x 22.5 minutes + 1 x 7.5 minutes) divided by 4. This equals 75 divided by 4, or 18.75 minutes (18 minutes, 45 seconds).
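The weighting argument can be checked in a few lines of Python. This is only a minimal sketch: the function name `expected_wait` is illustrative, and it assumes, as the exercise does, that the listed interval lengths occur equally often.

```python
from fractions import Fraction

def expected_wait(intervals):
    """Expected wait (minutes) for someone arriving at a random moment.

    `intervals` lists interval lengths that are assumed to occur equally often.
    """
    total = sum(intervals)
    wait = Fraction(0)
    for length in intervals:
        p_land_here = Fraction(length, total)  # longer gaps are hit more often
        mean_wait_here = Fraction(length, 2)   # on average you arrive halfway through
        wait += p_land_here * mean_wait_here
    return wait

print(float(expected_wait([15, 45])))  # 18.75
```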
The bus arrives every twenty minutes on average, though sometimes the interval between buses is a bit longer and sometimes a bit shorter. Still, it’s 20 minutes taken as an average, or an average of three buses an hour. So you emerge onto the main road from a side lane at some random time, and come straight upon the bus stop. How long can you expect to wait on average for the next bus to arrive?
The intuitive answer is 10 minutes, since this is exactly half way along the average interval between buses, and if your usual wait is rather longer than this, then you have been unlucky.
But is this right? The Inspection Paradox suggests that in most circumstances you will actually be quite lucky only to wait ten minutes for the next bus to arrive.
Let’s examine this more closely. The bus arrives every 20 minutes on average, or three times an hour on average. But that is only an average. If they actually do arrive at exactly 20 minute intervals, then your expected wait is indeed 10 minutes (the mid-point of the interval between the bus arrivals). But if there is any variation around that average, things change, for the worse.
Say for example, that half the time the buses arrive at a ten minute interval and half the time at a 30 minute interval. The overall average is now 20 minutes, but from your point of view it is three times more likely that you’ll turn up during the 30 minute interval than during the ten minute interval. Your appearance at the stop is random, and as such is more likely to take place during a long interval between two buses arriving than during a short interval. It is like randomly throwing a dart at a timeline 30 minutes long. You could well hit the ten minute interval but it is much more likely that you will hit the 30 minute interval.
So let’s see what this means for our expected wait time. If you randomly arrive during the long (30 minute) interval, you can expect to wait 15 minutes. If you randomly arrive during the short (10 minute) interval, you can expect to wait 5 minutes. But there is three times the chance you will arrive during the long interval, and therefore three times the chance of waiting 15 minutes as five minutes. So your expected wait is (3 x 15 minutes + 1 x 5 minutes) divided by 4. This equals 50 divided by 4, or 12.5 minutes.
In conclusion, the buses arrive on average every 20 minutes, but your expected wait time is not half of that (10 minutes). It is more in every case except when the buses arrive at exact 20 minute intervals. The greater the dispersion around the average, the greater the amount by which your expected wait time exceeds half the average interval. This is the ‘Inspection Paradox’, which states that whenever you ‘inspect’ a process you are likely to find that things take (or last) longer than their ‘uninspected’ average. What seems like the persistence of bad luck is actually the laws of probability and statistics playing out their natural course.
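The link between dispersion and waiting time can be made explicit. A standard result, not stated in the text above, is that for a random arrival the expected wait equals half the mean interval plus the variance of the intervals divided by twice the mean. The sketch below assumes the listed intervals occur equally often; `mean_wait` is an illustrative name.

```python
from statistics import mean, pvariance

def mean_wait(intervals):
    """Expected wait for a random arrival: mean/2 + variance/(2 * mean)."""
    m = mean(intervals)
    return m / 2 + pvariance(intervals) / (2 * m)

# Same 20-minute average each time, with steadily more dispersion.
for gaps in ([20, 20], [15, 25], [10, 30], [5, 35]):
    print(gaps, mean_wait(gaps))
```

With no dispersion the wait is 10 minutes; the 10 and 30 minute case reproduces the 12.5 minutes worked out above.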
Once made aware of the paradox, it seems to appear everywhere.
For example, take the case where the average class size at an institution is 30 students. If you decide to interview random students from the institution, and ask them how big their class is, you will usually obtain an average rather higher than 30. Let’s take a stylised example to explain why. Say that the institution has class sizes of either ten or 50, and there are equal numbers of both class sizes. So the overall average class size is 30. But in selecting a random student, it is five times more likely that he or she will come from a class of 50 students than from a class of ten students. So for every one student who replies ‘10’ to your enquiry about their class size, there will be five who answer ‘50’. So the average class size thrown up by your survey is (5 x 50 + 1 x 10) divided by 6. This equals 260/6 = 43.3. So the act of inspecting the class sizes actually increases the average obtained compared to the uninspected average. The only circumstance in which the inspected and uninspected averages coincide is when every class size is equal.
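The two averages in the class-size example can be computed side by side. This is a small sketch with an illustrative function name; it assumes every student is equally likely to be interviewed.

```python
def average_class_sizes(class_sizes):
    """Return (average per class, average as reported by a randomly chosen student)."""
    per_class = sum(class_sizes) / len(class_sizes)
    # A class of size s contributes s students, each of whom reports s.
    per_student = sum(s * s for s in class_sizes) / sum(class_sizes)
    return per_class, per_student

per_class, per_student = average_class_sizes([10, 50])
print(per_class, round(per_student, 1))  # 30.0 43.3
```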
The range of real-life cases where this occurs is almost boundless. For example, you visit the gym at a random time of day and ask a random sample of those who are there how long they normally exercise for. The answer you obtain will likely well exceed the average of all those who attend the gym that day because it is more likely that when you turn up you will come across those who exercise for a long time than a short time.
Once you know about the Inspection Paradox, the world, and our perception of our place in it, is never quite the same again.
Solutions to Exercises
Question 1.
ab/ [ab+c(1-a)]
a is the prior probability, i.e. the probability that a hypothesis is true before you see the new evidence. Before the new evidence (the test), this chance is estimated at 1 in 100 (0.01), as we are told that 1 per cent of the people who visit his surgery have the virus. So, a = 0.01
b is the probability of the new evidence if the hypothesis is true. The probability of the new evidence (the positive result on the test) if the hypothesis is true (the patient is sick) is 95%, since the test is 95% accurate. So, b =0.95
c is the probability of the new evidence if the hypothesis is false. The probability of the new evidence (the positive result on the test) if the hypothesis is false (the patient is not sick) is 5% (because the test is 95% accurate, and we can only expect a false positive 5 times in 100). So, c = 0.05
Using Bayes’ Theorem, the updated (posterior) probability = 0.01 x 0.95/ [(0.01 x 0.95) + 0.05 (1-0.01)] = 0.0095 / (0.0095 + 0.0495) = 0.161
So the chance the doctor should give to the patient having the flu, if testing positive, is 16.1 per cent.
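The formula ab / [ab + c(1 - a)] translates directly into code. The sketch below uses an illustrative function name and the same a, b and c as defined above.

```python
def posterior(a, b, c):
    """Bayes' Theorem in the form ab / [ab + c(1 - a)].

    a: prior probability that the hypothesis is true
    b: probability of the evidence if the hypothesis is true
    c: probability of the evidence if the hypothesis is false
    """
    return (a * b) / (a * b + c * (1 - a))

print(round(posterior(0.01, 0.95, 0.05), 3))  # 0.161
```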
Question 2.
Let A = a positive test for an individual.
Let B = a negative test for an individual.
Let C = the player is a drug user.
We are seeking to determine the probability that a player who tested positive is a drug user. This is represented by P(C | A).
Using Bayes’ Theorem,
P(C | A) = P(A | C) x P(C) / P(A)
We are told that P(A | C) = 0.9
Also, P(C) = 0.1
P(A) is the sum of the probability that a player is using the banned substances and tests positive and the probability that a player is not using the banned substances and tests positive.
We know that 10% of the tournament entrants are using the drugs and that 90% of the drug users will test positive.
So the probability that a player is taking the banned drugs and tests positive = 0.9 x 0.1 = 0.09
We also know that the test is 85% accurate for those not taking the drugs, so 15% of those innocent of taking the drugs will still test positive. But only 10% of the players are guilty of taking the drugs, so 90% are not taking the drugs.
So, the probability that a player is not taking the banned drugs and tests positive (known as a ‘false positive’) = 0.9 x 0.15 = 0.135
So, P(A) = 0.09 + 0.135 = 0.225
So, P(C | A) = P(A | C) x P(C) / P(A) = 0.9 x 0.1 / 0.225 = 0.4
So, the probability that an entrant to the tennis tournament who tests positive is taking the banned substances is 40%.
Question 3.
a. Sensitivity = 66 / (66 + 4) = 0.943 = 94.3%
b. Specificity = 827 / (827 + 3) = 0.996 = 99.6%
c. PPV = TP / (TP + FP) = 66 / (66 + 3) = 95.7%
d. NPV = TN / (TN + FN) = 827 / (827 + 4) = 99.5%
e. LR+ = sensitivity / (1 - specificity) = (66/70) / (3/830) = 261
This is equivalent to:
LR+ = P(T+ | D+) / P(T+ | D-) = 0.9429 / 0.003614 = 261
Because the denominator is so small, LR+ is very sensitive to rounding: rounding specificity to 0.99 first would give 0.94 / 0.01 = 94.
f. The negative likelihood ratio is calculated as:
LR- = (1 - sensitivity) / specificity = (1 - 0.943) / 0.996 = 0.057 / 0.996 = 0.057
This is equivalent to:
LR- = P(T- | D+) / P(T- | D-) = 0.057 / 0.996 = 0.057
g. Pre-Test Probability of having the flu = (66 + 4) / (66 + 4 + 3 + 827) = 70/900 = 0.078
Pre-Test Odds = P(something is true) / P(something is false) = 0.078 / (1 - 0.078) = 0.084
Post-Test Odds = Pre-Test Odds x LR+ = 0.084 x 261 = 22
Probability = Odds / (1 + Odds) = 22 / 23 = 0.957, which agrees with the PPV in part c.
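The measures in Question 3 can all be derived from the four cells of the test table. The sketch below assumes the counts implied by the fractions in the solution (66 true positives, 4 false negatives, 3 false positives, 827 true negatives); the function and key names are illustrative. It works with unrounded values, which matters for LR+: rounding specificity to 0.99 before dividing gives a much smaller figure than the unrounded calculation.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard test-accuracy measures from a 2x2 table of counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }

# Counts inferred from the worked solution (an assumption, not given explicitly).
metrics = diagnostic_metrics(tp=66, fn=4, fp=3, tn=827)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```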
Question 4.
a. Sensitivity = 90/ (90+10) = 90%
b. Specificity = 750/ (750+150) = 83%
Question 5.
a. Sensitivity = 610 / 728 = 83.8%
b. Specificity = 127,344/ 140, 556 = 90.6%
c. LR+ = sensitivity / (1 - specificity) = 0.838 / (1 - 0.906) = 8.9
d. LR- = (1 - sensitivity) / specificity = (1 - 0.838) / 0.906 = 0.18
e. Pre-test probability of having flu = (610 + 118) / (610 + 118 + 13,212 + 127,344) = 728/141,284 = 0.005. So pre-test probability of not having flu = 0.995.
Odds = P (something is true) / P (something is false)
So Pre-Test Odds = 0.005 / 0.995 = 0.005
N.B. For events with a very low probability, Odds are very similar to Probability.
f. Post-Test Odds = Pre-Test Odds x LR+ = 0.005 x 8.9 = 0.045
Post-Test Probability = Post-Test Odds / (1 + Post-Test Odds) = 0.045 / (1 + 0.045) = 4.3%.
g. Pre-Test Odds = P (Something is true, pre-test) / P (something is false, pre-test) = 0.3/0.7 = 0.43
h. Post-test Odds = Pre-test Odds x LR+ = 0.43 x 8.9 = 3.8
i. Probability = Odds / (1 + Odds) = 3.8 / (1 + 3.8) = 79%
j. Post-test Odds = Pre-test Odds x LR– = 0.43 x 0.18 = 0.077
k. Post-test probability = Odds / (1 + Odds) = 0.077 / (1 + 0.077) = 7.1%
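The odds-based updating used in parts g to k follows the same three steps each time: convert probability to odds, multiply by the likelihood ratio, convert back. A minimal sketch, with an illustrative function name:

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability using a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

print(round(post_test_probability(0.3, 8.9), 2))   # 0.79 after a positive test
print(round(post_test_probability(0.3, 0.18), 2))  # 0.07 after a negative test
```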
To illustrate the Expected Value Paradox, let us propose a coin-tossing game, in which you gain 50% of what you bet if the coin lands Heads and lose 40% if it lands Tails. What is the expected value of a single play of this game?
The Expected Value can be calculated as the sum of the probabilities of each possible outcome in the game times the return if that outcome occurs.
Say, for example, the unit stake for each play of the game is £10. In this case, the gain if the coin lands Heads is 50% x £10 = £5, and the loss if the coin lands Tails is 40% x £10 = £4.
In this case, the expected value (given a fair coin, with 0.5 chance of Heads and 0.5 chance of Tails) = 0.5 x £5 – 0.5 x £4 = £0.5, or 50 pence.
So the Expected Value of the game is a gain of 5% of the stake. This is the positive net expectation for each play of the game (toss of the coin).
Let’s see how this plays out in an actual experiment in which 100 people play the game. What do we expect would be the average final balance of the players?
The expected gain from the 50 players tossing Heads = 50 x £5 = £250.
The expected loss from the 50 players tossing Tails = 50 x £4 = £200.
So, the net gain over 100 players = £250 – £200 = £50.
The average net gain of the 100 players = £50/100 = £0.50, or 50 pence.
Equivalently, for a unit stake of £1, a Head returns £1.50 and a Tail returns 60p, so Expected Value = 0.5 x £1.50 + 0.5 x 60p = £1.05. As above, this is an expected gain of 5%.
From two coin tosses, our best estimate for the 100 players is 25 Heads-Heads, 25 Tails-Tails, 25 Heads-Tails and 25 Tails-Heads.
The Expected Value over the two coin tosses = 0.25 x $(1.5)^2$ + 0.25 x $(0.6)^2$ + 0.25 x (1.5 x 0.6) + 0.25 x (0.6 x 1.5) = £1.1025.
However many coin tosses the group throws, the Expected Value is positive.
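The group calculation can be reproduced by listing every equally likely sequence of Heads and Tails and averaging the final balances. The sketch below assumes a £1 starting balance that is fully re-staked on each toss; the names are illustrative.

```python
from itertools import product

HEADS, TAILS = 1.5, 0.6  # balance multipliers for a win and a loss

def ensemble_expected_value(n_tosses):
    """Average final balance per £1 over all equally likely toss sequences."""
    sequences = list(product([HEADS, TAILS], repeat=n_tosses))
    total = 0.0
    for sequence in sequences:
        balance = 1.0
        for multiplier in sequence:
            balance *= multiplier
        total += balance
    return total / len(sequences)

for n in (1, 2, 4):
    print(n, round(ensemble_expected_value(n), 4))
```

The average grows by a factor of 1.05 with each additional toss.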
Take now the case of one person playing the game through time. Say there are four coin tosses, for a stake of £10.
From four coin tosses, our best estimate is 2 Heads and 2 Tails.
Final balance with 2 Heads and 2 Tails = £10 x 1.5 x 1.5 x 0.6 x 0.6 = £8.10.
The balance goes from £10 to £15 to £22.50 to £13.50 to £8.10. This is a net loss.
To clarify, we bet £10. The coin lands Heads. We now have £15. We bet £15 now on the next coin toss. It lands Heads again. We now have £22.50. We bet £22.50 now on the next coin toss. It lands Tails. Now we are back to £13.50. We bet this £13.50 on the next coin toss. It lands Tails again and we are down to £8.10. This is a net loss on the original stake of £10.
If we throw the same number of Heads and Tails after tossing the coin N times, each £1 staked would more generally become:
$1.5^{N/2} \times 0.6^{N/2} = (1.5 \times 0.6)^{N/2} = 0.9^{N/2}$
Since 0.9 is less than 1, this shrinks towards zero as N grows. Eventually, all the stake used for betting is lost.
Herein lies the paradox. When many people play the game a fixed number of times, the average return is positive, but when a fixed number of people play the game many times, they should expect to lose most of their money.
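The gap between the two views can be computed exactly for one player making many tosses. The sketch below assumes a £1 starting balance fully re-staked on each toss of a fair coin; the function names are illustrative.

```python
from math import comb

HEADS, TAILS = 1.5, 0.6  # balance multipliers for a win and a loss

def outcome_distribution(n_tosses):
    """(final balance, probability) for each possible number of Heads."""
    return [
        (HEADS ** k * TAILS ** (n_tosses - k), comb(n_tosses, k) / 2 ** n_tosses)
        for k in range(n_tosses + 1)
    ]

def summarise(n_tosses):
    """Return (average final balance, probability of ending below the starting £1)."""
    dist = outcome_distribution(n_tosses)
    mean_balance = sum(balance * prob for balance, prob in dist)
    prob_loss = sum(prob for balance, prob in dist if balance < 1.0)
    return mean_balance, prob_loss

mean_balance, prob_loss = summarise(100)
print(f"average final balance after 100 tosses: {mean_balance:.1f}")
print(f"chance of ending below the starting balance: {prob_loss:.2f}")
```

The average is pulled up by a small number of very large outcomes, while most individual paths finish below where they started.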
This is a demonstration of the difference between what is termed ‘time averaging’ and ‘ensemble averaging.’
Thinking of the game as a random process, time averaging is taking the average value as the process continues. Ensemble averaging is taking the average value of many processes running for some fixed amount of time.
Processes where there is no difference between time and ensemble averaging are called ‘ergodic processes.’ In the real world, however, many processes, including notably in finance, are non-ergodic.
Say that in an election two parties, A and B, attract some percentage of voters, x% and y% respectively. This is not the same thing as saying that over the course of their voting lives, each individual votes for party A in x% of elections and for party B in y% of elections. These two concepts are distinct.
Again, if we wish to determine the most visited parts of a city, we could take a snapshot in time of how many people are in neighbourhood A, how many in neighbourhood B, etc. Alternatively, we could follow a particular individual or a few individuals, over a period of time and see how often they visit neighbourhood A, neighbourhood B, etc. The first analysis (the ensemble) may not be representative over a period of time, while the second (time) may not be representative of all the people.
An ergodic process is one in which the two types of statistic give the same results. In an ergodic system, time is irrelevant and has no direction. Say, for example, that 100 people each rolled a die once, and the total of the scores is divided by 100. This finite-sample average approaches the ensemble average as more and more people are included in the sample. Now, take the case of a single person rolling a die 100 times, and the total scored is divided by 100. This finite-time average approaches the time average as the number of rolls increases.
An implication of ergodicity is that the result of ensemble averaging will be the same as that of time averaging.
And here is the key point: In the case of ensemble averages, it is the size of the sample that eventually removes the randomness from the sample. In the case of time averages, it is the time devoted to the process that removes randomness.
In the dice rolling example, both methods give the same answer, subject to errors. In this sense, rolling dice is an ergodic system.
However, if we now bet on the results of the dice rolling game, wealth does not follow an ergodic process. If a player goes bankrupt, he stays bankrupt, so the time average of wealth can approach zero as time passes, even though the ensemble average of wealth may increase.
As a new example take the case of 100 people visiting a casino, with a certain amount of money. Some may win, some may lose, but we can infer the house edge by counting the average percentage loss of the 100 people. This is the ensemble average. This is different to one person going to the casino 100 days in a row, starting with a set amount. The probabilities of success derived from a collection of people do not apply to one person. The first is the ‘ensemble probability’, the second is the ‘time probability’ (the second is concerned with a single person through time).
Here is the key point: No individual person has sure access to the returns of the market without infinite pockets and an absence of so-called ‘uncle points’ (the point at which he needs, or feels the need, to exit the game). To equate the two is to confuse ensemble averaging with time averaging.
If the player/investor has to reduce exposure because of losses, or maybe retirement or other change of circumstances, his returns will be divorced from those of the market or the game. The essential point is that success first requires survival. This applies to an individual in a different sense to the ensemble.
So where does the money lost by the non-survivors go? It gets transferred to the survivors, some of whom tend to scoop up much or most of the pool, i.e. the money is scooped up by the tail probability of those who keep surviving, which may just be by blind good luck, just as the non-survivors may have been forced out of the game/market by blind bad luck. So the lucky survivors (and in particular the tail-end very lucky survivors) more than compensate for the effect of the unlucky entrants.
The so-called Kelly approach to investment strategy, discussed in a separate chapter, is an investment approach which seeks to respond to the survivor issue.
Say, for example, that the probability of Heads from a coin toss is 0.6, and Heads wins a dollar, but Tails (with a probability of 0.4) loses a dollar. Although the Expected Value of this game is positive, if the response of an investor in the game is to stake all their bankroll on each toss of the coin, the expected time until bankroll bankruptcy is just 1/(1-0.6) = 2.5 tosses of the coin.
The Kelly strategy to optimise the growth rate of the bankroll is to invest a fraction of the bankroll equal to the difference between the probability that you will win and the probability that you will lose.
In the above example, it means we should in each game bet the fraction of x = 0.6 – 0.4 = 0.2 of the bankroll.
The optimal average growth rate per toss becomes: 0.6 log (1.2) + 0.4 log (0.8) = 0.02 (using natural logarithms).
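The Kelly fraction can be checked numerically. The sketch below assumes an even-money bet (win or lose the amount staked) with a 0.6 chance of winning, uses natural logarithms, and searches stake fractions in steps of 1%; the function name is illustrative.

```python
from math import log

def growth_rate(fraction, p_win=0.6):
    """Expected log growth per toss when staking `fraction` of the bankroll."""
    return p_win * log(1 + fraction) + (1 - p_win) * log(1 - fraction)

# Try stake fractions 0.00, 0.01, ..., 0.99 and keep the one with the highest growth.
candidates = [i / 100 for i in range(0, 100)]
best = max(candidates, key=growth_rate)
print(best, round(growth_rate(best), 4))  # 0.2 0.0201
```

The growth rate at the optimum is about 2% per toss, and it turns negative if the stake fraction is pushed too high.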
If we bet all our bankroll on each coin toss, we will most likely lose the bankroll. This is balanced out over all players by those who with low probability win a large bankroll. For the real-life player, however, it is most relevant to look at the time-average of what may be expected to be won.
In trying to maximise Expected Value, the probability of bankroll bankruptcy soon gets close to one. It is better to invest, say, 20% of bankroll in each game, and maximise long-term average bankroll growth.
In the coin-toss example, it is like supposing that various “I”s are tossing a coin, and the losses of the many of them are offset by the huge profit of the relatively small number of “I”s who do win. But this ensemble-average does not work for an individual for whom a time-average better reflects the one timeline in which that individual exists.
Put another way, because the individual cannot go back in time and the bankruptcy option is always actual, it is not possible to realise the small chance of making the tail-end upside of the positive expectation value of a game/investment without taking on the significant risk of non-survival/bankruptcy. In other words, the individual lives in one universe, on one time path, and so is faced with the reality of time-averaging as opposed to an ensemble average in which one can call upon the gains of parallel investors/game players on parallel timelines in essentially parallel worlds.
To summarise, the difference between 100 people going to a casino and one person going to the casino 100 times is the difference between understanding probability in conventional terms and through the lens of path dependency.
One of the most celebrated pieces of correspondence in the history of probability and gambling, and one of which I am particularly fond, involves an exchange of letters between the greatest diarist of all time, Samuel Pepys, and the greatest scientist of all time, Sir Isaac Newton.
The six letters exchanged between Pepys in London and Newton in Cambridge related to a problem posed to Newton by Pepys about gambling odds. The interchange took place between November 22 and December 23, 1693. The ostensible reason for Mr. Pepys’ interest was to encourage the thirst for truth of his young friend, Mr. Smith. Whether Sir Isaac believed that tale or not we shall never know. The real reason, however, was later revealed in a letter written to a confidante by Pepys indicating that he himself was about to stake 10 pounds, a considerable sum in 1693, on such a bet. Now we’re talking!
The first letter to Newton introduced Mr. Smith as a fellow with a “general reputation…in this towne (inferiour to none, but superiour to most) for his maistery [of]…Arithmetick”.
What emerged has come down to us as the aptly named Newton-Pepys problem.
Essentially, the question came down to this:
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one ‘6’ appears.
B. 12 fair dice are tossed independently and at least two ‘6’s appear.
C. 18 fair dice are tossed independently and at least three ‘6’s appear.
Pepys was convinced that C. had the highest probability and asked Newton to confirm this.
Newton chose A as the highest probability, then B, then C, and produced his calculations for Pepys, who wouldn’t accept them.
So who was right? Newton or Pepys?
Well, let’s see.
The first problem is the easiest to solve.
What is the probability of A?
Probability that one roll of a die produces a ‘6’ = 1/6
So probability that one roll of a die does not produce a ‘6’ = 5/6
So probability that six independent rolls of a die produce no ‘6’ = $(5/6)^6$
So probability of AT LEAST one ‘6’ in 6 rolls = $1 - (5/6)^6 = 0.6651$
So far, so good.
The probability of problem B and probability of problem C are more difficult to calculate and involve use of the binomial distribution, though Newton derived the answers from first principles, by his method of ‘Progressions’.
Both methods give the same answer, but using the more modern binomial distribution is easier.
So let’s do it, introducing along the way the idea of so-called ‘Bernoulli trials’.
The nice thing about a Bernoulli trial is that it has only two possible outcomes.
Each outcome can be framed as a ‘yes’ or ‘no’ question (success or failure).
Let probability of success = p.
Let probability of failure = 1-p.
Each trial is independent of the others and the probability of the two outcomes remains constant for every trial.
An example is tossing a coin. Will it land heads?
Another example is rolling a die. Will it come up ‘6’?
Yes = success (S); No = failure (F).
Let probability of success, P (S) = p; probability of failure, P (F) = 1-p.
So the question: How many Bernoulli trials are needed to get to the first success?
This is straightforward, as the only way to need exactly five trials, for example, is to begin with four failures, i.e. FFFFS.
Probability of this = (1-p) (1-p) (1-p) (1-p) p = $(1-p)^4 p$
Similarly, the only way to need exactly six trials is to begin with five failures, i.e. FFFFFS.
Probability of this = (1-p) (1-p) (1-p) (1-p) (1-p) p = $(1-p)^5 p$
More generally, the probability that the first success comes on trial number n =
$(1-p)^{n-1} p$
This is a geometric distribution. This distribution deals with the number of trials required for a single success.
But what is the chance that the first success takes AT LEAST some number of trials, say 12 trials?
One method is to add the probability of 12 trials to the probability of 13 trials, 14 trials, 15 trials, and so on.
Easier method: The only time you will need at least 12 trials is when the first 11 trials are all failures, which has probability $(1-p)^{11}$.
In a sequence of Bernoulli trials, the probability that the first success takes at least n trials is $(1-p)^{n-1}$.
Let’s take a couple of examples.
Probability that the first success (heads on coin toss) takes at least three trials (tosses of the coin) = $(1-0.5)^2 = 0.25$
Probability that the first success (heads on coin toss) takes at least four trials (tosses of the coin) = $(1-0.5)^3 = 0.125$
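Both geometric formulas are easy to verify numerically, including the claim that the shortcut matches the long sum. This is a minimal sketch with illustrative function names.

```python
def prob_first_success_on(n, p):
    """Probability that the first success comes exactly on trial n."""
    return (1 - p) ** (n - 1) * p

def prob_first_success_at_least(n, p):
    """Probability that the first success takes at least n trials."""
    return (1 - p) ** (n - 1)

print(prob_first_success_at_least(3, 0.5))  # 0.25
print(prob_first_success_at_least(4, 0.5))  # 0.125

# Cross-check the shortcut for "at least 12 trials" (rolling for a '6')
# against the long method of adding 12 trials, 13 trials, 14 trials, ...
long_sum = sum(prob_first_success_on(n, 1 / 6) for n in range(12, 2000))
print(abs(long_sum - prob_first_success_at_least(12, 1 / 6)) < 1e-9)  # True
```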
But so far we have only learned how to calculate the probability of one success in so many trials.
What if we want to know the probability of two, or three, or however many successes?
To take an example, what is the probability of exactly two ‘6’s in five throws of the die?
To determine this, we need to calculate the number of ways two ‘6’s can occur in five throws of the die, and multiply that by the probability of each of these ways occurring.
So, probability = number of ways something can occur multiplied by probability of each way occurring.
How many ways can we throw two ‘6’s in five throws of the die?
Where S = Success in throwing a ‘6’, F = Fail in throwing a ‘6’, we have:
SSFFF; SFSFF; SFFSF; SFFFS; FSSFF; FSFSF; FSFFS; FFSSF; FFSFS; FFFSS
So there are 10 ways of throwing two ‘6’s in five throws of the die.
More formally, we are seeking to calculate how many ways 2 things can be chosen from 5. This is known as ‘5 Choose 2’, written as:
5C2 = 10
More generally, the number of ways k things can be chosen from n is:
nCk = n! / ((n-k)! k!)
n! (known as n factorial) = n (n-1) (n-2) … 1
k! (known as k factorial) = k (k-1) (k-2) … 1
Thus, 5C2 = 5! / (3! 2!) = 5×4×3×2×1 / (3×2×1×2×1) = 5×4 / (2×1) = 20/2 = 10
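The count of ten can be confirmed by brute force. The following is a small sketch using Python's standard library; the helper name sequences_with_k_successes is an illustrative assumption.

```python
from itertools import combinations
from math import comb, factorial


def sequences_with_k_successes(n: int, k: int) -> list[str]:
    """List every S/F sequence of length n containing exactly k successes."""
    sequences = []
    for positions in combinations(range(n), k):
        # Put an S at each chosen position and an F everywhere else.
        sequences.append("".join("S" if i in positions else "F" for i in range(n)))
    return sequences


ways = sequences_with_k_successes(5, 2)
print(ways)       # SSFFF, SFSFF, ... FFFSS
print(len(ways))  # 10

# The same count from the factorial formula and from math.comb.
print(factorial(5) // (factorial(3) * factorial(2)))  # 10
print(comb(5, 2))                                     # 10
```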
So what is the probability of each of these ten sequences occurring? p is the probability of success. 1-p is the probability of failure.
In each case, the probability = p.p.(1-p).(1-p).(1-p)
= p^2 (1-p)^3
Since there are 5C2 such sequences, the probability of exactly 2 ‘6’s =
10 p^2 (1-p)^3
Generally, in a fixed sequence of n Bernoulli trials, the probability of exactly r successes is:
nCr × p^r × (1-p)^(n-r)
This is the binomial distribution. Note that it requires that the probability of success on each trial be constant. It also requires only two possible outcomes.
So, for example, what is the chance of exactly 3 heads when a fair coin is tossed 5 times?
5C3 × (1/2)^3 × (1/2)^2 = 10/32 = 5/16
And what is the chance of exactly 2 sixes when a fair die is rolled five times?
5C2 × (1/6)^2 × (5/6)^3 = 10 × 1/36 × 125/216 = 1250/7776 = 0.1608
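Both worked examples can be reproduced with a few lines of Python. This is a sketch only: binomial_probability is a hypothetical helper name, and exact fractions are used so the answers can be compared directly with the hand calculations.

```python
from fractions import Fraction
from math import comb


def binomial_probability(n: int, r: int, p: Fraction) -> Fraction:
    """Probability of exactly r successes in n Bernoulli trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Exactly 3 heads in 5 tosses of a fair coin.
three_heads = binomial_probability(5, 3, Fraction(1, 2))
print(three_heads)  # 5/16

# Exactly 2 sixes in 5 rolls of a fair die.
two_sixes = binomial_probability(5, 2, Fraction(1, 6))
print(two_sixes)                   # 625/3888, i.e. 1250/7776 in lowest terms
print(round(float(two_sixes), 4))  # 0.1608
```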
So let’s now use the binomial distribution to solve the Newton-Pepys problem.
- What is the probability of obtaining at least one six with 6 dice?
- What is the probability of obtaining at least two sixes with 12 dice?
- What is the probability of obtaining at least three sixes with 18 dice?
First, what is the probability of no sixes with 6 dice?
In general, P(x sixes with n dice) = nCx × (1/6)^x × (5/6)^(n-x), for x = 0, 1, 2, …, n
Where x is the number of successes.
So, probability of no successes (no sixes) with 6 dice =
6C0 × (1/6)^0 × (5/6)^(6-0) = [6! / ((6-0)! 0!)] × 1 × (5/6)^6 = (5/6)^6
Note that: 0! = 1
Here’s the proof: n! = n × (n-1)!
At n=1, 1! = 1 × (1-1)! = 1 × 0!
So 1 = 0!
So, where x is the number of sixes, probability of at least one six is equal to ‘1’ minus the probability of no sixes, which can be written as:
P(x≥1) = 1 – P(x=0) = 1 – (5/6)^6 = 0.665 (to three decimal places).
i.e. probability of at least one six = 1 minus the probability of no sixes.
That is a formal solution to Part 1 of the Newton-Pepys Problem.
Now on to Part 2.
Probability of at least two sixes with 12 dice is equal to ‘1’ minus the probability of no sixes minus the probability of exactly one six.
This can be written as:
P (x≥2) = 1 – P(x=0) – P(x=1)
P(x=0) in 12 throws of the dice = (5/6)^12
P(x=1) in 12 throws of the dice = 12C1 × (1/6)^1 × (5/6)^11
nCk = n! / ((n-k)! k!)
So 12C1
= 12! / ((12-1)! 1!) = 12! / (11! 1!) = 12
So, P(x≥2) = 1 – (5/6)^12 – 12 × (1/6) × (5/6)^11
= 1 – 0.112156654 – 2 × (0.134587985) = 0.887843346 – 0.26917597
= 0.618667376 = 0.619 (to 3 decimal places)
This is a formal solution to Part 2 of the Newton-Pepys Problem.
Now on to Part 3.
Probability of at least three sixes with 18 dice is equal to ‘1’ minus the probability of no sixes minus the probability of exactly one six minus the probability of exactly two sixes.
This can be written as:
P (x≥3) = 1 – P(x=0) – P(x=1) – P(x=2)
P(x=0) in 18 throws of the dice = (5/6)^18
P(x=1) in 18 throws of the dice = 18C1 × (1/6)^1 × (5/6)^17
nCk = n! / ((n-k)! k!)
So 18C1
= 18! / ((18-1)! 1!) = 18
So P(x=1) = 18 × (1/6)^1 × (5/6)^17
P(x=2) = 18C2 × (1/6)^2 × (5/6)^16
18C2
= 18! / ((18-2)! 2!) = 18! / (16! 2!) = 18 × (17/2) = 153
So P(x=2) = 18 × (17/2) × (1/6)^2 × (5/6)^16
So P(x≥3) = 1 – P(x=0) – P(x=1) – P(x=2)
P(x=0) = (5/6)^18
= 0.0375610365
P(x=1) = 18 × (1/6) × (0.0450732438) = 0.135219731
P(x=2) = 18 × (17/2) × (1/36) × (0.0540878926) = 0.229873544
So P(x≥3) = 1 – 0.0375610365 – 0.135219731 – 0.229873544
= 0.597345689 = 0.597 (to 3 decimal places)
This is a formal solution to Part 3 of the Newton-Pepys Problem.
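All three parts follow the same pattern: one minus the binomial probabilities of too few sixes. As a hedged sketch, they can be computed together in Python. The function name prob_at_least is an illustrative assumption, and exact fractions are used before rounding.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k: int, n: int, p: Fraction = Fraction(1, 6)) -> Fraction:
    """Probability of at least k successes in n trials.

    Computed as 1 minus the probability of 0, 1, ..., k-1 successes.
    """
    too_few = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k))
    return 1 - too_few


# Part 1: at least 1 six with 6 dice; Part 2: 2 with 12; Part 3: 3 with 18.
for k in (1, 2, 3):
    n = 6 * k
    print(f"At least {k} six(es) with {n} dice: {float(prob_at_least(k, n)):.3f}")
# At least 1 six(es) with 6 dice: 0.665
# At least 2 six(es) with 12 dice: 0.619
# At least 3 six(es) with 18 dice: 0.597
```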
So, to re-state the Newton-Pepys problem.
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one ‘6’ appears.
B. 12 fair dice are tossed independently and at least two ‘6’s appear.
C. 18 fair dice are tossed independently and at least three ‘6’s appear.
Pepys was convinced that C had the highest probability and asked Newton to confirm this.
Newton chose A, then B, then C, and produced his calculations for Pepys, who wouldn’t accept them.
So who was right? Newton or Pepys?
According to our calculations, what is the probability of A? 0.665
What is the probability of B? 0.619
What is the probability of C? 0.597
So Sir Isaac’s solution was right. Samuel Pepys was wrong, a wrong compounded by refusing to accept Newton’s solution. How much he lost gambling on his misjudgement is lost in the mists of history. The Newton-Pepys Problem itself is not, and it continues to tease our brains to this very day.
References and Links
Newton and Pepys. DataGenetics. http://datagenetics.com/blog/february12014/index.html
Newton-Pepys problem. Wikipedia. https://en.wikipedia.org/wiki/Newton%E2%80%93Pepys_problem
The Gambler’s Fallacy, also known as the Monte Carlo Fallacy, is the proposition that people, instead of accepting an actual independence of successive outcomes, are influenced in their perceptions of the next possible outcome by the results of the preceding sequence of outcomes – e.g. throws of a die, spins of a wheel. Put another way, the fallacy is the mistaken belief that the probability of an event is decreased when the event has occurred recently, even though the probability of the event is objectively known to be independent across trials.
This can be illustrated by considering the repeated toss of a fair coin. The outcomes of each coin toss are in fact independent of each other, and the probability of getting heads on a single toss is 1/2. The probability of getting two heads in two tosses is 1/4, of three heads in three tosses is 1/8, and of four heads in a row is 1/16. Since the probability of a run of five successive heads is only 1/32, the fallacy is to believe that, after four heads, the next toss is more likely to come up tails than heads again. In fact, “5 heads in a row” and “4 heads, then tails” both have a probability of 1/32. Given that the first four tosses have turned up heads, the probability that the next toss is a head is 1/2, and similarly for tails.
While a run of five heads in a row has a probability of 1/32, this applies only before the first coin is tossed. After the first four tosses, the next coin toss has a probability of 1/2 Heads and 1/2 Tails.
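The point can be checked by listing every possible sequence of five tosses. The sketch below is illustrative only, with variable names chosen for this example. It enumerates all 32 equally likely sequences and then looks only at those that begin with four heads.

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of five tosses of a fair coin (32 in all).
outcomes = ["".join(tosses) for tosses in product("HT", repeat=5)]
prob_each = Fraction(1, len(outcomes))
print(prob_each)  # 1/32

# Keep only the sequences in which the first four tosses are heads.
four_heads_first = [s for s in outcomes if s.startswith("HHHH")]
print(four_heads_first)  # ['HHHHH', 'HHHHT']

# Of those, the share in which the fifth toss is also a head.
fifth_is_head = [s for s in four_heads_first if s[4] == "H"]
prob_head_after_four = Fraction(len(fifth_is_head), len(four_heads_first))
print(prob_head_after_four)  # 1/2
```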
The so-called Inverse Gambler’s Fallacy is where someone entering a room sees an individual rolling a double six with a pair of fair dice and concludes (with flawed logic) that the person must have been rolling the dice for some time, as it is unlikely that they would roll a double six on a first or early attempt.
The existence of a ‘gambler’s fallacy’ can be traced to laboratory studies and lottery-type games (Clotfelter and Cook, 1993; Terrell, 1994). Clotfelter and Cook found (in a study of a Maryland numbers game) a significant fall in the amount of money wagered on winning numbers in the days following the win, an effect which did not disappear entirely until after about sixty days. This particular game was, however, characterized by a fixed-odds payout to a unit bet, and so the gambler’s fallacy had no effect on expected returns. In pari-mutuel games, on the other hand, the return to a winning number is linked to the amount of money bet on that number, and so the operation of a systematic bias against certain numbers will tend to increase the expected return on those numbers.
Terrell (1994) investigated one such pari-mutuel system, the New Jersey State Lottery. In a sample of 1,785 drawings from 1988 to 1993, he constructed a subsample of 97 winners which repeated as a winner within the 60 day cut-off point suggested by Clotfelter and Cook. He found that these numbers had a higher payout than when they previously won on 80 of the 97 occasions. To determine the relationship, he regressed the payout to winning numbers on the number of days since the last win by that number. The expected payout increased by 28% one day after winning, and decreased from this level by c. 0.5% each day after the number won, returning to its original level 60 days later. The size of the gambler’s fallacy, while significant, was less than that found by Clotfelter and Cook in their fixed-odds numbers game.
It is as if irrational behaviour exists, but reduces as the cost of the anomalous behaviour increases.
An opposite effect is where people tend to predict the same outcome as the previous event, resulting in a belief that there are streaks in performance. This is known as the ‘hot hand effect’, and normally applies in the context of human performance, as in basketball shots, whereas the Gambler’s Fallacy is applied to inanimate games such as coin tosses or spins of a roulette wheel. This is because human performance may not be perceived as random in the same way as, say, a coin flip.
Exercise
Distinguish between the Gambler’s Fallacy, the Inverse Gambler’s Fallacy and the Hot Hand Effect. Can these three phenomena be logically reconciled?
References and Links
Gambler’s Fallacy. Wikipedia. https://en.wikipedia.org/wiki/Gambler%27s_fallacy
Gambler’s Fallacy. Logically Fallacious. https://www.logicallyfallacious.com/tools/lp/Bo/LogicalFallacies/98/Gambler-s-Fallacy
Gambler’s Fallacy. RationalWiki. https://rationalwiki.org/wiki/Gambler%27s_fallacy
Inverse Gambler’s Fallacy. Wikipedia. https://en.wikipedia.org/wiki/Inverse_gambler%27s_fallacy
Inverse Gambler’s Fallacy. RationalWiki. https://rationalwiki.org/wiki/Gambler%27s_fallacy
Hot Hand. Wikipedia. https://en.wikipedia.org/wiki/Hot_hand
Clotfelter, C.T. and Cook, P.J. (1993). Notes: The “Gambler’s Fallacy” in Lottery Play, Management Science, 39.12,i-1553. https://pubsonline.informs.org/doi/abs/10.1287/mnsc.39.12.1521
Terrell, D. (1994). A Test of the Gambler’s Fallacy: Evidence from Pari-Mutuel Games. Journal of Risk and Uncertainty. 8,3, 309-317. https://link.springer.com/article/10.1007/BF01064047
The Base Rate Fallacy occurs when we disregard or undervalue prior information when making a judgment on how likely something is. In particular, if presented with related base rate information (i.e. generic, general information) and specific information (information pertaining only to a certain case), the fallacy arises from a tendency to focus on the latter at the expense of the former.
For example, if we are informed that someone is an avid book-lover, we might think it more likely that they are a librarian than a nurse. There are, however, many more nurses than librarians. In this example, we have not taken sufficient account of the base rate for the number of nurses relative to librarians.
Now consider testing for a medical condition, which affects 2% of the population. Assume there’s a test for this condition which will correctly identify someone who has the condition 95% of the time. If someone does not have the condition, the test will correctly identify them as being clear of this condition 80% of the time.
Now consider testing a random group of people. Of the 2% of patients who are suffering from the condition, 95% will be correctly diagnosed with the condition, whereas of the 98% of patients who do not have the condition, 20% will be incorrectly diagnosed as having the condition (almost 20% of the population).
What this means is that 21.5% of the population (0.95 x 2% + 0.2 x 98%) are diagnosed with the condition, but only 1.9% of the population (0.95 x 2%) both test positive and actually have it. So the probability that someone who tests positive actually has the condition is 1.9% / 21.5%, i.e. about 8.8%.
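The arithmetic above can be sketched as a small Python function. The function and parameter names are illustrative assumptions: “sensitivity” is used here for the rate at which the test correctly flags those who have the condition, and “specificity” for the rate at which it correctly clears those who do not.

```python
def prob_condition_given_positive(
    prevalence: float, sensitivity: float, specificity: float
) -> float:
    """Share of positive test results that are true positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Worked example from the text: 2% affected, 95% and 80% correct rates.
result = prob_condition_given_positive(0.02, 0.95, 0.80)
print(round(result, 3))  # 0.088
```

The same function can be reused with other rates simply by changing the three arguments.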
Exercise
Consider testing for a medical condition, which affects 4% of the population. Assume there’s a test for this condition which will correctly identify someone who has the condition 90% of the time. If someone does not have the condition, the test will correctly identify them as being clear of this condition 90% of the time.
If someone tests positive for the condition, what is the probability that they have the condition?
Reading and Links
Base Rate Fallacy. In: Paradoxes of probability and other statistical strangeness. UTS, 5 April, 2017. S. Woodcock. http://newsroom.uts.edu.au/news/2017/04/paradoxes-probability-and-other-statistical-strangeness
Base Rate Fallacy. Wikipedia. https://en.wikipedia.org/wiki/Base_rate_fallacy
Professor Leighton Vaughan Williams – Written evidence (PPD0024)
1. In this evidence, I consider the relationship between political betting and political opinion polls, and highlight peer-reviewed research I have undertaken into this. I also reference some other published work of mine on opinion polling and political forecasting more generally. Research I have undertaken into the impact of the dissemination of information via social media is also highlighted.
2. The recorded history of election betting markets can be traced as far back as 1868 for US presidential elections (Rhode and Strumpf, 2013) and 1503 for papal conclaves. Between 1868 and 2012, no clear favourite for the White House had lost the presidential election other than in 1948, when longshot Harry Truman defeated his Republican rival, Thomas Dewey. 2016 can be added to that list, following the defeat of strong favourite Hillary Clinton in the Electoral College.
3. The record of the betting markets in predicting the outcome of papal conclaves is somewhat more chequered and is considered in Vaughan Williams and Paton (2015) in which I examine, with my co-author Professor David Paton, the success of papal betting markets historically.
4. The potential of the betting markets and prediction markets (markets created specifically to provide forecasts) to assimilate collective knowledge and wisdom has increased in recent years as the volume of money wagered and number of market participants has soared. Betting exchanges alone now see tens of millions of pounds trading on a single election.
5. An argument made for the value of betting markets in predicting the probable outcome of elections is that the collective wisdom of many people is greater than that of the few. We might also expect that those who know more, and are better able to process the available information, would on average tend to bet more.
6. The lower the transaction costs (the betting public have not paid tax on their bets in the UK since 2001, and margins have fallen since the advent of betting exchanges) and the lower the costs of accessing and processing information (through the development of the Internet and search engines), the more efficient we might expect betting markets to become in translating information into forecasts. Modern betting markets might be expected for these reasons to provide better forecasts than ever.
7. There is plenty of anecdotal evidence about the accuracy of political betting markets, especially compared to the polls. The 1985 by-election in Brecon and Radnor is a classic example. On Election Day, July 4th, an opinion poll undertaken by the Mori polling organisation was published which gave Labour a commanding lead of 18 percent over the Liberal Alliance candidate. Ladbrokes simultaneously made the Liberal the 4/7 favourite. The Liberal won.
8. Forward 20 years to a BBC World Service live radio debate in 2005, in the run-up to the UK general election, when forecasts were swapped between the Mori representative and myself on the likely outcome of the election. I predicted a Labour majority of about 60, as I had done a few days earlier in the Economist magazine (Economist, April 14th, 2005) and on BBC Radio 4 Today (April, 18th, 2005), based on the betting at the time. The Mori representative predicted a Labour majority of over 100 based on their polling. The actual majority was 66.
9. More recent anecdotal evidence comes from the 2012 US presidential election. Barack Obama was the heavy favourite to win, while the average of the pollsters had the popular vote within 0.7%, and two leading polling organisations, Gallup and Rasmussen, had Mitt Romney ahead in final polls. Obama won by 3.9%.
10. During the later stages of the 2014 Scottish referendum campaign, the polling average had it relatively close (especially compared with the actual result), with more than one poll calling it for independence (one by 7%). The betting odds were always very strongly in favour of Scotland staying in the UK. The result echoed the 1995 Quebec separation referendum in Canada. There the final polling showed ‘Yes to separation’ with a six point lead. In the event, ‘No to separation’ won by one point. This late swing to the ‘status quo’ is credited by some for the confidence in the betting markets about a ‘NO’ outcome in Scotland.
11. In the 2015 general election in Israel, final polls showed Netanyahu’s Likud party trailing the main opposition party by 4% (Channel 2, Channel 10, Jerusalem Post), by 3% (Channel 1) and by 2% (Teleseker/Walla). Meanwhile, Israel’s Channel 2 television news on Election Day featured the odds on the online prediction market site, Predictwise. This gave Netanyahu an 80% chance of winning. The next day, Netanyahu declared that he had won “against the odds.” He actually won against the polls.
12. Polling averages during the 2015 UK general election campaign often showed Conservatives and Labour very close in terms of vote share. Meanwhile, the betting odds always had Conservative most seats as short odds-on. On the Monday before polling day, for example, the polling average had it essentially tied in terms of vote share, while Conservatives to win most seats was trading on the markets as short as 1/6.
13. For the 2015 Irish same-sex marriage referendum, the spread betting markets were offering a mid-point of 60% for YES to same-sex marriage, and 40% for NO. The average of the final opinion polls had YES on 71% and NO on 29%. The final result was 62%-38% for YES, much closer to the projection from the markets.
14. If this anecdotal evidence is correct, it is natural to ask why the betting markets outperform the opinion polls in terms of forecast accuracy. One obvious reason is that there is an asymmetry. People who bet in significant sums on an election outcome will usually have access to the polling evidence, while opinion polls do not take account of information contained in the betting odds (though the opinions expressed might). Sophisticated political bettors also take into account the past experience of how good different pollsters are, what tends to happen to those who are undecided when they actually vote, differential turnout of voters, what might drive the agenda between the dates of the polling surveys and election day itself, and so on. All of this can in principle be captured in the markets.
15. Pollsters, except perhaps with their final polls, tend to claim that they are not producing a forecast, but a snapshot of opinion. In contrast, the betting markets are generating odds about the final result. Moreover, the polls are used by those trading the markets to improve their forecasts, so they are a valuable input. But they are only one input. Those betting in the markets have access to much other information as well including, for example, informed political analysis, statistical modelling, focus groups and on-the-ground information including local canvass returns.
16. To test the reliability of the anecdotal evidence pointing to the superior forecasting performance of the betting markets over the polls, I collected vast data sets of every matched contract placed on two leading betting exchanges and from a dedicated prediction market for US elections since 2000. This was collected over 900 days before the 2008 election alone, and to indicate the size, a single data set was made up of 411,858 observations from one exchange alone for that year. Data was derived notably from presidential elections at national and state level, Senate elections, House elections and elections for Governor and Mayor. Democrat and Republican selection primaries were also included. Information was collected on the polling company, the length of time over which the poll was conducted, and the type of poll.
17. My co-author, Dr. James Reade, and I compared the betting over the entire period with the opinion polls published over that period, and also with expert opinion and a statistical model.
18. In a paper titled ‘Forecasting Elections’ (Vaughan Williams and Reade, 2016b), published in the ‘Journal of Forecasting’ (see also Vaughan Williams and Reade, 2017, 2015), we specifically assessed opinion polls, prediction and betting markets, expert opinion and statistical modelling over this vast data set of elections in order to determine which performed better in terms of forecasting outcomes. We considered accuracy, bias and precision over different time horizons before an election.
19. A very simple measure of accuracy is the percentage of correct forecasts, i.e. how often a forecast correctly predicts the election outcome.
20. A related but distinctly different concept to accuracy is unbiasedness. An unbiased vote share forecast is, on average, equal to the true vote share outcome. An unbiased probability forecast is also, on average, equal to the true probability that the candidate wins the election. Forecasts that are accurate can also be biased, provided the bias is in the correct direction. If polls are consistently upward biased for candidates that eventually win, then despite being biased they will be very accurate in predicting the outcome, whereas polls that are consistently downward biased for candidates that eventually win will be very inaccurate as well as biased.
21. We also identified the precision of the forecasts, which relates to the spread of the forecasts.
22. We considered accuracy, bias and precision over different time horizons before an election. We found that the betting/prediction markets provided the most accurate and precise forecasts and were similar in terms of bias to opinion polls. We found that betting/prediction market forecasts also tended to improve as the elections approached, while we found evidence of opinion polls tending to perform worse.
23. In Brown, Reade and Vaughan Williams (2017), we examine the precise impact of the release of information from a leading opinion polling company on the political betting markets. To do this, we use an extensive data set of over 25 million contracts that records (anonymised) individual trader IDs for the buyers and sellers of the contracts and align this to the exact time of release of this information. We find that polling releases by this prominent opinion pollster quickly influence trading volumes and market prices, but that experienced and more aggressive liquidity-taking traders bide their time before entering the market after such news events. We find that the market prices are not at their most informative in the immediate aftermath of a poll release.
24. We also conducted research into the impact of breaking news on the markets, notably via social media and live blogging. In Vaughan Williams and Paton (2015) we use an extensive data set of contracts matched on a leading betting exchange specifically regarding the outcome of the 2013 papal election. We found that genuine information released on Twitter was not reflected in the betting markets, and was only very partially incorporated when published later on the live blog of a major British newspaper. One possible explanation is that the information was not believed as it related to a closed-door conclave (Vaughan Williams, 2015a, considers closed door forecasting in another context). However, this finding was consistent in some respects with evidence in Vaughan Williams and Reade (2016a) about the limited impact on a leading betting exchange of major breaking news in a UK general election when released on Twitter, at least until the news was validated by traditional media.
25. In summary, the overwhelming consensus of evidence prior to the 2015 UK General Election pointed to the success of political betting markets in predicting the outcome of elections. In contrast, the 2015 UK General Election, the 2016 EU referendum in the UK, the 2016 US presidential election and the 2017 UK election, all produced results that were a shock to the great majority of pollsters as well as to the betting markets. In each case, the longshot outcome (Conservative overall majority, Brexit, Trump, No overall majority) prevailed.
26. There are various theories as to why the polls and markets broke down in these recent big votes. One theory is based on the simple laws of probability. An 80% favourite can be expected to lose one time in five, if the odds are correct. In the long run, according to this explanation, things should balance out.
27. A second theory to explain recent surprise results is that something fundamental has changed in the way that information contained in political betting markets is perceived and processed. One interpretation is that the widespread success of the betting markets in forecasting election outcomes, and the publicity that was given to this, turned them into an accepted measure of the state of a race, creating a perception which was difficult to shift in response to new information. To this extent, the market prices led opinion rather than simply reflecting it. From this perspective, the prices in the markets became somewhat sticky.
28. A third theory is that conventional patterns of voting broke down in 2015 and subsequently, primarily due to unprecedented differential voter turnout patterns across key demographics, which were not correctly modelled in most of the polling and which were not picked up by those trading the betting markets.
29. There are other theories, which may be linked to the above, including the impact of social media, and manipulation of this, on voter perceptions and voting patterns.
30. I explore how well the pollsters, ‘expert opinion’, modellers, prediction and betting markets performed in the 2017 UK general election in Vaughan Williams (2017a) – “Report card: how well did UK election forecasters perform this time?” and explore the polling failure in the 2015 UK general election in Vaughan Williams (2015b) – “Why the polls got it so wrong in the British election”, and some implications in a follow-up article (Vaughan Williams, 2015c).
31. I explore how well the pollsters, ‘expert opinion’, modellers, prediction and betting markets performed in the 2016 US presidential election in Vaughan Williams (2016) – “The madness of crowds, polls and experts confirmed by Trump victory”, and the implications of turnout projections for opinion polling in Vaughan Williams (2017b) – “Election pollsters put their methods to the test – and turnout is the key.”
References
BBC Radio 4 Today, Are betting markets a better guide to election results than opinion polls? April 18th, 2005, 0740. http://www.bbc.co.uk/radio4/today/listenagain/listenagain_20050418.shtml
Brown, A., Reade, J.J. and Vaughan Williams, L. (2017), ‘When are Prediction Market Prices Most Informative?’ Working Paper.
Economist, Punters v pollsters. Are betting markets a better guide to election results than opinion polls? April 14th, 2005. http://www.economist.com/node/3868824
Rhode, P.W. and Strumpf, K. (2013), ‘The Long History of Political Betting Markets: An International Perspective’, in: The Oxford Handbook of the Economics of Gambling, ed. L. Vaughan Williams and D. Siegel, 560-588.
Vaughan Williams, L. (2017a), ‘Report card: how well did UK election forecasters perform this time?’ The Conversation, June 10. http://theconversation.com/report-card-how-well-did-uk-election-forecasters-perform-this-time-79237
Vaughan Williams, L. (2017b), ‘Election pollsters put their methods to the test – and turnout is the key’, The Conversation, June 2. http://theconversation.com/election-pollsters-put-their-methods-to-the-test-and-turnout-is-the-key-78778
Vaughan Williams, L. (2016), ‘The madness of crowds, polls and experts confirmed by Trump victory’, The Conversation, November 9. http://theconversation.com/the-madness-of-crowds-polls-and-experts-confirmed-by-trump-victory-68547
Vaughan Williams, L. (2015a), ‘Forecasting the decisions of the US Supreme Court: lessons from the ‘affordable care act’ judgment,’ The Journal of Prediction Markets, 9 (2), 64-78.
Vaughan Williams, L. (2015b), ‘Why the polls got it so wrong in the British election’, The Conversation, May 8. http://theconversation.com/why-the-polls-got-it-so-wrong-in-the-british-election-41530
Vaughan Williams, L. (2015c), ‘How looking at bad polls can show Labour how to win the next election’, The Conversation, May 20. http://theconversation.com/how-looking-at-bad-polls-can-show-labour-how-to-win-the-next-election-42065
Vaughan Williams, L. and Paton, D. (2015), ‘Forecasting the Outcome of Closed-Door Decisions: Evidence from 500 Years of Betting on Papal Conclaves’, Journal of Forecasting, 34 (5), 391-404.
Vaughan Williams, L. and Reade, J.J. (2016a), ‘Prediction Markets, Social Media and Information Efficiency’, Kyklos, 69 (3), 518-556.
Vaughan Williams, L. and Reade, J.J. (2016b), ‘Forecasting Elections’, Journal of Forecasting, 35 (4), 308-328.
Vaughan Williams, L. and Reade, J.J. (2017), ‘Polls to Probabilities: Prediction Markets and Opinion Polls’, Working Paper.
Vaughan Williams, L. and Reade, J.J. (2015), ‘Prediction Markets and Polls as Election Forecasts’, Working Paper.
31 October 2017
