“Few prediction schemes have been more accurate, and at the same time more perplexing, than the Super Bowl Stock Market Predictor, which asserts that the league affiliation of the Super Bowl winner predicts stock market direction. In this study, the authors examine the record and statistical significance of this anomaly and demonstrate that an investor would have clearly outperformed the market by reacting to Super Bowl game outcomes.” Thus read the abstract to a paper published in 1990 by Thomas Krueger and William Kennedy in the very well regarded Journal of Finance.
“If the Super Bowl is won by a team from the old National Football League (now the NFC, or National Football Conference),” they wrote, “then the stock market is very likely to finish the year higher than it began. On the other hand, if the game is won by a team from the old American Football League (now the AFC, or American Football Conference), the market will finish lower than it began.”
It is important to note, though, that some AFC teams count as NFL wins because they originated in the old NFL, namely the Pittsburgh Steelers, the Baltimore Ravens (formerly the Cleveland Browns) and the Baltimore/Indianapolis Colts.
Over the 22-year history of the Super Bowl to the date of submission of their study in 1988, they documented a 91% accuracy rate for their predictor.
What happened in 1989? The NFC team, the San Francisco 49ers, beat the AFC’s Cincinnati Bengals, and the stock market rose 27%.
This was further confirmation of an idea first proposed by New York Times sportswriter Leonard Koppett, and published as ‘The Super Bowl Predictor’ by investment advisor Robert H. Stovall in the January 1988 issue of ‘Financial World.’
So what happened in 1990? Well, the NFC’s San Francisco 49ers won a second consecutive victory, beating the AFC’s Denver Broncos, by 55 points to 10. But the stock market fell in 1990, by 4.3%.
But then the Super Bowl Predictor returned to form, correctly predicting the direction of the stock market in 1991, 1992, 1993, 1994, 1995, 1996, 1997. Since the launch of the Super Bowl that made for 28 correct predictions out of 31 (a success rate of 90.3%).
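To put the 28-out-of-31 record in perspective, here is a minimal sketch of how unlikely such a run would be if each year's call were no better than a coin flip. The 50% null is an assumption made purely for illustration, and probably a generous one, since the stock market rises in most years; the function name is hypothetical.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Probability of at least `successes` hits in `trials` independent
    calls, if each call is right with probability `p_null` (an assumption)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


hit_rate = 28 / 31
p_value = binomial_tail(28, 31)

print(f"Hit rate: {hit_rate:.1%}")
print(f"P(at least 28 of 31 under a coin-flip null): {p_value:.2e}")
```

A tiny tail probability of this kind says only that the record is unlikely under that particular null. It says nothing about whether the relationship is causal or likely to persist.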
Since then, the Super Bowl Predictor has had a much more chequered record, predicting correctly only about half the time since 1997. In 2009, Robert Stovall, a strategist for Wood Asset Management in Sarasota, Florida, and an early champion of the Super Bowl Indicator, wrote: “Nothing seems to be working anymore [in the stock market]. Used to be, I was only happy when it was over 90% (accurate), and when it was still above 80% I was pleased. But certainly 79% is still far above a failing grade.” (quoted on January 12, 2009, in MarketBeat, WSJ.com’s ‘inside look at the markets’).
Prior to Super Bowl 2017, the Predictor had called it right five times since then (2010, 2011, 2012, 2014 and 2015) and wrong twice (2013 and 2016). Over the whole run of Super Bowls, the indicator had been right a total of 40 times out of 50, as measured by the S&P 500 index. That year the AFC’s New England Patriots stormed from 25 points behind at one point in the game to beat the Atlanta Falcons by 34 points to 28 in overtime. It should have presaged a bad year for the stock markets, but in fact the markets climbed. They should also have climbed following the 2018 victory of the NFC’s Philadelphia Eagles over the Patriots, but the reverse happened. So the indicator, as of Super Bowl 2019, had been right 40 times out of 52, with a failing record for each of the previous three years.
For those still retaining some faith in the indicator, and wanting to see a good year ahead for the stock market, the team to cheer for in 2019 was the LA Rams, of the NFC. Having said that, their opponents, the AFC’s New England Patriots, won the 2017 Super Bowl, and it presaged a good year on the markets. On the betting markets, the Patriots were the marginal favourites to win in 2019 and triumphed by 13 points to 3. We now wait to see what 2019 brings.
So is the Super Bowl Indicator a real forecasting tool, or is it simply descriptive of what has happened rather than containing any predictive value?
You decide!
Exercise
Do you consider that the Super Bowl Indicator has any value as a stock market predictor?
Reading and Links
Krueger, T.M. and Kennedy, W.F. (1990), An Examination of the Super Bowl Stock Market Predictor, Journal of Finance, 45 (2), 691-697. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1990.tb03712.x
Schmidt, B. and Clayton, R. (2017), Super Bowl Indicator and Equity Markets: Correlation not Causation, Journal of Business Inquiry, 17 (2), 97-103. http://journals.uvu.edu/index.php/jbi/article/download/235/208
Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.
The Efficient Market Hypothesis (EMH), in its strictest form, holds that market prices or odds reflect all known information.
Prices or odds may change when new information is released, but this new information is unpredictable.
So the best estimate of the price or odds likely to prevail at any point in the future is the price now. This is dismal, if true, because it would mean that it is not possible to beat the market, except by chance. But this can’t be true, or else it creates a paradox. If the market was always efficient, traders would have no economic incentive to acquire information, since information acquisition and processing is not a costless activity and would add nothing to what can be obtained by simply looking at current market prices.
It has been mathematically proved (Grossman and Stiglitz, 1980, American Economic Review) that when information is not costless to obtain or process, asset prices can never fully reflect all the information available to traders. So in the real world, markets are not completely efficient. They cannot be. This result is a relief, at least in principle, to those seeking to beat the market.
The equilibrium proposed by Grossman and Stiglitz is one in which some profits are available to some investors.
Essentially, rational, ‘informed’ traders will seek to acquire and process new information whenever the benefits of doing so are greater than the costs, up to the point, economists say, where the marginal costs equal the marginal benefits of obtaining and processing information.
But to the extent that trading is a zero-sum game, or worse, as most betting markets are, winners need losers.
So who are the winners and who are the losers?
To take the example of a poker game, good players need weak players in the game. In the financial literature these ‘weak players’ are known as ‘noise’ (or ‘uninformed’) traders. Noise makes trading in financial markets possible, and thus allows us to observe prices for financial assets. But noise also causes markets to be somewhat inefficient.
Imagine a world with no noise traders, no information costs, no trading costs – an efficient market, a market in which it would be irrational to place any trades, a market without traders, a strange kind of world. So the Efficient Market hypothesis in its strictest form cannot be true. So it is possible in principle to beat the market. How might we do this? By a ‘technical’ strategy which uses information contained in past and present prices or odds. Or by a ‘fundamental’ strategy which uses information about real variables, such as form. Or by some combination of these. Those practical matters can be examined at another time. Here we are looking exclusively at whether markets are informationally inefficient as a matter of principle, in a world of positive information costs, or indeed transaction costs, and we can conclude that the answer is Yes.
There is another systematic reason why markets might be inefficient in a broader sense, and that is the existence of asymmetric information. A notable case of this is called ‘adverse selection’, which refers to a situation in which the buyer or seller of a product knows something about the product quality or condition that the other party does not know, allowing them to have a better estimate of what the true cost of the product should be. This can lead to the breakdown of a market in which it exists. George Akerlof’s seminal article (‘The Market for Lemons’), published in 1970 in the Quarterly Journal of Economics, which examined the problem of adverse selection on the market for used cars has important implications for any market characterised by adverse selection.
Here is the problem. If Mr. Smith wants to SELL me his horse, do I really WANT to buy it? It’s a question as old as markets and horses have existed, but it was for many, many years, one of the unspoken questions of economics. So how do we solve this paradox? For most of the history of economics, the answer was quite simple. Simply assume perfect markets and perfect information, so the horse buyer would know everything about the horse, and so would the seller, and in those cases where the horse is worth more to the buyer than the seller, both can strike a mutually beneficial deal. There’s a term for this: ‘gains from trade’.
In the real world, the person selling the horse is likely to know rather more about it than the potential purchaser. This is called ‘asymmetric information’, and the buyer is facing what is called an ‘adverse selection’ problem, as he has adverse information relative to the seller. Akerlof had become intrigued by the way in which economists were limited by their assumption of well-functioning markets characterised by perfect information. For example, the conventional wisdom was that unemployment was simply caused by money wages adjusting too slowly to changes in the supply and demand for labour. This was the so-called ‘neo-classical synthesis’ and it assumed classic markets, albeit they could be a bit slow to work.
At the same time, economists had come to doubt that changes in the availability of capital and labour could in themselves explain economic growth. The role of education was called upon as a sort of magic bullet to explain why an economy grew as fast as it did. But how can we distinguish the impact on productivity of the education itself from the extent to which education simply helped grade people? The idea here is that more able people will tend on average to seek out more education. So how far does education contribute to growth, and how far is it simply a signal and a screen for employers? In the real world, of course, these signals could be useful because employers are like the horse buyers – they know less about the potential employees than the employees know about themselves, the classic adverse selection problem.
Akerlof turned to the used car market for the answer, not least because at the time a major factor in the business cycle was the big fluctuation in sales of new cars. Just like in the market for horses, the first thing a potential used car buyer is likely to ask is “Why should I WANT to buy that used car if he wants so much to SELL it to me”. The suspicion is that the car is what Americans call a ‘lemon’, a sub-standard pick of the crop. Owners of better quality used cars, called ‘plums’, are much less likely to want to sell.
Now let’s say that you’re willing to spend £10,000 on a plum but only £5,000 on a lemon. In such a case, the best price you’d be willing to pay is about £7,500, and only then if you thought there was an equal chance of a lemon and a plum. At this price, though, sellers of the plums will tend to back out, but sellers of the troublesome lemons will be very happy to accept your offer.
But as a buyer you know this, so will not be willing to pay £7,500 for what is very likely to be a lemon. The prices that will be offered in this scenario may well spiral down to £5,000 and only the worst used cars will be bought and sold. The bad lemons have effectively driven out the good plums, and buyers will start buying new cars instead of plums. Just as with horses, asymmetric and imperfect information in the used car market has the potential, therefore, to severely compromise its effective operation.
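The downward spiral described above can be sketched in a few lines. The £10,000 and £5,000 valuations and the equal chance of a plum come from the example. The price below which plum owners refuse to sell (`plum_reserve`, set here to £8,000) is a hypothetical figure added for illustration, as are the function names.

```python
def buyer_offer(share_plums, plum_value=10_000.0, lemon_value=5_000.0):
    """Most a buyer will pay: the expected value of a car drawn at random
    from those on offer, given the share of them that are plums."""
    return share_plums * plum_value + (1 - share_plums) * lemon_value


def price_path(initial_share=0.5, plum_reserve=8_000.0):
    """Sequence of offers as plum owners withdraw from the market.

    Assumption: plum owners sell only if offered at least `plum_reserve`;
    lemon owners always sell.
    """
    share = initial_share
    offers = []
    while True:
        offer = buyer_offer(share)
        offers.append(offer)
        # Plums stay on the market only if the offer covers their reserve.
        new_share = share if offer >= plum_reserve else 0.0
        if new_share == share:
            return offers
        share = new_share


offers = price_path()
print(offers)
```

With a lower reserve price for plums (say £7,000), the first offer is accepted by both kinds of seller and the market does not unravel, which is why the outcome depends on how much more the good cars are worth to their owners.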
We can assume that the demand for used cars depends most strongly on two variables – the price of the car and the average quality of used cars traded. Both the supply of used cars and the average quality will depend upon the price. In equilibrium, the supply equals the demand for the given average quality. As the price declines, normally the quality will also fall. And it’s quite possible that no cars will be traded at any price.
This same idea shows the problem of medical insurance. In a free market for medical insurance, people above a certain age, for example, will have great difficulty in buying medical insurance. So why doesn’t the price rise to match the risk? The answer is that as the price level rises the people who insure themselves will be those who are increasingly certain that they will need the insurance. In consequence, the average medical condition of insurance applicants deteriorates as the price level rises – such that no insurance sales [for these age groups] may take place at any price. This is strictly analogous to the car case, where the average quality of used cars supplied fell with a corresponding fall in the price level. The principle of ‘adverse selection’ is potentially present in all lines of insurance. Adverse selection can arise whenever those seeking insurance have freedom to buy or not to buy, to choose the insurance plan, and to continue or discontinue as a policy holder.
There are ways to counteract the effects of quality uncertainty, such as guarantees on consumer durables. Brand names perform a complementary function. Brand names not only indicate quality but also give the consumer a means of retaliation if the quality does not meet expectations. Chains – such as hotel chains or restaurant chains – are similar to brand names. Licensing practices also reduce quality uncertainty. And education and labour markets themselves have their own ‘brand names.’
So one of the big problems that confront markets is the fact that some of the participants often don’t know certain things that others in the market do know. This includes the market for most consumer durables, virtually all job markets, many financial markets, etc. In these cases, one of the roles of economics is to ask what system of incentives is most likely to address this problem of imperfect and asymmetric information. In economics, signalling is the idea that one party (termed the agent) credibly conveys some information about itself to another party (the principal). Signals should be distinguished from what have been called ‘indices’ (a term coined by Robert Jervis in his 1968 PhD thesis). Indices are attributes over which one has no control. Think of these as generally unalterable attributes of something or someone. Signals are things that are visible and that are in part designed to communicate. In a sense, they are alterable attributes. So employees send a signal about their ability level to the employer by acquiring certain education credentials. The informational value of the credential comes from the fact that the employer assumes it is positively correlated with having greater ability.
Education credentials can be used as a signal to the firm, indicating a certain level of ability that the individual may possess, thereby narrowing the informational gap. In his seminal article on signalling, published in 1973, Michael Spence proposes the key assumption that good-type employees pay less for one unit of education than bad-type employees. That is, a person of high ability has a lower cost of obtaining a given level of education than does a person of lower ability. Cost can be in terms of tuition costs, or intangible costs, such as stress and the time and effort involved in obtaining the qualification. Thus, if both individuals act rationally, it is optimal for the higher ability person to obtain the credential (the observable signal) but not for the lower ability person, so long as employers respond to the signal correctly. This will result in the workers self-sorting into the two groups. For this to work, it must be excessively costly, or impossible, to project a false image. The basic argument follows from the intuition that a behaviour that costs nothing can be equally well taken by anyone and so provides no information. It follows that perceivers should focus on behaviour which is costly to undertake. Signalling is an action by a party with good information that is confined to situations of asymmetric information.
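The self-sorting logic can be illustrated with a stylised check of when a credential separates the two types. This is a sketch under simplifying assumptions rather than Spence's full model: employers are assumed to pay one wage to credential holders and another to everyone else, education costs are linear in the amount required, and all the numbers are hypothetical.

```python
def credential_separates(education_required, wage_with, wage_without,
                         unit_cost_high_ability, unit_cost_low_ability):
    """True if only the high-ability worker finds the credential worthwhile.

    Assumptions (illustrative): cost = unit cost x education required, and
    each worker compares the wage premium with their own cost.
    """
    premium = wage_with - wage_without
    high_type_invests = premium >= unit_cost_high_ability * education_required
    low_type_abstains = premium <= unit_cost_low_ability * education_required
    return high_type_invests and low_type_abstains


# Hypothetical numbers: wage premium of 10, unit costs of 1 and 2.
for years in (3, 6, 12):
    print(years, credential_separates(years, 20, 10, 1, 2))
```

With these figures the credential sorts workers only when it is demanding enough to deter the low-ability type (at least 5 units) but not so demanding that it deters everyone (at most 10 units).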
The concept of screening should be distinguished from signalling, the latter implying that the informed agent moves first. When there is asymmetric information in the market, screening can involve incentives that encourage the better informed to self-select or self-reveal.
Joseph Stiglitz pioneered the theory of screening, examining how a less informed party can induce the other party to reveal their information. They can provide a menu of choices in such a way that the optimal choice of the other party depends on their private information. For example, a theme park might offer a menu of gold and silver tickets, where the more expensive gold ticket allows the customer to avoid the queue at rides. This will induce the customers to self-sort and reveal genuine information as to the value they place on their time and their desire to avoid the queues.
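As a rough sketch of how such a menu makes customers reveal themselves, suppose each visitor simply picks the ticket that leaves them the largest surplus. The prices and valuations below are hypothetical, chosen only to show the sorting.

```python
# Hypothetical menu: the gold ticket costs more but skips the queues.
MENU = {
    "silver": {"price": 30.0, "skips_queue": False},
    "gold": {"price": 50.0, "skips_queue": True},
}


def chosen_ticket(value_of_visit, value_of_skipping_queue, menu=MENU):
    """Ticket that maximises the customer's surplus (benefit minus price)."""
    def surplus(ticket):
        benefit = value_of_visit
        if ticket["skips_queue"]:
            benefit += value_of_skipping_queue
        return benefit - ticket["price"]

    return max(menu, key=lambda name: surplus(menu[name]))


print(chosen_ticket(60.0, 35.0))  # customer who places a high value on time
print(chosen_ticket(60.0, 5.0))   # customer who places a low value on time
```

The park never observes how much anyone values their time. It infers it from the choice, since only those who value skipping the queue at more than the £20 price gap buy gold.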
So can markets be efficient? In the strictest informational sense, the answer is No. But there are ways in which they can be made more efficient in the broader sense of the term than they would be in their natural state.
Reading and Links
Missing Markets: Insurance and Lemons. CORE. https://core-econ.org/the-economy/book/text/12.html#126-missing-markets-insurance-and-lemons
The Efficient Market Hypothesis and Its Critics. Burton Malkiel. 2003.
Akerlof, G. (1970), The Market for Lemons: Quality, Uncertainty and the Market Mechanism. Quarterly Journal of Economics. 84:488-500.
Grossman, S.J. and Stiglitz, J. (1980), The Impossibility of Informationally Efficient Markets, American Economic Review, June, 393-408.
Jervis, R. (2002), Signaling and Perception, in Kristen Monroe, ed., Political Psychology (Erlbaum).
Spence, M. (1973), Job Market Signaling, Quarterly Journal of Economics, 87 (3), 355–374.
Joseph E. Stiglitz, 1975. “The Theory of ‘Screening’, Education, and the Distribution of Income,” American Economic Review, 65(3), pp. 283–300.
Joseph E. Stiglitz, 2001. “Information and the Change in the Paradigm of Economics”, Nobel Prize Lecture, December 8.
A. Michael Spence, 2001. “Signaling in Retrospect and the Informational Structure of Markets”, Nobel Prize Lecture, December 8.
George A. Akerlof, 2001. “Behavioral Macroeconomics and Macroeconomic Behavior”, Nobel Prize Lecture, December 8.
The ‘over-round’
In a two-horse race, if both horses have an equal chance of winning (objectively), and both are offered at evens, then the expected profit of the market-maker (and of the bettor) is zero, ignoring operating, information and transactions costs.
In a two-horse race, if both are offered at evens (regardless of the respective probabilities of victory of the two horses), then it would require a stake of £x (split equally between the two horses) to be sure of being returned that £x (a net profit of zero) whichever horse wins. In this circumstance, the over-round of the bookmaker is said to be 100%, i.e. a notional profit margin of zero.
In practice, even if the notional profit margin is zero, the bookmaker is at a disadvantage if the horses are not equally matched, as a sophisticated bettor can take advantage by staking more than half on the horse with the greater chance of winning.
More generally, the over-round does not yield an accurate indicator of the bookmaker’s profit margin if bettors do not stake across all options in such a way as to ensure that their total stake of £x yields a certain return of £x, factored by the over-round.
For example, if the over-round is 120%, the notional margin to the bookmaker is 20%: put simply, bettors would have to stake £120 to ensure a return of £100. Say, for instance, that both horses in a two-horse race are being offered at 4 to 6. Then the bettor would need to stake £60 on each (£120 in total) to be guaranteed a return of £100 (£40 winnings plus the £60 stake returned) whichever horse won. In such circumstances, the bookmaker is guaranteed a profit of £20, i.e. 20% of the amount paid out, regardless of the outcome.
If one horse is offered at 4 to 6 and the other at 6 to 4, the bettor can guarantee a zero profit (and loss) by staking £60 at 4 to 6 and £40 at 6 to 4. That way, a £100 return is guaranteed for a total stake of £100, regardless of the outcome. Again, if the horse offered at 4 to 6 is actually a 4 to 7 chance, and bettors stake exclusively on this horse, their expected return is positive (although there is now a risk of losing the entire stake), and the expected return of the bookmaker is negative (though the actual return may be positive).
To summarize, the notional margin, as implied in the over-round, formally equates to the actual margin only if bettors spread their stakes in proportion to the probabilities implied by the odds, i.e. stake proportionately more on the outcome offered at shorter odds.
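The arithmetic behind these examples can be sketched as follows. Fractional odds of a to b imply a probability of b/(a+b); the over-round is the sum of these implied probabilities; and the stake needed on each horse to lock in a given return is that return multiplied by the implied probability. The function names are illustrative only.

```python
from fractions import Fraction


def implied_probability(numerator, denominator):
    """Probability implied by fractional odds of `numerator` to `denominator`."""
    return Fraction(denominator, numerator + denominator)


def over_round(odds):
    """Sum of implied probabilities across all runners (1 means 100%)."""
    return sum(implied_probability(n, d) for n, d in odds)


def balanced_stakes(odds, target_return):
    """Stake on each runner that returns `target_return` whichever one wins."""
    return [target_return * implied_probability(n, d) for n, d in odds]


both_4_to_6 = [(4, 6), (4, 6)]
mixed = [(4, 6), (6, 4)]

print(float(over_round(both_4_to_6)))          # over-round with both at 4 to 6
print(balanced_stakes(both_4_to_6, 100))       # stakes for a £100 return
print(float(over_round(mixed)), balanced_stakes(mixed, 100))
```

Exact fractions are used rather than floating point so that the 120% and 100% figures come out without rounding noise.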
Creating an over-round
Take as an example the following odds offered about a binary proposition to players, where the odds-maker believes that the objective probability of X winning is 1 in 5 (0.2) and of Y winning is 4 in 5 (0.8).
Assuming an over-round of 100% (i.e. margin of zero), the odds-setter (taken here to be a bookmaker) would set the following odds:
Odds about X = 5.0 (4 to 1): Odds about Y = 1.25 (1 to 4).
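Assume now that the odds-maker wishes to build in a margin of 8%, i.e. an over-round of roughly 108%.
In each case the decimal odds offered should be cut by 8 per cent. So 8% of 5.0 = 0.4. Deducting 0.4 from 5.0 gives 4.6. 8% of 1.25 = 0.1. Deducting 0.1 from 1.25 gives 1.15. (Strictly, cutting the odds by 8% produces an over-round of 1/0.92, or about 108.7%; the 8% is the bookmaker’s expected margin as a share of stakes.)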
So in the particular example, the odds offered would be as follows:
Odds about X = 4.6; Odds about Y = 1.15.
Assuming an equal amount (say £1,000) is bet on both sides of the proposition (i.e. a total of £2,000, consisting of perhaps 200 people betting £10 each), the profit (loss) to the bookmaker would vary depending on the outcome.
If horse X wins, the bookmaker will pay out:
4.6 x £1,000 = £4,600
Total amount staked (on X and Y) = £2,000.
Net profit to bookmaker if horse X wins = £2,000 – £4,600 = – £2,600
So if horse X wins, bookmaker loses £2,600.
If horse Y wins, the bookmaker will pay out:
1.15 x £1,000 = £1,150
Total amount staked (on X and Y) = £2,000
Net profit to bookmaker if horse Y wins = £2,000 – £1,150 = £850
Expected value of profit = expected value of profit from X + expected value of profit from Y = (-£2,600) x 0.2 + (£850) x 0.8 = -£520 + £680 = £160.
This is assuming that the implied probabilities in the odds are the correct probabilities, i.e. odds of 4/1 = probability of 1/5 (0.2); odds of 1/4 = probability of 4/5 (0.8).
Note also that £160 = 8% of total stake on X and Y (£2,000).
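The calculation above can be reproduced with a short sketch, which also makes it easy to try other probabilities or staking patterns. The function names are illustrative; the odds, stakes and probabilities are those of the example.

```python
def bookmaker_profits(decimal_odds, stakes):
    """Bookmaker's net profit for each possible winner: total stakes taken
    minus the payout (decimal odds x stake) on the winning side."""
    total_staked = sum(stakes)
    return [total_staked - odds * stake
            for odds, stake in zip(decimal_odds, stakes)]


def expected_profit(decimal_odds, stakes, probabilities):
    """Probability-weighted average of the outcome-by-outcome profits."""
    profits = bookmaker_profits(decimal_odds, stakes)
    return sum(p * profit for p, profit in zip(probabilities, profits))


odds = [4.6, 1.15]        # horse X, horse Y
stakes = [1000, 1000]     # equal money on each side
profits = bookmaker_profits(odds, stakes)
ev = expected_profit(odds, stakes, [0.2, 0.8])

print("Profit if X wins, if Y wins:", [round(x, 2) for x in profits])
print("Expected profit:", round(ev, 2))
print("As a share of stakes:", round(ev / sum(stakes), 4))
```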
This all assumes, as noted, that the objective probabilities are correctly estimated and that the amounts staked on both sides of the proposition are equal.
Even if we assume that the objective probabilities are correctly observed then there is still substantial volatility of outcome (i.e. risk) for the bookmaker. If the objective probability is incorrectly observed, however, the outcome for the bookmaker may be worse, i.e. a systematic loss.
For example, assume the probability of horse X winning is actually 25%; assume probability of horse Y winning is 75%.
At the given odds levels, and assuming equal stakes across both propositions, we derive the following.
As above, if horse X wins, the bookmaker will pay out, as before:
4.6 x £1,000 = £4,600
Total amount staked (on X and Y) = £2,000.
Net profit to bookmaker if horse X wins = £2,000 – £4,600 = – £2,600
So if horse X wins, bookmaker loses £2,600.
If horse Y wins, the bookmaker will pay out, as before:
1.15 x £1,000 = £1,150
Total amount staked (on X and Y) = £2,000
Net profit to bookmaker if horse Y wins = £2,000 – £1,150 = £850
Expected value of profit = expected value of profit from X + expected value of profit from Y = (-£2,600) x 0.25 + (£850) x 0.75 = -£650 + £637.50 = -£12.50, i.e. a loss of £12.50.
Insofar as the objective probability of horse X winning is greater than 20%, the expected profit to the bookmaker will decline. At about 24.64% (850/3,450), the expected profit is zero, and above that it turns negative.
Assume, rounding slightly, that the objective probability of horse X winning = 0.2465; objective probability of horse Y winning = 0.7535.
Then, expected value of profit = expected value of profit from X + expected value of profit from Y = (-£2,600) x 0.2465 + (£850) x 0.7535 = -£640.90 + £640.475 = -£0.425, i.e. zero to the nearest pound.
To the extent that the objective probabilities are inaccurately estimated, therefore, there is significant potential from the bookmaker’s point of view for a negative expected (as well as actual) profit.
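The break-even point can be found exactly by solving p x (profit if X wins) + (1 - p) x (profit if Y wins) = 0 for p. A minimal sketch, using the two outcome profits from the example (the function name is illustrative):

```python
from fractions import Fraction


def break_even_probability(profit_if_x_wins, profit_if_y_wins):
    """Probability of X winning at which the bookmaker's expected profit is
    zero, i.e. the p solving p*profit_x + (1 - p)*profit_y = 0."""
    return Fraction(profit_if_y_wins, profit_if_y_wins - profit_if_x_wins)


p_star = break_even_probability(-2600, 850)
expected_at_p_star = p_star * -2600 + (1 - p_star) * 850

print(p_star, float(p_star))
print(expected_at_p_star)
```

Any true probability for horse X above this threshold leaves the bookmaker with a negative expected profit at the quoted odds and equal stakes.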
Using the probabilities from the original example, the staking pattern from the bettor’s point of view that will lead to the same certain loss (8% in this case) whichever horse wins is to bet more on the favourite and less on the longshot, in this case £1,600 and £400 respectively.
This leads to the following outcomes:
Return to a £400 bet on horse X (if it wins) at 4.60 = £1,840
Return to a £1,600 bet on horse Y (if it wins) at 1.15 = £1,840
Guaranteed profit from staking these sums on each horse, from the bettor’s point of view = £1,840 – £2,000 = – £160, i.e. a net loss of 8% of total stake.
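The £400 and £1,600 split is not arbitrary: it is the division of the total stake in proportion to the reciprocals of the decimal odds, which equalises the return whichever horse wins. A minimal sketch (function name illustrative):

```python
def balancing_stakes(decimal_odds, total_stake):
    """Split `total_stake` so that every outcome gives the same return.

    Each stake is proportional to 1/odds, i.e. to the probability implied
    by the odds.
    """
    weights = [1 / odds for odds in decimal_odds]
    scale = total_stake / sum(weights)
    return [weight * scale for weight in weights]


odds = [4.6, 1.15]                       # horse X, horse Y
stakes = balancing_stakes(odds, 2000)
returns = [o * s for o, s in zip(odds, stakes)]
net = returns[0] - sum(stakes)

print([round(s, 2) for s in stakes])     # stake on X, stake on Y
print([round(r, 2) for r in returns])    # return if X wins, if Y wins
print(round(net, 2), round(net / sum(stakes), 4))
```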
Insofar as bettors can be induced to bet in these proportions, the operator is guaranteed a profit regardless of the outcome. If the average bet size is the same for bets made on either side, then we need four times as many bettors on the favourite as the longshot to achieve this. Otherwise, the same outcome can be achieved if those who are backing the favourite bet four times as much in total as those backing the longshot.
Another way to manage risk in the face of unbalanced staking patterns is to move the odds so as to limit the maximum loss.
In order to reduce the maximum downside (i.e. when X wins) the bookmaker may move the odds in such a way as to attract money on one horse and away from the other horse. To do this, the odds about one horse may be lengthened and those about the other horse shortened before a loss is incurred on either outcome. While such a strategy may reduce the exposure of the operator, the price may be paid in reduced profits.
Ultimately, line management from the operator’s point of view is about balancing risk and return, while maintaining an edge in favour of the ‘house’. From the bettor’s point of view, it is about exploiting opportunities which might arise where one (or more) of the odds making up that over-round are mispriced in the bettor’s favour, a possibility which can arise even when the over-round favours the ‘house.’
The history of forecasting election outcomes for betting purposes is well documented for open elections, such as presidential elections in the US, and stretches back further, though in less detail, for the closed elections of the Pope.
In the former, it has been traced, according to contemporaries, to the election of George Washington and has existed in organized markets since the 1860s.
The first recorded example of betting on a papal election, however, can be traced much further back, to the papal conclave of September, 1503, at which time it was considered already an old practice.
The brokers in the Roman banking houses (sensali), who made books and offered odds on who would be elected, made Cardinal Francesco Piccolomini the 100/30 favourite, ahead of Cardinals Giuliano della Rovere (100/15) and Georges d’Amboise (the favourite if judged by the vocal support of the street crowds) at 100/13.
Although Piccolomini is thought to have trailed in the first round of voting with 4 votes to 13 for d’Amboise and 15 for della Rovere, Piccolomini apparently benefited from a switch of votes from d’Amboise to himself in subsequent voting, and duly became Pope Pius III.
The bookmakers were proved right.
The next conclave for which we have the betting odds is that of December, 1521, in which odds were offered on no fewer than twenty cardinals.
Giulio de’Medici, the cousin of Leo X, was the betting favourite, at 100 to 25 (4/1), followed closely by Cardinal Alessandro Farnese at 100/20 (5/1), whose odds shortened to 100 to 40 (5/2) after a Roman mob plundered his house.
Though Farnese at one point came close to being elected Pope, he could not reach the required two-thirds of the vote, and ultimately the cardinals looked outside of the conclave, electing Adrian of Utrecht as Pope Adrian VI.
In the papal election of 1549-50, Cardinal Gianmaria del Monte (who was eventually elected Julius III) had opened in the betting as the 5/1 (against) favourite, but within three days Cardinal Reginald Pole had been established at odds of 4/1. On December 5, as balloting began, Pole was clear favourite at 100/95.
On that day, he received 26 of the 28 votes that would have given him the two-thirds majority required to elect him Pontiff. Although on the point of being made Pope by acclamation, Pole insisted on waiting until he won the formal two-thirds majority.
By the time that four additional French cardinals, opposed to Pole, arrived on December 11, however, he was trading at 5/2, and a month later he was being offered at odds of 100/16. His chance had gone.
In the papal conclave of April, 1555, Gian Pietro Carafa stood a good chance of being elected pope, ranking among the top three papabile in the first ballot of the conclave. It is reported that brokers intentionally “spread the rumour that Naples [i.e. Carafa] had died”, in order to attract money on the other candidates. Carafa went on to be elected Pope.
The two conclaves of 1590 are the earliest in which reports of insider trading emerged. As the first conclave opened, in September, Niccolo Sfondrato was trading at 100/11, compared to Giovanni Battista Castagna, who was offered at 100/22 and went on to be elected Pope Urban VII.
Urban VII died within weeks, and at the second conclave of the year two of the key influencers of votes, Cardinals Montalto and Sforza, secretly agreed to join forces in support of Sfondrato. It is reported that both made fortunes betting on him, at odds of 10/1 the day before he was elected as Pope Gregory XIV.
During that second conclave, Cardinal Gabriele Paleotti at one point increased to an implied probability of 70 per cent in the betting: “Wednesday at the twenty-second hour rumour began to hold Paleotti as pope, and it went on increasing so that at the end of the morning, he had risen to 70 in the wagering.” The odds were not reflected in the outcome.
In 1605, 21 cardinals were quoted odds of winning by the bookmakers, despite a papal bull, ‘Cogit Nos’, issued by Pope Gregory XIV on March 21, 1591, which imposed a penalty of excommunication for wagering on papal or cardinal elections, or on the length of the papal reign.
The favourite was Cesare Baronius, at 10/1. The closest he came to election, however, was gaining the support of 32 cardinal electors, nine short of the required tally. Ultimately, Alessandro de’Medici became Pope Leo XI.
This ban on papal betting was abrogated in 1918 by Pope Benedict XV’s reforms.
In relation to the papal conclave of 1878, a New York Times correspondent wrote that: “The death and advents of the Popes has always given rise to an excessive amount of gambling in the lottery, and today the people of Italy are in a state of excitement that is indescribable.” There is no known record, however, of the odds offered on that election. Similarly, the papal conclaves of 1903 and 1922 attracted a great deal of wagering interest, which was reported widely in the international press, though no known record remains of the odds offered.
Bookmaker odds in Milan are available, however, for the 1958 conclave, which show Cardinal Angelo Roncalli the 2/1 favourite, followed by Cardinals Agagianian and Ottaviani at 3/1, then Stefan Wyszynski and Giuseppe Siri at 4/1. The odds were justified when Cardinal Roncalli was elected Pope John XXIII.
For the first conclave of 1978, bookmakers in London were offering odds of 5/2 about Cardinal Sergio Pignedoli, 7/2 about Sebastiano Baggio and Ugo Poletti and 4/1 about Giovanni Benelli. The best odds about a non-Italian were 8/1 about Johannes Willebrands. Of these only Pignedoli showed any strength in the voting, with unconfirmed reports indicating that he obtained about 18 votes in the first ballot, compared to about 23 for Albino Luciani and 25 for Giuseppe Siri. Ultimately, Cardinal Luciani was elected Pope John Paul I.
For the second conclave of 1978, following the death of Pope John Paul I, the Associated Press noted that:
“Once again, there is no odds-on favourite to be elected as the new pope of the Roman Catholic Church … mentioned most often are Corradi Ursi, Salvatore Pappalardo, Ugo Poletti, Giuseppe Siri, Giovanni Colombo, Giovanni Benelli and Antonio Poma… Non-Italian front-runners include Argentinian Eduardo Pironio, 57, and Dutchman Johannes Willebrands, 68.”
Cardinal Karol Wojtyla, archbishop of Krakow, was elected Pope John Paul II, after the eighth ballot.
In 2005, Cardinal Joseph Ratzinger opened in the betting at 12/1 with one major bookmaker.
At that point, another leading bookmaker made Cardinal Arinze favourite, with Archbishop Tettamanzi, Cardinal Ratzinger and Cardinal Hummes as the next in the betting.
After three ballots, Ratzinger was favourite on two out of the three online betting boards monitored by CNN, his shortest odds being 5/2. He was at that point in the conclave being offered at between 9/2 favourite and 11/2 second favourite.
By the last day of the conclave, Cardinal Ratzinger had shortened to a clear 3/1 favourite, closely followed by Carlo Martini at 100/30 and Jean-Marie Lustiger at 7/2.
By that point, Francis Arinze had dropped back to 8/1, the same price as Claudio Hummes (who was now in the top six in all three lists). He had opened at 12/1. At the same time, Jorge Bergoglio was trading at 12/1 and Angelo Scola at 25/1.
According to a newspaper report, “among those speculating about who the next pope will be, the big money – literally is on Joseph Ratzinger, who delivered a stirring homily at the late Pope’s funeral … As of yesterday, most gambling sites gave Ratzinger … the best odds, with a host of second-tier candidates not far behind.”
Side bets were available on the name of the next pope.
Benedict was the 3 to 1 favourite. John Paul was offered at 7 to 2. Pius at 6 to 1. Peter at 8 to 1. John at 10 to 1.
Joseph Ratzinger was elected Benedict XVI.
The first show of odds following the 2005 conclave for the successor to Benedict was: Angelo Scola 6-1; Christoph Schonborn 7-1; Oscar Maradiaga 7-1; Jorge Bergoglio 9-1; Francis Arinze 10-1; Dionigi Tettamanzi 25-1.
In 2013, a survey of the so-called experts made Angelo Scola favourite, although the expert assessment and the betting odds diverged to some degree after that. A survey of Vatican watchers by YouTrend.It listed Timothy Dolan of the United States as the second most likely pope, followed by Cardinals Marc Ouellet, Odilo Scherer and Sean O’Malley. Luis Tagle of the Philippines was ranked sixth. Some of the bookmakers’ favourites, notably Cardinals Turkson and Bertone, did not appear on this experts’ list.
The implied win probabilities in the Oddschecker display of best bookmaker odds on March 3rd were as follows: Scola, 23%; Turkson, 22%; Bertone, 16%; Ouellet, 12%; Bagnasco, 10%; Ravasi, 8%; Sandri, 7%; Erdo, 7%; Scherer, 6%; Schonborn, 6%; Maradiaga, 5%; Arinze, 5%; O’Malley, 4%; Tagle, 4%; Bergoglio, 4%; Dolan, 3%; Hummes, 3%; Grocholewski, 3%; Dziwisz, 3%; Carrera, 2%; Piacenza, 2%; Marini, 2%; Rylko, 2%; Sarah, 2%; Martino, 2%. Note that the probabilities add up to more than 100% due to rounding and the in-built margin in the bookmakers’ odds.
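Implied probabilities of this kind follow directly from fractional odds: odds of a/b against correspond to a win probability of b/(a + b). The sketch below is only an illustration of that conversion, not the method used by any odds-comparison site; the function name `implied_probability` is a hypothetical choice, and the odds are the 1958 Milan prices quoted earlier.

```python
from fractions import Fraction

def implied_probability(odds: str) -> Fraction:
    """Convert fractional odds against, e.g. '9/4', to an implied win probability."""
    numerator, denominator = (int(part) for part in odds.split("/"))
    # Odds of a/b against imply a win probability of b / (a + b).
    return Fraction(denominator, numerator + denominator)

# 1958 conclave odds as quoted in the text (Milan bookmakers).
odds_1958 = {
    "Roncalli": "2/1",
    "Agagianian": "3/1",
    "Ottaviani": "3/1",
    "Wyszynski": "4/1",
    "Siri": "4/1",
}

probs = {name: implied_probability(price) for name, price in odds_1958.items()}
total = sum(probs.values())

for name, p in probs.items():
    print(f"{name}: {float(p):.1%}")
# The total exceeds 100% even for five names: that excess is the bookmakers' margin.
print(f"Sum of implied probabilities: {float(total):.1%}")
```

Dividing each implied probability by the total is one simple way to strip out the margin, although it assumes the margin is spread proportionally across all runners.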
A Washington Post analysis, published on March 11th, calculated the implied probabilities of the ‘frontrunners’ based on betting sites including the betting exchange, Betfair.
The results were: Scola, 19.9%; Scherer, 11.9%; Turkson, 9.7%; Bertone, 8.3%; Ouellet, 5%; Erdo, 4.9%; O’Malley, 3.8%; Schonborn, 3.7%; Ravasi, 3.4%; Tagle, 2.6%; Sandri, 2.5%; Dolan, 2.3%; Bagnasco, 2.3%.
On the morning of the final ballot, on March 13th, 2013, the Guardian newspaper Liveblog reported that: “Ladbrokes has Scola at 9/4, Scherer at 3/1 and Turkson at 6/1. Paddy Power has Scola at 11/4, Scherer at 7/2 and Turkson at 9/2.”
A post by Vatican Insider journalist Andrea Tornielli was also published ahead of the final ballot, stating that “The first casting of ballots, which will serve as a primary, will see votes merge towards the Archbishop of Milan, Angelo Scola, as well as the Canadian Marc Ouellet and the Brazilian Odilo Pedro Scherer. Some votes might also go to the Argentinian Jorge Mario Bergoglio and to other cardinals mentioned during the past few hours, such as the Sinhalese Malcolm Ranjith, the American Timothy Dolan and others. It remains to be seen if, among these nominations, there will be one able to garner at least two-thirds of the votes.”
Despite this level of detail, the same article declared that “From the moment cardinal electors entered the Santa Marta residence, they have not had any contact with the outside world and have to use protected paths that are constantly under surveillance, to get about. Every space they enter is monitored and blocked off from all forms of communication… All those who have to access the Holy See during the Conclave are bound to the strictest confidentiality.”
Then came the three strikes of the clock.
The first strike of the clock was a post by Vatican Insider journalist Giacomo Galeazzi, time-stamped on Vatican Insider Twitter at 8.24am that morning. It noted that there were only five candidates left in the running: Scola, Scherer, Bergoglio, Ouellet, Dolan.
The second strike of the clock was a link to a post by Vatican Insider journalist Giacomo Galeazzi, time-stamped on Vatican Insider Twitter at 11.12am: “After the first negative scrutinies, lunch breaks and dinners in Santa Marta House, the cardinals’ residence during the conclave, become opportunities for informal discussions on disregarding candidates with weaker consensuses, to the advantage of the papabile who have obtained more votes so far (Scola, Bergoglio, Ouellet).”
So, by 11.12 am, according to Galeazzi, it was effectively down to three – Cardinals Scola, Bergoglio and Ouellet.
The third strike of the clock came at 11.57am, when the Guardian Liveblog reported that: “La Stampa’s Vatican Insider claims that most of the votes have been going to Cardinals Scola, Bergoglio and Ouellet. This morning it was claiming most of them were going to Scola, Scherer, Bergoglio, Ouellet and Dolan. But it’s hard to know where they can be getting this information from.”
So what was actually going on while the clock was striking once, twice, thrice? A post-election report, published in La Repubblica, claims that Scola received approximately 35 votes in the first vote, to 20 for Bergoglio and 15 for Ouellet. National Catholic Reporter also reports that there was some support for Scherer: “After two rounds of voting Wednesday morning, it had become clear that neither Scola nor Scherer were likely to cross the finish line and gain the 77 votes needed for election … The fourth ballot, the first of Wednesday afternoon, saw Bergoglio separate himself from the pack.”
So it appears that Galeazzi’s tweeted reports conformed broadly to what we now understand to have been the case. Somehow, it seems, he knew!
But the markets failed to respond except for a flicker towards Bergoglio on the exchanges after the Guardian Liveblog posted the niche Galeazzi tweets to their wider audience.
So either the new information was not (for good or bad reason) sufficiently believed, or it was for the most part overlooked by those trading on the exchanges, or the market was not sufficiently liquid to make it possible to earn a significant return, so that most sophisticated traders did not bother to participate.
Whatever the reason, the betting markets did not perform as well as might have been expected in responding to new public information, which subsequently turned out to be accurate, unless the reports were accurate by sheer chance and deserved to be disbelieved. After all, it was ‘Vatican Insider’ itself that declared how “All those who have to access the Holy See during the Conclave are bound to the strictest confidentiality.”
Nor can this be explained in terms of a fog of conflicting signals, as there were no other credible sources issuing conflicting information.
So the ‘Galeazzi anomaly’, as I term it, turns into a mystery, partly because he seemed to know what he shouldn’t have known, but also because hardly anyone seemed to believe him. Giacomo Galeazzi shouted wolf, and there was a wolf! It is a lesson that some, in an efficient market, will now have learned.
Further Reading.
Vaughan Williams, L. and Paton, D., (2015), Forecasting the Outcome of Closed-Door Decisions: Evidence from 500 Years of Betting on Papal Conclaves, Journal of Forecasting, 34 (5), August, 391-404.
The Favourite-Longshot Bias is the well-established tendency in most betting markets for bettors to over-bet ‘longshots’ (events with long odds, i.e. low probability events) and to relatively under-bet ‘favourites’ (events with short odds, i.e. high probability events).
Assume, for example, that Mr. Miller and Mr. Stiller both start with £1,000.
Now Mr. Miller places a level £10 stake on 100 horses quoted at 2 to 1.
Mr. Stiller places a level £10 stake on 100 horses quoted at 20 to 1.
Who is likely to end up with more money at the end?
My Ladbrokes Flat Season Pocket Companion for 1990 provides a nicely laid out piece of evidence here for British flat horse racing between 1985 and 1989. The table conveniently presented in the Companion shows that not one out of 35 favourites sent off at 1/8 or shorter (as short as 1/25) lost between 1985 and 1989. This means a return of between 4% and 12.5% in a couple of minutes, which is an astronomical rate of interest. The point being made is that broadly speaking the shorter the odds, the better the return. The group of ‘white hot’ favourites (odds between 1/5 and 1/25) won 88 out of 96 races for a 6.5% profit. The following table looks at other odds groupings.
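| Odds range | Wins | Runs | Profit | Profit as % of stakes |
| --- | --- | --- | --- | --- |
| 1/5-1/2 | 249 | 344 | +£1.80 | +0.52 |
| 4/7-5/4 | 881 | 1780 | -£82.60 | -4.64 |
| 6/4-3/1 | 2187 | 7774 | -£629 | -8.09 |
| 7/2-6/1 | 3464 | 21681 | -£2237 | -10.32 |
| 8/1-20/1 | 2566 | 53741 | -£19823 | -36.89 |
| 25/1-100/1 | 441 | 43426 | -£29424 | -67.76 |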
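The percentage column can be reproduced from the other columns. The sketch below assumes a level £1 stake on every runner, which is an inference from the figures rather than something the Companion is quoted as stating; the helper name `percent_return` is hypothetical.

```python
# (odds range, wins, runs, profit in pounds), copied from the table above.
rows = [
    ("1/5-1/2", 249, 344, 1.80),
    ("4/7-5/4", 881, 1780, -82.60),
    ("6/4-3/1", 2187, 7774, -629.0),
    ("7/2-6/1", 3464, 21681, -2237.0),
    ("8/1-20/1", 2566, 53741, -19823.0),
    ("25/1-100/1", 441, 43426, -29424.0),
]

def percent_return(profit: float, runs: int, stake: float = 1.0) -> float:
    """Profit as a percentage of total stakes, assuming a level stake per run."""
    return 100 * profit / (runs * stake)

for band, wins, runs, profit in rows:
    strike_rate = wins / runs  # share of runners in the band that won
    print(f"{band:>10}  strike rate {strike_rate:6.1%}  "
          f"return {percent_return(profit, runs):+7.2f}%")
```

Read this way, the table answers the Miller and Stiller question: returns fall steadily as the odds lengthen.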
An interesting argument advanced by the Strathclyde-based statistician Dr. Robert Henery in 1985 is that the favourite-longshot bias is a consequence of bettors discounting a fixed fraction of their losses, i.e. they underweight their losses compared to their gains.
This argument also explains an observed link between the sum of bookmakers’ prices and the number of runners in a race. The prices being summed here are the win probabilities implied by the odds. If, for example, odds of 3/1 (against) are offered about each of the five horses in a race, the implied probability of winning for each horse is ¼ and the sum of prices is 5/4.
In this context, an ‘over-round’ is defined as the excess of the sum of prices over 1, in this case ¼.
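As a quick check of this definition, the following sketch sums the implied probabilities for the five-runner example and subtracts 1. The function name `sum_of_prices` is an illustrative choice, and odds are passed as (a, b) pairs meaning a/b against.

```python
from fractions import Fraction

def sum_of_prices(odds_against):
    """Sum the win probabilities implied by (a, b) pairs, each meaning odds of a/b against."""
    return sum(Fraction(b, a + b) for a, b in odds_against)

# Five horses, each offered at 3/1 against, as in the example above.
five_runners = [(3, 1)] * 5
total = sum_of_prices(five_runners)
over_round = total - 1

print(total, over_round)  # 5/4 1/4
```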
The rationale behind Henery’s hypothesis is that bettors will tend to explain away and therefore discount losses as atypical, or unrelated to the judgment of the bettor.
This is consistent with contemporaneous work on the psychology of gambling, such as Gilovich in 1983 and Gilovich and Douglas in 1986.
These studies demonstrate how gamblers tend to discount their losses, often as ‘near wins’ or the outcome of ‘fluke’ events, while bolstering their wins.
Let’s look more closely at how the Henery odds transformation works.
If the true probability of a horse losing a race is q, then the true odds against winning are q/(1-q).
For example, if the true probability of a horse losing a race (q) is ¾, the chance that it will win the race is ¼, i.e. 1 - ¾. The odds against it winning are: q/(1-q) = (3/4)/(1 - 3/4) = (3/4)/(1/4) = 3/1.
Henery now applies a transformation whereby the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is ½ (q = 1/2), then the bettor will subjectively rate the chance of the horse losing as Q = fq.
So Q = ¾ x ½ = 3/8, i.e. a subjective chance of winning of 5/8.
So the perceived (subjective) odds against winning, when the true odds are Evens (q = 1/2), are 3/5 (a subjective win probability of 62.5%), i.e. odds-on.
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = (3/8)/(1 - 3/8) = (3/8)/(5/8) = 3/5
If the true probability of a horse losing a race is 80% (q = 0.8), the true odds against winning are 4/1. The bettor, however, assesses the chance of losing not as q but as Q = fq.
With f = ¾, the bettor subjectively rates the chance of the horse losing as Q = 3/4 x 4/5 = 12/20, i.e. a subjective chance of winning of 8/20 (2/5).
So the perceived (subjective) odds against winning, when the true odds are 4/1 (q = 0.8), are 6/4 (a subjective win probability of 40%).
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = (12/20)/(1 - 12/20) = (12/20)/(8/20) = 12/8 = 6/4
To take this to the limit, if the true probability of a horse losing a race is 100% (q = 1), the true odds against winning are ∞ to 1. The bettor again assesses the chance of losing not as q but as Q = fq.
With f = ¾, the bettor subjectively rates the chance of the horse losing as Q = 3/4 x 1 = 3/4, i.e. a subjective chance of winning of 1/4.
So the perceived (subjective) odds against winning, when the true odds are ∞ to 1 (q = 1), are 3/1 (a subjective win probability of 25%).
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = (3/4)/(1/4) = 3/1
Similarly, if the true probability of a horse losing a race is 0% (q = 0), the true odds against winning are 0 to 1. The bettor assesses the chance of losing as Q = fq.
With f = ¾, the bettor subjectively rates the chance of the horse losing as Q = 3/4 x 0 = 0, i.e. a subjective chance of winning of 1.
So the perceived (subjective) odds against winning, when the true odds are 0/1 (q = 0), are also 0/1.
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = 0/1
This can all be summarised in a table.
| Objective odds (against) | Subjective odds (against) |
| --- | --- |
| Evens | 3/5 |
| 4/1 | 6/4 |
| Infinity to 1 | 3/1 |
| 0/1 | 0/1 |
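The four rows of the table can be regenerated with a few lines of Python. This is a minimal sketch of the transformation as described above, with Q = fq and subjective odds of Q/(1-Q); the function name `subjective_odds` is an illustrative choice rather than Henery’s own notation.

```python
from fractions import Fraction

def subjective_odds(q: Fraction, f: Fraction) -> Fraction:
    """Subjective odds against winning, Q/(1-Q), where Q = f*q is the discounted chance of losing."""
    Q = f * q
    return Q / (1 - Q)

f = Fraction(3, 4)  # fraction of losses the bettor counts, as in the worked examples

for q in (Fraction(1, 2), Fraction(4, 5), Fraction(1), Fraction(0)):
    print(f"q = {q}: subjective odds against = {subjective_odds(q, f)}")
```

Note that `Fraction` reduces to lowest terms, so the 6/4 in the table is printed as 3/2.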
We can now use these stylised examples to establish the bias.
In particular, the implication of the Henery odds transformation is that, for a given f of ¾, 3/5 is perceived as fair odds for a horse with a 1 in 2 chance of winning.
In fact, £100 wagered at 3/5 yields £160 (3/5 x £100, plus stake returned) half of the time (true odds = evens), i.e. an expected return of £80.
£100 wagered at 6/4 yields £250 (6/4 x £100, plus the stake back) one fifth of the time (true odds = 4/1), i.e. an expected return of £50.
£100 wagered at 3/1 would yield £400 (3/1 x £100, plus the stake back), but does so none of the time (true odds = Infinity to 1), i.e. an expected return of £0.
It can be shown that the higher the odds the lower is the expected rate of return on the stake, although the relationship between the subjective and objective probabilities remains at a fixed fraction throughout.
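One way to see this, as a sketch using only the definitions above: a stake of 1 at the subjective odds returns Q/(1-Q) + 1 when the horse wins, which happens with true probability 1 - q. The symbol R(q) for the expected return per unit staked is a label introduced here for convenience.

```latex
\[
R(q) \;=\; (1-q)\left(\frac{fq}{1-fq} + 1\right) \;=\; \frac{1-q}{1-fq},
\qquad
\frac{dR}{dq} \;=\; \frac{f-1}{(1-fq)^{2}} \;<\; 0
\quad \text{for } 0 < f < 1 .
\]
```

With f = ¾ this gives R = 0.8 at q = ½, R = 0.5 at q = 4/5 and R = 0 at q = 1, matching the £80, £50 and £0 expected returns on a £100 stake. Because the derivative is negative whenever f is less than 1, the expected return falls as the true odds lengthen.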
Now on to the over-round.
The same simple assumption about bettors’ behaviour can explain the observed relationship between the over-round (sum of win probabilities minus 1) and the number of runners in a race, n.
If each horse is priced according to its true win probability, then over-round = 0. So in a six-horse race, where each has a 1 in 6 chance, each would be priced at 5 to 1, so none of the lose probability is shaded by the bookmaker. Here the over-round = (6 x 1/6) – 1 = 0.
If only a fixed fraction of losses, f, is counted by bettors, the subjective probability of losing on any horse is f(qi), where qi is the objective probability of losing for horse i, and the odds will reflect this bias, i.e. they will be shorter than the true probabilities would imply. The subjective win probabilities in this case are now 1-f(qi), and the sum of these minus 1 gives the over-round.
Where there is no discounting of losses, the over-round (OR) = 0, i.e. the sum of the n true win probabilities minus 1. Assume now that f = ¾, i.e. ¾ of losses are counted by the bettor.
If there is discounting, then the odds will reflect this, and the more runners the bigger will be the over-round.
So in a race with 5 runners, q is 4/5, but fq = 3/4 x 4/5 = 12/20, so subjective win probability = 1-fq = 8/20, not 1/5. So OR = (5 x 8/20) – 1 = 1.
With 6 runners, fq = ¾ x 5/6 = 15/24, so subjective win probability = 1 – fq = 9/24. OR = (6 x 9/24) – 1 = (54/24) – 1 = 30/24 = 1¼.
With 7 runners, fq = ¾ x 6/7 = 18/28, so subjective win probability = 1 – fq = 10/28. OR = (7 x 10/28) – 1 = (70/28) – 1 = 42/28 = 1½.
If there is no discounting, then the subjective win probability equals the actual win probability, so in a 5-horse race of equally matched runners each has a win probability of 1/5. Here, OR = (5 x 1/5) – 1 = 0. In a 6-horse race, with no discounting, subjective probability = 1/6. OR = (6 x 1/6) – 1 = 0.
Hence, the over-round is linearly related to the number of runners, assuming that bettors discount a fixed fraction of losses (the ‘Henery Hypothesis’).
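The linear relationship can be checked numerically. The sketch below follows the worked examples in assuming n equally matched runners, so that each has q = (n-1)/n; under that assumption the over-round simplifies to (n-1)(1-f). The function name `over_round` is an illustrative choice.

```python
from fractions import Fraction

def over_round(n: int, f: Fraction) -> Fraction:
    """Over-round for n equally matched runners when bettors count only a fraction f of losses."""
    q = Fraction(n - 1, n)       # true probability that any given horse loses
    subjective_win = 1 - f * q   # win probability implied by the biased odds
    return n * subjective_win - 1

f = Fraction(3, 4)
for n in range(5, 8):
    # Each extra runner adds (1 - f) = 1/4 to the over-round.
    print(n, over_round(n, f))
```

With f = 1 (no discounting) the function returns 0 for every n, in line with the no-discounting case described above.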
If the Henery Hypothesis is correct as a way of explaining the favourite-longshot bias, the bias can be explained as the natural outcome of bettors’ pre-existing perceptions and preferences.
This is quite consistent with a market efficiently processing the information available to it. Moreover, there is little evidence that the market offers opportunities for market players to earn abnormal returns or positive profits. Thus although possibilities clearly exist for earning above-average returns on the basis of weak form information, there is no convincing evidence that this contradicts a wider conceptualisation of this type of information efficiency.
Are there other explanations for the favourite-longshot bias, and the observed link between over-round and runners, which do not rely on the Henery Hypothesis?
One explanation is based on consumer preference for risk. A seminal article by Richard Emeric Quandt in the Quarterly Journal of Economics in 1986 explains the existence of the bias as a natural and necessary consequence of equilibrium in a market characterised by risk-loving bettors with homogeneous beliefs. As such, this idea that bettors are risk-loving runs contrary to conventional explanations of financial behaviour which tend to assume risk-aversion. It is possible however, that bettors should be classified differently to participants in other types of financial market, not least because of consumption benefits from racetrack and other types of betting which may not be replicated elsewhere.
Joe Golec and Maurry Tamarkin (1998, Journal of Political Economy) seek to arbitrate between the hypothesis of risk-loving bettors and a hypothesis that bettors are in fact skewness-lovers, arguing in favour of the latter explanation for the existence of a favourite-longshot bias in betting markets.
William Hurley and Lawrence McDonough (1995, American Economic Review) propose a quite different theoretical model of the favourite-longshot bias, which requires neither a hypothesis of risk-loving nor skewness-loving behaviour. Instead, the bias can arise in a risk-neutral environment, populated by at least some uninformed and unsophisticated bettors, as a consequence of positive transactions and/or information costs. Michael Smith, David Paton and Leighton Vaughan Williams (2006, Economica) compare the size of the bias in person-to-person betting exchanges (characterised by lower margins/transactions costs) and bookmaker markets (higher margins/costs). They find the bias to be lower in the former, a finding which is at least consistent with this explanation.
So far, it should be noted that these are all demand-side explanations.
A major challenge to demand-side explanations of the bias was proposed by Hyun Song Shin (1991, Economic Journal), based on the idea that odds-setters respond to the adverse selection problem posed by insiders (bettors with superior information to bookmakers) by artificially squeezing odds at the longer end of the market. The consequence of this price-setting behaviour is for the betting odds to relatively understate the winning chances of favourites and to overstate the winning chances of longshots. This is the traditional favourite-longshot bias. Another implication of this modelling of odds-setting is that the over-round (the sum of implied probabilities in the odds) will tend to be greater as the number of runners increases, because more runners implies higher odds.
While Shin’s modelling can explain a favourite-longshot bias in betting markets characterised by odds-setters, and also a link between the number of runners and the bookmakers’ over-round, it can be shown (Vaughan Williams and Paton, 1997, Economic Journal) that identical results may result from demand-side explanations. To help arbitrate between these competing hypotheses, Vaughan Williams and Paton employ a large data set to distinguish between two types of race, on the basis of their relative potential for insider trading. It is shown that the correlation between the number of runners and the sum of prices is restricted to those races in which there are clear possibilities for the use of inside information. This lends empirical support to Shin’s supply-side explanation of the phenomenon. Even so, the favourite-longshot bias continues to exist in pari-mutuel markets, in which there are no odds-setters, but instead a pool of all bets which is paid out (minus fixed operator deductions) to winning bets.
To the extent that the favourite-longshot bias cannot be fully explained by the adverse selection problem facing odds-setters (certainly the case in pari-mutuel betting markets), most explanations can be classified as either preference-based or perception-based. Risk love or skewness love are examples of preference-based explanations.
Discounting of losses or other explanations based on a miscalibration of probabilities can be categorized as perception-based explanations. Marco Ottaviani and Peter Sorensen (2009, American Economic Journal), for example, show that information asymmetries between bettors may lead to misperceptions of the true probabilities of horses winning.
Behavioural theories suggest that cognitive errors and misperceptions of probabilities play a role in market mispricing. These theories incorporate laboratory studies by cognitive psychologists which show that people are systematically poor at discerning between small and tiny probabilities, and hence price both similarly. Further, people express a strong preference for certainty over extremely likely outcomes, leading highly probable gambles to be under-priced. These results form an important foundation of Prospect Theory (Daniel Kahneman and Amos Tversky, 1979).
A number of papers seek to arbitrate between preference-based and perception-based explanations of the favourite-longshot bias. An example is Erik Snowberg and Justin Wolfers (2010, Journal of Political Economy), who use a novel data set comparing behaviour in simple win pools and more complex compound bets (e.g. exactas, involving identification of first and second place) to seek to discriminate between these explanations. Their results, they argue, are more consistent with misperceptions than with risk-love. The bias persists in equilibrium because misperceptions are not large enough to generate profit opportunities for unbiased bettors. That said, the cost of the bias is still large, and de-biasing an individual bettor could reduce their costs of betting substantially.
A more recent paper seeks to extend this analysis into the world of online poker. In ‘Towards an understanding of the origins of the favourite-longshot bias: Evidence from online poker markets, a real-money natural laboratory’, first published online in Economica in 2016, Leighton Vaughan Williams and others find a favourite-longshot bias in online poker play, especially in lower stakes games. “We find that misperception rather than risk-love offers the best explanation for the behaviour that we identify.”
In conclusion, the favourite-longshot bias is a well-established market anomaly in sports betting markets, which can be traced in the published academic literature as far back as Richard Griffith (1949, American Journal of Psychology). Explanations can broadly be divided into demand-based and supply-based, preference-based and perceptions-based. A significant amount of modern research has been focused on seeking to arbitrate between these competing explanations of the bias by formulating predictions as to how data derived from these markets would behave if one or other explanation was correct. A compromise position, which may or may not be correct, is that all of these explanations have some merit, the relative merit of each depending on the market context.
How large should a randomly chosen group of people be, to make it more likely than not that at least two of them share a birthday?
For convenience, assume that all dates in the calendar are equally likely as birthdays, and ignore the Leap Year special of February 29th.
The first thing to look at is the likelihood that two randomly chosen people would share the same birthday.
Let’s call them Fred and Felicity. Say Felicity’s birthday is May 1st. What is the chance that Fred shares this birthday with Felicity? Well there are 365 days in the year, and only one of these is May 1st and we are assuming that all dates in the calendar are equally likely as birthdays.
So, the probability that Fred’s birthday is May 1st is 1/365, and the chance he shares a birthday with Felicity is 1/365.
So what is the probability that Fred’s birthday is not May 1st? It is 364/365. This is the probability that Fred doesn’t share a birthday with Felicity.
More generally, for any randomly chosen group of two people, the probability that the second person has a different birthday to the first is 364/365.
With 3 people, the chance that all three are different is the chance that the first two are different (364/365) multiplied by the chance that the third birthday is different (363/365).
So, the probability that 3 people have different birthdays = 364/365 x 363/365
This can be written as (364 x 363) / 365²
Similarly, the probability that 5 people have different birthdays = (364 x 363 x 362 x 361) / 365⁴
So far, the chance of no matches is very high. But by the tenth person the probability of no matches is:
(364/365) x (363/365) x (362/365) x (361/365) x (360/365) x (359/365) x (358/365) x (357/365) x (356/365) = 0.8831
More generally, for n people, the probability that they all have different birthdays =
(364 x 363 x … x (365 – n + 1)) / 365ⁿ⁻¹
For 23 people, probability of all different birthdays = (364 x 363 x … x 343) / 365²² = 0.4927
For 22 people, probability of all different birthdays = (364 x 363 x … x 344) / 365²¹ = 0.5243
So, in a group of 23 people, there is a (1 – 0.4927) = 0.5073 chance that at least two of the group share a birthday.
So how large should a randomly chosen group of people be, to make it more likely than not that at least two of them share a birthday? The answer is 23.
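The calculation is easy to reproduce. The following is a minimal sketch under the same assumptions as the text (365 equally likely birthdays, no February 29th); the function names are illustrative.

```python
def prob_all_different(n: int, days: int = 365) -> float:
    """Probability that n people all have different birthdays."""
    p = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p *= (days - k) / days
    return p

def smallest_group(threshold: float = 0.5, days: int = 365) -> int:
    """Smallest group size for which a shared birthday is more likely than the threshold."""
    n = 1
    while 1 - prob_all_different(n, days) <= threshold:
        n += 1
    return n

print(round(prob_all_different(22), 4))  # 0.5243
print(round(prob_all_different(23), 4))  # 0.4927
print(smallest_group())                  # 23
```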
The intuition behind this is quite straightforward if we recognise just how many pairs of people there are in a group of 23 people, any pair of which could share a birthday.
In a group of 23 people, there are, according to the standard formula, 23C2 (called 23 Choose 2) pairs of people.
Generally, the number of ways k things can be chosen from n is:
nCk = n! / ((n-k)! k!)
Thus, 23C2 = 23! / (21! 2!) = (23 x 22) / 2 = 253
So, in a group of 23 people, there are 253 pairs of people to choose from.
Therefore, a group of 23 people generates 253 chances, each of size 1/365, of having at least two people in the group sharing the same birthday.
These chances have some overlap: if A and B have a common birthday, and A and C have a common birthday, then inevitably so do B and C. So the probability of at least two people sharing a birthday in a group of 23 is less than 253/365 (69.3%). It is, as shown previously, 50.73%.
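The pair count and the 69.3% ceiling can be confirmed with the standard library. The last line goes one step beyond the text: it treats the 253 pairs as if they were independent, which is an approximation introduced here for comparison, not part of the argument above.

```python
from math import comb

pairs = comb(23, 2)        # number of distinct pairs in a group of 23
upper_bound = pairs / 365  # simple sum of the 253 chances, ignoring overlap

# Approximation (an assumption, not from the text): treat each pair as an
# independent 1-in-365 chance of a match.
independent_pairs_estimate = 1 - (364 / 365) ** pairs

print(pairs)                                  # 253
print(round(upper_bound, 3))                  # 0.693
print(round(independent_pairs_estimate, 2))   # close to one half
```

The approximate figure lands very close to the exact 50.73%, which helps explain why counting pairs is such a useful intuition.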
To conclude, the next time you see two football teams line up, include the referee. It is now more likely than not that two of those on the pitch share the same birthday. Strange, but true!
Appendix
Using experiments, events and sample spaces to solve the Birthday Problem.
Another way to look at the Birthday problem is by use of experiments and sample spaces. A sample space lists the possible outcomes of an experiment.
Take a coin-tossing experiment. In this case, a coin is tossed and it can land heads or tails.
Experiment: Toss a coin. Sample space = Heads; Tails.
Experiment: Toss a coin until it lands Heads. Identify the number of tosses needed. Sample space = 1; 2; 3; 4; 5; … (every positive whole number).
Experiment: Measure the time between two successive lightning strikes. Sample space = the set of positive numbers.
In many common examples, each outcome in the sample space is assigned an equal probability. An example is tossing a coin twice.
Here, the sample space = HH, HT, TH, TT.
Assign an equal probability to each of these outcomes. So, probability of each outcome = 1/4.
An ‘event’ is the name for a collection of outcomes.
The probability of an event = number of outcomes in the event / number of outcomes in the sample space.
Event of zero heads (TT) has probability = 1/4
Event of exactly one heads (HT, TH) has probability = 2/4 = 1/2
Event of two heads (HH) has probability = 1/4
Examples from dice (plural); die (singular). Sample space from one die = 1, 2, 3, 4, 5, 6.
Possible events:
a. Outcome is number 5
b. Outcome is an even number.
c. Outcome is even but is less than 6.
In a., probability = 1/6
In b., probability = 3/6
In c., probability = 2/6
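The counting rule for equally likely outcomes can be expressed as a small helper that enumerates the sample space and counts the outcomes belonging to the event. This is only a sketch; the name `event_probability` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

def event_probability(sample_space, in_event) -> Fraction:
    """Outcomes in the event divided by outcomes in the sample space (all equally likely)."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if in_event(o)]
    return Fraction(len(favourable), len(outcomes))

# Two coin tosses: HH, HT, TH, TT.
two_tosses = list(product("HT", repeat=2))
print(event_probability(two_tosses, lambda o: o.count("H") == 1))  # 1/2

# One die: events a, b and c from the list above.
die = range(1, 7)
print(event_probability(die, lambda x: x == 5))                 # 1/6
print(event_probability(die, lambda x: x % 2 == 0))             # 1/2
print(event_probability(die, lambda x: x % 2 == 0 and x < 6))   # 1/3
```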
Now, apply these concepts to the Birthday Problem.
Suppose that a room contains four people. What is the probability that at least two of these people share the same birthday?
The easiest way to solve this is to count the complementary event that none of the four share the same birthday and find that probability. We can then subtract this probability from 1 to establish the probability that at least two of the four share a birthday.
Size of the sample space = 365 x 365 x 365 x 365
Size of event that none of the four share the same birthday = 365 x 364 x 363 x 362
Probability that none of the four people share the same birthday =
(365 x 364 x 363 x 362) / (365 x 365 x 365 x 365) = 0.984
Probability that at least two of them share the same birthday = 1 – 0.984 = 0.016
Similarly, it can be calculated that the probability of at least two sharing a birthday increases as n, the number in the room, increases, as below:
n = 16; probability = 0.284
n = 23; probability = 0.507
n = 32; probability = 0.753
n = 40; probability = 0.891
n = 56; probability = 0.988
n = 100; probability = 0.9999997
So, the probability that two share a birthday exceeds 0.5 in a room of 23 or more people.
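The same complementary-event calculation works for any number of people. The sketch below assumes, as the worked example does, 365 equally likely birthdays and no leap years; the function name p_shared_birthday is illustrative.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

for n in (4, 16, 23, 32, 40, 56, 100):
    print(n, round(p_shared_birthday(n), 7))
```

Running it for n = 22 and n = 23 shows where the probability first crosses one half.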
Let’s suppose Bill and Ben each toss separate coins. Let A represent the variable “Bill’s coin toss outcome”, and B represent the variable “Ben’s coin toss outcome”. Both A and B have two possible values (Heads and Tails). It would be uncontroversial to assume that A and B are independent. Evidence about B will not change our belief in A. In other words, the fact that Ben’s coin lands heads does not affect the likelihood that Bill will throw heads. What happens to Bill’s coin and Ben’s coin are unrelated. They are independent.
Now suppose both Bill and Ben toss the same coin. Again let A represent the variable “Bill’s coin toss outcome”, and B represent the variable “Ben’s coin toss outcome”. Assume also that there is a possibility that the coin is biased towards heads but we do not know this for certain. In this case A and B are not independent. Observing that Ben’s coin has landed heads might cause us to increase our belief that Bill will throw a Heads.
In the second example, the variables A and B are both dependent on a separate variable C, “the coin is biased towards Heads” (which has the values True or False). Although in this case A and B are not independent, it turns out that once we know for certain the value of C then any evidence about B cannot change our belief about A.
In such a case we say that A and B are conditionally independent given C.
In many real life situations variables which are believed to be independent are actually only independent conditional on some other variable. Let’s take an example. Suppose that Ted and Ned live on opposite sides of the city and come to work by completely different means. Let’s say Ted arrives by train while Ned drives to work. Let A represent the variable “Ted late” (which has values true or false) and similarly let B represent the variable “Ned late”. At first glance, it might seem that A and B are independent. However, even if Ted and Ned lived and worked in different countries there may be factors (such as an international fuel shortage) which could affect both Ted and Ned. In that case, A and B are not independent. Again, it doesn’t seem reasonable to exclude the possibility that both Ted and Ned may be affected by a rail strike (C). Clearly the likelihood that Ted will arrive late to work will increase if the rail strike takes place; but the likelihood that Ned will arrive late to work might also increase, indirectly, because of the additional traffic on the roads caused by the rail strike. ‘Ted to be late’ and ‘Ned to be late’ are in this case conditionally independent GIVEN the rail strike.
Two events, A and B, are defined to be conditionally independent, given some other event, C, if the probability of both A and B occurring, given C, is equal to the probability of A occurring given C multiplied by the probability of B occurring given C.
The notation used for this is: P(A ∩ B | C) = P(A | C) . P(B | C)
In the example we have just considered, the probability that Ted and Ned are both late to work given the rail strike equals the probability that Ted is late given the strike multiplied by the probability that Ned is late given the strike.
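To see the definition at work, return to the earlier example of Bill and Ben tossing the same, possibly biased, coin. The numbers below are purely hypothetical (an even chance that the coin is biased, and a 9/10 chance of heads if it is); they are there only to make the arithmetic concrete.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
p_c = {True: Fraction(1, 2), False: Fraction(1, 2)}          # P(C): coin biased or not
p_heads = {True: Fraction(9, 10), False: Fraction(1, 2)}     # P(heads | C)

# Given C, the tosses are independent: P(A and B | C) = P(A | C) * P(B | C).
p_a = sum(p_c[c] * p_heads[c] for c in (True, False))        # P(Bill throws heads)
p_b = p_a                                                    # same coin, same reasoning
p_a_and_b = sum(p_c[c] * p_heads[c] * p_heads[c] for c in (True, False))
p_a_given_b = p_a_and_b / p_b

print("P(A) =", p_a)
print("P(A and B) =", p_a_and_b, "but P(A) x P(B) =", p_a * p_b)
print("P(A | B) =", p_a_given_b)
```

With these numbers, seeing Ben throw heads raises the probability that Bill throws heads, even though the two tosses are independent once the state of the coin is known.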
This takes us to a new question.
Does conditional independence, given C, imply unconditional independence?
Say, for example, Jack is playing Jill at snooker. Jack and Jill know nothing about each other’s ability at snooker.
Now suppose Jill wins her first 5 games. This provides evidence for her to assess the strength of her opponent, Jack, and vice-versa.
But the games may be conditionally independent (Jill is equally likely to win the fifth game as the second, given Jack and Jill’s relative skill at snooker).
Even so, they are not independent (that would mean that winning the first five games tells you nothing about the likelihood of winning the sixth).
So the answer to the latest question is No. Conditional independence does not imply unconditional independence.
Finally, does unconditional independence imply conditional independence?
To answer this, let’s imagine an event with multiple causes.
Let A be the event that the fire alarm goes off.
Now suppose this could be caused by a genuine fire (F) or someone making popcorn (P), which sets off a false alarm.
Now let’s suppose that the probability of a fire is completely independent of the probability of someone making popcorn. But also that the probability the alarm is indicating a real fire is 100 per cent if nobody is making popcorn.
So the probability of a fire and the probability of making popcorn are independent of each other, yet the probability it’s a genuine fire if the alarm goes off is conditionally dependent on whether someone is making popcorn (you can be sure it’s a genuine fire if nobody is making popcorn).
So, does unconditional independence imply conditional independence? The answer is No.
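The fire alarm example can also be put into numbers. The sketch below assumes hypothetical probabilities for a fire and for popcorn, and assumes the alarm sounds exactly when at least one of them occurs; none of these figures comes from the text.

```python
from fractions import Fraction
from itertools import product

p_fire = Fraction(1, 100)     # hypothetical
p_popcorn = Fraction(1, 10)   # hypothetical

# Fire and popcorn are independent, so the joint probability is a product.
joint = {}
for fire, popcorn in product((True, False), repeat=2):
    pf = p_fire if fire else 1 - p_fire
    pp = p_popcorn if popcorn else 1 - p_popcorn
    joint[(fire, popcorn)] = pf * pp

def alarm(fire, popcorn):
    # Assumption: the alarm goes off exactly when there is a fire or popcorn.
    return fire or popcorn

def p_fire_given(condition):
    """P(fire | condition), where condition is a test on (fire, popcorn)."""
    total = sum(prob for fp, prob in joint.items() if condition(*fp))
    with_fire = sum(prob for fp, prob in joint.items() if condition(*fp) and fp[0])
    return with_fire / total

p_f_alarm = p_fire_given(alarm)
p_f_alarm_no_popcorn = p_fire_given(lambda f, p: alarm(f, p) and not p)
p_f_alarm_popcorn = p_fire_given(lambda f, p: alarm(f, p) and p)

print(p_f_alarm, p_f_alarm_no_popcorn, p_f_alarm_popcorn)
```

Once the alarm has gone off, learning whether anyone is making popcorn changes the probability of a fire, which is exactly what conditional dependence means.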
So, in summary, events may be independent or they may be conditionally independent. Conditional independence does not, however, imply unconditional independence, and unconditional independence does not imply conditional independence.
One of the most celebrated pieces of correspondence in the history of probability and gambling, and one of which I am particularly fond, involves an exchange of letters between the greatest diarist of all time, Samuel Pepys, and the greatest scientist of all time, Sir Isaac Newton.
The six letters exchanged between Pepys in London and Newton in Cambridge related to a problem posed to Newton by Pepys about gambling odds. The interchange took place between November 22 and December 23, 1693. The ostensible reason for Mr. Pepys’ interest was to encourage the thirst for truth of his young friend, Mr. Smith. Whether Sir Isaac believed that tale or not we shall never know. The real reason, however, was later revealed in a letter written to a confidante by Pepys indicating that he himself was about to stake 10 pounds, a considerable sum in 1693, on such a bet. Now we’re talking!
The first letter to Newton introduced Mr. Smith as a fellow with a “general reputation…in this towne (inferiour to none, but superiour to most) for his maistery [of]…Arithmetick”.
What emerged has come down to us as the aptly named Newton-Pepys problem.
Essentially, the question came down to this:
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one ‘6’ appears.
B. 12 fair dice are tossed independently and at least two ‘6’s appear.
C. 18 fair dice are tossed independently and at least three ‘6’s appear.
Pepys was convinced that C. had the highest probability and asked Newton to confirm this.
Newton chose A as the highest probability, then B, then C, and produced his calculations for Pepys, who wouldn’t accept them.
So who was right? Newton or Pepys?
Well, let’s see.
The first problem is the easiest to solve.
What is the probability of A?
Probability that one roll of a die produces a ‘6’ = 1/6
So probability that one roll of a die does not produce a ‘6’ = 5/6
So probability that six independent rolls of a die produce no ‘6’ = (5/6)^6
So probability of AT LEAST one ‘6’ in 6 rolls = 1 – (5/6)^6 = 0.6651
So far, so good.
The probability of problem B and probability of problem C are more difficult to calculate and involve use of the binomial distribution, though Newton derived the answers from first principles, by his method of ‘Progressions’.
Both methods give the same answer, but using the more modern binomial distribution is easier.
So let’s do it, introducing along the way the idea of so-called ‘Bernoulli trials’.
The nice thing about a Bernoulli trial is that it has only two possible outcomes.
Each outcome can be framed as a ‘yes’ or ‘no’ question (success or failure).
Let probability of success = p.
Let probability of failure = 1-p.
Each trial is independent of the others and the probability of the two outcomes remains constant for every trial.
An example is tossing a coin. Will it land heads?
Another example is rolling a die. Will it come up ‘6’?
Yes = success (S); No = failure (F).
Let probability of success, P (S) = p; probability of failure, P (F) = 1-p.
So the question is: how many Bernoulli trials are needed to get to the first success?
This is straightforward, as the only way to need exactly five trials, for example, is to begin with four failures, i.e. FFFFS.
Probability of this = (1-p) (1-p) (1-p) (1-p) p = (1-p)^4 p
Similarly, the only way to need exactly six trials is to begin with five failures, i.e. FFFFFS.
Probability of this = (1-p) (1-p) (1-p) (1-p) (1-p) p = (1-p)^5 p
More generally, the probability that the first success occurs on trial number n =
(1-p)^(n-1) p
This is a geometric distribution. This distribution deals with the number of trials required for a single success.
But what is the chance that the first success takes AT LEAST some number of trials, say 12 trials?
One method is to add the probability of 12 trials to the probability of 13 trials, of 14 trials, of 15 trials, and so on without end.
Easier method: The only time you will need at least 12 trials is when the first 11 trials are all failures, which has probability (1-p)^11
In a sequence of Bernoulli trials, the probability that the first success takes at least n trials is (1-p)^(n-1)
Let’s take a couple of examples.
Probability that the first success (heads on coin toss) takes at least three trials (tosses of the coin) = (1-0.5)^2 = 0.25
Probability that the first success (heads on coin toss) takes at least four trials (tosses of the coin) = (1-0.5)^3 = 0.125
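The two geometric-distribution formulas are easy to check numerically. In this sketch the function names are illustrative, and the "long way" sum is cut off after a couple of hundred terms, which is more than enough for a fair coin.

```python
def p_first_success_on(n, p):
    """P(first success occurs exactly on trial n) = (1-p)^(n-1) * p."""
    return (1 - p) ** (n - 1) * p

def p_first_success_at_least(n, p):
    """P(first success takes at least n trials) = (1-p)^(n-1)."""
    return (1 - p) ** (n - 1)

# The long way: add up P(exactly 3) + P(exactly 4) + ... (truncated sum).
long_way = sum(p_first_success_on(k, 0.5) for k in range(3, 200))

print(p_first_success_at_least(3, 0.5), long_way)
print(p_first_success_at_least(4, 0.5))
```

The shortcut and the truncated sum agree to within floating-point error, which is the point of the easier method.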
But so far we have only learned how to calculate the probability of one success in so many trials.
What if we want to know the probability of two, or three, or however many successes?
To take an example, what is the probability of exactly two ‘6’s in five throws of the die?
To determine this, we need to calculate the number of ways two ‘6’s can occur in five throws of the die, and multiply that by the probability of each of these ways occurring.
So, probability = number of ways something can occur multiplied by probability of each way occurring.
How many ways can we throw two ‘6’s in five throws of the die?
Where S = Success in throwing a ‘6’, F = Fail in throwing a ‘6’, we have:
SSFFF; SFSFF; SFFSF; SFFFS; FSSFF; FSFSF; FSFFS; FFSSF; FFSFS; FFFSS
So there are 10 ways of throwing two ‘6’s in five throws of the die.
More formally, we are seeking to calculate how many ways 2 things can be chosen from 5. This is known as ‘5 Choose 2’, written as:
5C2 = 10
More generally, the number of ways k things can be chosen from n is:
nCk = n! / ((n-k)! k!)
n! (known as n factorial) = n (n-1) (n-2) … 1
k! (known as k factorial) = k (k-1) (k-2) … 1
Thus, 5C2 = 5! / (3! 2!) = (5x4x3x2x1) / (3x2x1x2x1) = (5x4) / (2x1) = 20/2 = 10
So what is the probability of throwing exactly two ‘6’s in five throws of the die, in each of these ten cases? p is the probability of success. 1-p is the probability of failure.
In each case, the probability = p.p.(1-p).(1-p).(1-p)
= p^2 (1-p)^3
Since there are 5C2 such sequences, the probability of exactly 2 ‘6’s =
10 p^2 (1-p)^3
Generally, in a fixed sequence of n Bernoulli trials, the probability of exactly r successes is:
nCr x p^r x (1-p)^(n-r)
This is the binomial distribution. Note that it requires that the probability of success on each trial be constant. It also requires only two possible outcomes.
So, for example, what is the chance of exactly 3 heads when a fair coin is tossed 5 times?
5C3 x (1/2)^3 x (1/2)^2 = 10/32 = 5/16
And what is the chance of exactly 2 sixes when a fair die is rolled five times?
5C2 x (1/6)^2 x (5/6)^3 = 10 x 1/36 x 125/216 = 1250/7776 = 0.1608
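The binomial formula translates directly into code. This sketch uses math.comb from the Python standard library for nCr and exact fractions for the probabilities; the function name binomial_pmf is just a label chosen here.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """P(exactly r successes in n independent trials, each with success probability p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

three_heads = binomial_pmf(3, 5, Fraction(1, 2))   # 3 heads in 5 tosses of a fair coin
two_sixes = binomial_pmf(2, 5, Fraction(1, 6))     # 2 sixes in 5 rolls of a fair die

print(comb(5, 2))
print(three_heads)
print(two_sixes, float(two_sixes))
```

Working in fractions keeps the answers exact, so 1250/7776 appears in its reduced form before being converted to a decimal.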
So let’s now use the binomial distribution to solve the Newton-Pepys problem.
- What is the probability of obtaining at least one six with 6 dice?
- What is the probability of obtaining at least two sixes with 12 dice?
- What is the probability of obtaining at least three sixes with 18 dice?
First, what is the probability of no sixes with 6 dice?
P (x sixes with n dice) = nCx . (1/6)^x . (5/6)^(n-x), x = 0,1,2,…,n
Where x is the number of successes.
So, probability of no successes (no sixes) with 6 dice =
6C0 . (1/6)^0 . (5/6)^(6-0) = 6! / ((6-0)! 0!) x 1 x (5/6)^6 = (6!/6!) x 1 x (5/6)^6 = (5/6)^6
Note that: 0! = 1
Here’s the reasoning: n! = n . (n-1)!
At n=1, 1! = 1 . (1-1)! = 1 . 0!
So 1 = 0!
So, where x is the number of sixes, probability of at least one six is equal to ‘1’ minus the probability of no sixes, which can be written as:
P (x≥1) = 1 – P(x=0) = 1 – (5/6)^6 = 0.665 (to three decimal places).
i.e. probability of at least one six = 1 minus the probability of no sixes.
That is a formal solution to Part 1 of the Newton-Pepys Problem.
Now on to Part 2.
Probability of at least two sixes with 12 dice is equal to ‘1’ minus the probability of no sixes minus the probability of exactly one six.
This can be written as:
P (x≥2) = 1 – P(x=0) – P(x=1)
P (x=0) in 12 throws of the dice = (5/6)^12
P (x=1) in 12 throws of the dice = 12C1 . (1/6)^1 . (5/6)^11
nCk = n! / ((n-k)! k!)
So 12C1
= 12! / ((12-1)! 1!) = 12! / (11! 1!) = 12
So, P (x≥2) = 1 – (5/6)^12 – 12 . (1/6) . (5/6)^11
= 1 – 0.112156654 – 2 . (0.134587985) = 0.887843346 – 0.26917597
= 0.618667376 = 0.619 (to 3 decimal places)
This is a formal solution to Part 2 of the Newton-Pepys Problem.
Now on to Part 3.
Probability of at least three sixes with 18 dice is equal to ‘1’ minus the probability of no sixes minus the probability of exactly one six minus the probability of exactly two sixes.
This can be written as:
P (x≥3) = 1 – P(x=0) – P(x=1) – P(x=2)
P (x=0) in 18 throws of the dice = (5/6)^18
P (x=1) in 18 throws of the dice = 18C1 . (1/6)^1 . (5/6)^17
nCk = n! / ((n-k)! k!)
So 18C1
= 18! / ((18-1)! 1!) = 18
So P (x=1) = 18 . (1/6)^1 . (5/6)^17
P (x=2) = 18C2 . (1/6)^2 . (5/6)^16
18C2
= 18! / ((18-2)! 2!) = 18! / (16! 2!) = 18 . (17/2) = 153
So P (x=2) = 18 . (17/2) . (1/6)^2 . (5/6)^16
So P (x≥3) = 1 – P (x=0) – P (x=1) – P (x=2)
P (x=0) = (5/6)^18
= 0.0375610365
P (x=1) = 18 . (1/6) . (0.0450732438) = 0.135219731
P (x=2) = 18 . (17/2) . (1/36) . (0.0540878926) = 0.229873544
So P (x≥3) = 1 – 0.0375610365 – 0.135219731 – 0.229873544
= 0.597345689 = 0.597 (to 3 decimal places)
This is a formal solution to Part 3 of the Newton-Pepys Problem.
So, to re-state the Newton-Pepys problem.
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one ‘6’ appears.
B. 12 fair dice are tossed independently and at least two ‘6’s appear.
C. 18 fair dice are tossed independently and at least three ‘6’s appear.
Pepys was convinced that C. had the highest probability and asked Newton to confirm this.
Newton chose A, then B, then C, and produced his calculations for Pepys, who wouldn’t accept them.
So who was right? Newton or Pepys?
According to our calculations, what is the probability of A? 0.665
What is the probability of B? 0.619
What is the probability of C? 0.597
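All three answers can be reproduced in a few lines by subtracting the binomial probabilities of too few sixes from 1. This is a sketch of the same method, not Newton’s own; the function name p_at_least is chosen here for illustration.

```python
from math import comb

def p_at_least(k, n, p=1/6):
    """P(at least k successes in n trials) = 1 - sum of P(exactly x) for x < k."""
    p_fewer = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k))
    return 1 - p_fewer

# (label, sixes needed, number of dice) for propositions A, B and C.
propositions = (("A", 1, 6), ("B", 2, 12), ("C", 3, 18))
results = {label: p_at_least(k, n) for label, k, n in propositions}

for label, value in results.items():
    print(label, round(value, 3))
```

Extending the tuple with, say, ("D", 4, 24) shows whether the downward trend continues.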
So Sir Isaac’s solution was right. Samuel Pepys was wrong, a wrong compounded by refusing to accept Newton’s solution. How much he lost gambling on his misjudgement is mired in the mists of history. The Newton-Pepys Problem is not, and continues to tease our brains to this very day.
Further Reading and Links
http://datagenetics.com/blog/february12014/index.html
Zeno of Elea was a Greek philosopher of the 5th century BC, best known for his paradoxes of motion, described by Aristotle in his ‘Physics’. Of these perhaps the best known is his paradox of the tortoise and Achilles, in its various forms. In a modern version, the antelope starts 100 metres ahead of the cheetah and moves at half the speed of the cheetah. Will the cheetah ever catch the antelope, assuming they don’t slow down?
Zeno’s paradox relies on the fact that when the cheetah reaches the starting position of the antelope, the antelope will have travelled 50 metres further. When the cheetah arrives at that point, the antelope will have travelled a further 25 metres, and so on. Zeno argued that this was an infinite process, and so does not have a final, finite step. So how can the cheetah ever catch the antelope?
There is a mathematical solution to the paradox, which goes like this:
Let S be the distance the cheetah runs, measured in units of 100 metres (so 1 = 100 metres).
So S = 1 + ½ + ¼ + 1/8 + 1/16 + 1/32 + …
½ S = ½ + ¼ + 1/8 + 1/16 + 1/32 + …
Therefore, S – ½ S = 1
Therefore, S = 2
So the cheetah catches the antelope after running 200 metres.
So an infinite process, with no final step, has a finite conclusion.
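A quick numerical check makes the convergence visible: the partial sums of 1 + ½ + ¼ + … creep up towards 2 and the shortfall halves with every extra term. The function name partial_sum is illustrative.

```python
def partial_sum(terms):
    """Sum of the first `terms` terms of 1 + 1/2 + 1/4 + ..."""
    return sum(0.5 ** k for k in range(terms))

for terms in (1, 2, 5, 10, 20, 50):
    total = partial_sum(terms)
    # The gap to 2 after n terms is exactly the last term added, (1/2)^(n-1).
    print(terms, total, 2 - total)
```

No finite number of terms ever reaches 2, which is the heart of the intuitive puzzle; the mathematical claim is only that the sums get as close to 2 as you like.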
That’s the mathematical solution, but does that solve the intuitive paradox? How can an infinite process, with no final step, come to an end? I understand the mathematical solution, but somehow it is as unsatisfying as the wrapper of a chocolate bar. To me, the real chocolate remains untouched. Such paradoxes I refer to as ‘chocolate paradoxes.’ What they have in common is that they can be solved mathematically without really being solved at all.
For those who might differ with me, the Thomson’s Lamp thought experiment offers a related challenge. Devised by philosopher James F. Thomson in 1954, it goes like this. Think of a lamp with a switch. You flick the switch to turn the light on. At the end of one minute exactly you flick it off. At the end of a further half minute, you turn it on again. At the end of a further quarter minute you turn it off. And so on. The time between each turning on and off the lamp is always half the duration of the time before. Assume you have the superpower to do each turning on and turning off instantaneously.
Adding these up gives: 1 minute plus half a minute plus a quarter of a minute ….
1 + ½ + ¼ + 1/8 + 1/16 + 1/32 + … = 2.
In other words, all of these infinitely many time intervals add up to exactly two minutes.
So here’s the question. At the end of two minutes, is the lamp on or off?
And here’s a second question. Say the lamp starts out being off and you turn it on after one minute, then off after a further half minute and so on. Does this make any difference to your answer?
Thomson claimed there was no solution, and that the problem led to a contradiction.
“It seems impossible to answer this question. It cannot be on, because I did not ever turn it on without at once turning it off. It cannot be off, because I did in the first place turn it on, and thereafter I never turned it off without at once turning it on. But the lamp must be either on or off. This is a contradiction.”
While considering the relationship between the infinite and the finite, consider in conclusion the following.
Can a number of infinite length be represented by a line of finite length? Solution below.
Spoiler Alert (Solution)
The square root of 2 is an irrational number, with no finite decimal expansion. In other words, its digits go on for ever without repeating: 1.4142135623730950488…
So can a line with a finite length exactly equal to this infinitely long number be drawn?
Draw a right-angled triangle, of vertical length (a) and horizontal length (b) equal to 1.

Then, the length of the hypotenuse of the triangle, c, can be derived from the lengths of the other two sides, a and b, using Pythagoras’ Theorem.
a^2 + b^2 = c^2
So, 1^2 + 1^2 = c^2
So c^2 = 2
c = √2
This is a line of finite length, representing a number of infinite length. So the answer to the question is yes. Strange? Indeed. Another of those tantalising ‘chocolate paradoxes.’
Further reading and links
http://numberphile.com/videos/zeno_paradox.html
Thomson, James F. (1954). ‘Tasks and Super-Tasks’, Analysis, 15 (1), 1-13.
The famed correspondence between two titans of 17th century French intellectual thought, Blaise Pascal (Pascal’s Wager) and Pierre de Fermat (Fermat’s Last Theorem), was to mark the foundation of modern probability theory. But it was sparked off by a question posed to Pascal by a legendary French gambler of the time, Antoine Gombaud, better known as the Chevalier de Méré.
The question related to a new dice game the Chevalier had invented. According to the rules of the game, he asked for even money odds that a pair of dice, when rolled 24 times, will come up with a double-6 at least once. His reasoning seemed impeccable. If the chance of a 6 on one roll of the die = 1/6, then the chance of a double-6 when two dice are thrown = 1/6 x 1/6 (as they are independent events) = 1/36.
So, he reasoned, the chance of at least one double-6 in 24 throws is: 24/36 = 2/3. So this should be a profitable game for the Chevalier. When it didn’t turn out that way, he asked the great philosopher and mathematician, Blaise Pascal to look into it, as you do.
Pascal derived the correct probabilities as follows:
Probability of a double-6 in one throw of a pair of dice = 1/6 x 1/6 = 1/36.
So probability of NO double-6 in one throw of a pair of dice = 35/36.
So, probability of no double-6 in 24 throws of a pair of dice = 35/36 x 35/36 … 24 times = 35/36 to the power of 24, i.e. (35/36)^24 = 0.5086.
So, probability of at least one double-6 = 1 – 0.5086 = 0.4914
So the Chevalier was betting at even money on a game which he lost (albeit marginally) more often than he won, which is why he was losing over time.
What if he changed the game to give himself 25 throws?
Now, the probability of throwing at least one double-6 in 25 throws of a pair of dice is:
1 – (35/36)^25 = 0.5055.
These odds, at even money, are in favour of the Chevalier, but this probability is still lower than the probability of obtaining at least one ‘6’ in four throws of a single die, which is 1 – (5/6)^4 = 0.5177.
In the single-die game, the Chevalier has a house edge of 51.77% – 48.23% = 3.54%.
In the ‘pair of dice’ game (24 throws), the Chevalier’s edge =
49.14% – 50.86% = -1.72%
In the ‘pair of dice’ game (25 throws), the Chevalier’s edge =
50.55% – 49.45% = 1.1%
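The three games can be compared side by side with a short script. It is a sketch of the same complement method Pascal used; the function names p_at_least_one and edge are made up here, and the edge is simply the chance of winning minus the chance of losing at even money.

```python
def p_at_least_one(p_single, throws):
    """P(at least one success in `throws` independent throws)."""
    return 1 - (1 - p_single) ** throws

def edge(p_win):
    """Even-money edge: P(win) minus P(lose)."""
    return p_win - (1 - p_win)

games = {
    "at least one 6 in 4 throws of one die": p_at_least_one(1/6, 4),
    "at least one double-6 in 24 throws of a pair": p_at_least_one(1/36, 24),
    "at least one double-6 in 25 throws of a pair": p_at_least_one(1/36, 25),
}

for name, p_win in games.items():
    print(f"{name}: P(win) = {p_win:.4f}, edge = {edge(p_win):+.2%}")
```

Because the script works from unrounded probabilities, the printed edges can differ in the last digit from figures obtained by subtracting rounded percentages.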
A better game for the Chevalier would have been to offer even money that he could get at least one run of ten heads in a row in 1024 attempts, where each attempt consists of ten tosses of a coin. The derivation of this probability is similar in method to the dice problem.
First, we need to determine the probability of 10 heads in 10 tosses of a fair coin.
The probability is: ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½
Probability = (1/2)^10 = 1/1024, i.e. odds of 1023 to 1 against.
Based on this, what is the probability of at least one run of 10 heads in 1024 attempts? Is it 0.5? No, because although you can expect ONE run of 10 heads on average, you could obtain zero, 2, 3, 4, etc.
So what is the probability of NO RUN of 10 heads in 1024 attempts?
This is: (1 – 1/1024)^1024
The probability of NO RUNS OF TEN HEADS = (1023/1024)^1024 = 37%
So probability of AT LEAST one run of 10 heads = 63%.
Now assume you have already made 234 of the 1024 attempts without a run of 10 heads. What is your chance now of getting 10 heads?
Probability of NO RUNS OF TEN HEADS in the remaining 790 attempts = (1023/1024)^790 = 46%
So probability of at least one success = 54%.
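These percentages can be checked in code. The sketch assumes, as the calculation above does, that each of the 1024 trials is a separate, independent attempt at ten heads with success probability 1/1024; the function name is illustrative.

```python
def p_at_least_one_run(attempts, run_length=10):
    """P(at least one all-heads attempt among `attempts` independent attempts
    of `run_length` fair coin tosses each)."""
    p_run = 0.5 ** run_length          # 1/1024 for ten heads in ten tosses
    return 1 - (1 - p_run) ** attempts

print(round(p_at_least_one_run(1024), 2))   # all 1024 attempts still to come
print(round(p_at_least_one_run(790), 2))    # 790 attempts left after 234 failures
```

The failures already behind you do not change the chance on any later attempt; they only reduce how many attempts are left.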
The Chevalier could have played either of these games and expected to come out ahead. But the game would have taken a long time. He preferred the shorter game, which produced the longer loss.
Until he was put right by Monsieur Pascal.
Most importantly, though, the Chevalier’s question led to a correspondence, most of which has survived, which led to the foundations of modern probability theory.
I will examine just one of the conclusions of this correspondence today, and it relates to the infamous ‘Gambler’s Ruin’ problem.
This is an idea set in the form of a problem by Pascal for Fermat, subsequently published by Christiaan Huygens (‘On reasoning in games of chance’, 1657) and formally solved by Jacobus Bernoulli (‘Ars Conjectandi’, 1713).
One way of stating the problem is as follows. If you play any gambling game long enough, will you eventually go bankrupt, even if the odds are in your favour, if your opponent has unlimited funds?
Example: You and your opponent toss a coin, where the loser pays the winner £1. The game continues until either you or your opponent has all the money. Suppose you have £10 to start and your opponent has £20. What are the probabilities that a) you and b) your opponent, will end up with all the money?
The answer is that the player who starts with more money has more chance of ending up with all of it. For a fair game such as this one, the formula is:
P1 = n1 / (n1 + n2)
P2 = n2 / (n1 + n2)
Where n1 is the amount of money that player 1 (you) starts with, n2 is the amount that player 2 (your opponent) starts with, and P1 and P2 are the probabilities that player 1 and player 2, respectively, end up with all the money.
In this case, you start with £10 of the £30 total, and so have a 10/(10+20) = 10/30 = 1/3 chance of winning the £30; your opponent has a 2/3 chance of winning the £30. But even if you do win this game, and you play the game again and again, against different opponents, or the same one who has borrowed more money, you risk losing your entire bankroll. In a fair game, against an opponent with unlimited funds, ruin is eventually certain; odds in your favour reduce the chance of ruin but, with a finite bankroll, never remove it entirely. A long-enough bad streak can always bankrupt you.
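The 1/3 figure can be checked by simulation. The sketch below plays the £10 against £20 coin-tossing game many times with a fair coin; the function names, the number of games and the random seed are arbitrary choices made for illustration.

```python
import random

def player_one_wins(n1, n2, rng):
    """Play one fair £1-a-toss game to the end; True if player 1 ends with all the money."""
    total = n1 + n2
    stake = n1                      # player 1's current bankroll
    while 0 < stake < total:
        stake += 1 if rng.random() < 0.5 else -1
    return stake == total

def estimate_p1(n1, n2, games=5000, seed=1):
    """Fraction of simulated games won by player 1."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    wins = sum(player_one_wins(n1, n2, rng) for _ in range(games))
    return wins / games

estimate = estimate_p1(10, 20)
print("simulated:", estimate, "formula:", 10 / (10 + 20))
```

The estimate will wobble a little from run to run if the seed is changed, but it should settle close to n1 / (n1 + n2) as the number of games grows.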
In other words, infinite capital will overcome any finite odds against it. This is one version of the ‘Gambler’s Ruin’ problem, and many gamblers over the years have been ruined because of their unawareness of it.
So how can we avoid falling victim to the problem of ‘Gambler’s Ruin?’ Formally, we might turn to the Kelly formula, more of which I shall examine elsewhere. Informally, though, I shall reduce it to two simple bits of advice.
‘Never bet more than you can afford to lose’.
‘When the Fun Stops, Stop!’
Now that’s a start.
Further Reading and Links
Letters between Fermat and Pascal on Probability: https://www.york.ac.uk/depts/maths/histstat/pascal.pdf
