The history of betting on election outcomes is well documented for open elections, such as presidential elections in the US, and, over a much longer period though in less detail, for the closed elections of the Pope.
In the case of US presidential elections, betting has been traced, according to contemporaries, to the election of George Washington, and has existed in organized markets since the 1860s.
The first recorded example of betting on a papal election, however, can be traced much further back, to the papal conclave of September 1503, at which time it was already considered an old practice.
The brokers in the Roman banking houses (sensali), who made books and offered odds on who would be elected, made Cardinal Francesco Piccolomini the 100/30 favourite, ahead of Cardinals Giuliano della Rovere (100/15) and Georges d’Amboise (the favourite if judged by the vocal support of the street crowds) at 100/13.
Although Piccolomini is thought to have trailed in the first round of voting with 4 votes to 13 for d’Amboise and 15 for della Rovere, Piccolomini apparently benefited from a switch of votes from d’Amboise to himself in subsequent voting, and duly became Pope Pius III.
The bookmakers were proved right.
The next conclave for which we have the betting odds is that of December, 1521, in which odds were offered on no fewer than twenty cardinals.
Giulio de’Medici, the cousin of Leo X, was the betting favourite, at 100 to 25 (4/1), followed closely by Cardinal Alessandro Farnese at 100/20 (5/1), whose odds shortened to 100 to 40 (5/2) after a Roman mob plundered his house.
Though Farnese at one point came close to being elected Pope, he could not reach the required two-thirds of the vote, and ultimately the cardinals looked outside of the conclave, electing Adrian of Utrecht as Pope Adrian VI.
In the papal election of 1549-50, Cardinal Gianmaria del Monte (who was eventually elected Julius III) had opened in the betting as the 5/1 (against) favourite, but within three days Cardinal Reginald Pole had been established at odds of 4/1. On December 5, as balloting began, Pole was clear favourite at 100/95.
On that day, he received 26 of the 28 votes that would have given him the two-thirds majority required to elect him Pontiff. Although on the point of being made Pope by acclamation, Pole insisted on waiting until he won the formal two-thirds majority.
By the time four additional French cardinals, opposed to Pole, arrived on December 11, however, he was trading at 5/2, and a month later he was being offered at odds of 100/16. His chance had gone.
In the papal conclave of April, 1555, Gian Pietro Carafa stood a good chance of being elected pope, ranking among the top three papabile in the first ballot of the conclave. It is reported that brokers intentionally “spread the rumour that Naples [i.e. Carafa] had died”, in order to attract money on the other candidates. Carafa went on to be elected Pope.
The conclaves of 1590 are the earliest in which reports of insider trading emerged. As the first conclave opened, in September, Niccolò Sfondrato was trading at 100/11, compared to Giambattista Castagna, who was offered at 100/22; it was Castagna who was elected, as Pope Urban VII.
During the second conclave of 1590, Cardinal Gabriele Paleotti at one point rose to an implied probability of 70 per cent in the betting: “Wednesday at the twenty-second hour rumour began to hold Paleotti as pope, and it went on increasing so that at the end of the morning, he had risen to 70 in the wagering.” The odds were not reflected in the outcome. Two of the key influencers of votes in that conclave, Cardinals Montalto and Sforza, had secretly agreed to join forces in support of Sfondrato, and it is reported that both made fortunes betting on him, at odds of 10/1, the day before he was elected Pope Gregory XIV.
In 1605, despite the papal bull ‘Cogit Nos’, issued by Pope Gregory XIV on March 21, 1591, which imposed a penalty of excommunication for wagering on papal or cardinal elections or on the length of a papal reign, 21 cardinals were quoted odds of winning by the bookmakers.
The favourite was Cesare Baronius, at 10/1. The closest he came to election, however, was gaining the support of 32 cardinal electors, nine short of the required tally. Ultimately, Alessandro de’Medici became Pope Leo XI.
This ban on papal betting was abrogated in 1918 by Pope Benedict XV’s reforms.
In relation to the papal conclave of 1878, a New York Times correspondent wrote that: “The death and advents of the Popes has always given rise to an excessive amount of gambling in the lottery, and today the people of Italy are in a state of excitement that is indescribable.” There is, however, no known record of the odds offered on that election. Similarly, the papal conclaves of 1903 and 1922 attracted a great deal of wagering interest, which was reported widely in the international press, though no record of the odds offered is known to survive.
Bookmaker odds in Milan are available, however, for the 1958 conclave, which show Cardinal Angelo Roncalli the 2/1 favourite, followed by Cardinals Agagianian and Ottaviani at 3/1, then Stefan Wyszynski and Giuseppe Siri at 4/1. The odds were justified when Cardinal Roncalli was elected Pope John XXIII.
For the first conclave of 1978, bookmakers in London were offering odds of 5/2 about Cardinal Sergio Pignedoli, 7/2 about Sebastiano Baggio and Ugo Poletti and 4/1 about Giovanni Benelli. The best odds about a non-Italian were 8/1 about Johannes Willebrands. Of these, only Pignedoli showed any strength in the voting: unconfirmed reports indicate that he obtained about 18 votes in the first ballot, compared to about 23 for Albino Luciani and 25 for Giuseppe Siri. Ultimately, Cardinal Luciani was elected Pope John Paul I.
For the second conclave of 1978, following the death of Pope John Paul I, the Associated Press noted that:
“Once again, there is no odds-on favourite to be elected as the new pope of the Roman Catholic Church … mentioned most often are Corrado Ursi, Salvatore Pappalardo, Ugo Poletti, Giuseppe Siri, Giovanni Colombo, Giovanni Benelli and Antonio Poma… Non-Italian front-runners include Argentinian Eduardo Pironio, 57, and Dutchman Johannes Willebrands, 68.”
Cardinal Karol Wojtyła, Archbishop of Kraków, was elected Pope John Paul II on the eighth ballot.
In 2005, Cardinal Joseph Ratzinger opened in the betting at 12/1 with one major bookmaker.
At that point, another leading bookmaker made Cardinal Arinze favourite, with Archbishop Tettamanzi, Cardinal Ratzinger and Cardinal Hummes as the next in the betting.
After three ballots, Ratzinger was favourite on two out of the three online betting boards monitored by CNN, his shortest odds being 5/2. He was at that point in the conclave being offered at between 9/2 favourite and 11/2 second favourite.
By the last day of the conclave, Cardinal Ratzinger had shortened to a clear 3/1 favourite, closely followed by Carlo Martini at 100/30 and Jean-Marie Lustiger at 7/2.
By that point, Francis Arinze had dropped back to 8/1, the same price as Claudio Hummes, who had opened at 12/1 and was now in the top six in all three lists. At the same time, Jorge Bergoglio was trading at 12/1 and Angelo Scola at 25/1.
According to a newspaper report, “among those speculating about who the next pope will be, the big money – literally is on Joseph Ratzinger, who delivered a stirring homily at the late Pope’s funeral … As of yesterday, most gambling sites gave Ratzinger … the best odds, with a host of second-tier candidates not far behind.”
Side bets were available on the name of the next pope.
Benedict was the 3 to 1 favourite. John Paul was offered at 7 to 2. Pius at 6 to 1. Peter at 8 to 1. John at 10 to 1.
Joseph Ratzinger was elected Pope Benedict XVI.
The first show of odds following the 2005 conclave for the successor to Benedict was: Angelo Scola 6-1; Christoph Schonborn 7-1; Oscar Maradiaga 7-1; Jorge Bergoglio 9-1; Francis Arinze 10-1; Dionigi Tettamanzi 25-1.
In 2013, a survey of the so-called experts made Angelo Scola favourite, although below him the expert rankings and the betting odds diverged to some degree. A survey of Vatican watchers by YouTrend.It listed Timothy Dolan of the United States as the second most likely pope, followed by Cardinals Marc Ouellet, Odilo Scherer and Sean O’Malley. Luis Tagle of the Philippines was ranked sixth. Some of the bookmakers’ favourites, notably Cardinals Turkson and Bertone, did not appear on this experts’ list.
The implied win probabilities in the Oddschecker display of best bookmaker odds on March 3rd were as follows: Scola, 23%; Turkson, 22%; Bertone, 16%; Ouellet, 12%; Bagnasco, 10%; Ravasi, 8%; Sandri, 7%; Erdo, 7%; Scherer, 6%; Schonborn, 6%; Maradiaga, 5%; Arinze, 5%; O’Malley, 4%; Tagle, 4%; Bergoglio, 4%; Dolan, 3%; Hummes, 3%; Grocholewski, 3%; Dziwisz, 3%; Carrera, 2%; Piacenza, 2%; Marini, 2%; Rylko, 2%; Sarah, 2%; Martino, 2%. Note that the probabilities add up to more than 100 per cent due to rounding and the in-built margin in the bookmakers’ odds.
A Washington Post analysis, published on March 11th, calculated the implied probabilities of the ‘frontrunners’ based on betting sites including the betting exchange, Betfair.
The results were: Scola, 19.9%; Scherer, 11.9%; Turkson, 9.7%; Bertone, 8.3%; Ouellet, 5%; Erdo, 4.9%; O’Malley, 3.8%; Schonborn, 3.7%; Ravasi, 3.4%; Tagle, 2.6%; Sandri, 2.5%; Dolan, 2.3%; Bagnasco, 2.3%.
On the morning of the final ballot, on March 13th, 2013, the Guardian newspaper Liveblog reported that: “Ladbrokes has Scola at 9/4, Scherer at 3/1 and Turkson at 6/1. Paddy Power has Scola at 11/4, Scherer at 7/2 and Turkson at 9/2.”
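The quotes above can be converted into implied win probabilities: fractional odds of a/b against imply a probability of b/(a + b), so 9/4 implies 4/13, or about 31 per cent. A minimal Python sketch, assuming odds are written as "a/b" strings (the helper name implied_probability is illustrative, not taken from any betting library). These raw figures still contain each bookmaker's margin, which is why, summed across a whole field, they exceed 100 per cent, as in the Oddschecker figures above.

```python
from fractions import Fraction


def implied_probability(odds: str) -> Fraction:
    """Convert fractional odds 'a/b' (against) into an implied win probability b/(a + b)."""
    a, b = (int(x) for x in odds.split("/"))
    return Fraction(b, a + b)


# Quotes reported by the Guardian Liveblog on the morning of 13 March 2013
ladbrokes = {"Scola": "9/4", "Scherer": "3/1", "Turkson": "6/1"}
paddy_power = {"Scola": "11/4", "Scherer": "7/2", "Turkson": "9/2"}

for book, quotes in [("Ladbrokes", ladbrokes), ("Paddy Power", paddy_power)]:
    for name, odds in quotes.items():
        p = implied_probability(odds)
        print(f"{book:12} {name:8} {odds:>5}  {float(p):.1%}")
```

The same helper handles the older "100/30" style of quotation used for the Renaissance conclaves, giving 30/130, or about 23 per cent, for Piccolomini in 1503.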
A post by Vatican Insider journalist Andrea Tornielli was also published ahead of the final ballot, stating that “The first casting of ballots, which will serve as a primary, will see votes merge towards the Archbishop of Milan, Angelo Scola, as well as the Canadian Marc Ouellet and the Brazilian Odilo Pedro Scherer. Some votes might also go to the Argentinian Jorge Mario Bergoglio and to other cardinals mentioned during the past few hours, such as the Sinhalese Malcolm Ranjith, the American Timothy Dolan and others. It remains to be seen if, among these nominations, there will be one able to garner at least two-thirds of the votes.”
Despite this level of detail, the same article declared that “From the moment cardinal electors entered the Santa Marta residence, they have not had any contact with the outside world and have to use protected paths that are constantly under surveillance, to get about. Every space they enter is monitored and blocked off from all forms of communication… All those who have to access the Holy See during the Conclave are bound to the strictest confidentiality.”
Then came the three strikes of the clock.
The first strike of the clock was a post by Vatican Insider journalist Giacomo Galeazzi, time-stamped on Vatican Insider Twitter at 8.24am that morning. It noted that there were only five candidates left in the running: Scola, Scherer, Bergoglio, Ouellet, Dolan.
The second strike of the clock was a link to another post by Galeazzi, time-stamped on Vatican Insider Twitter at 11.12am: “After the first negative scrutinies, lunch breaks and dinners in Santa Marta House, the cardinals’ residence during the conclave, become opportunities for informal discussions on disregarding candidates with weaker consensuses, to the advantage of the papabile who have obtained more votes so far (Scola, Bergoglio, Ouellet).”
So, by 11.12am, according to Galeazzi, it was effectively down to three: Cardinals Scola, Bergoglio and Ouellet.
The third strike of the clock came at 11.57am, when the Guardian Liveblog reported that: “La Stampa’s Vatican Insider claims that most of the votes have been going to Cardinals Scola, Bergoglio and Ouellet. This morning it was claiming most of them were going to Scola, Scherer, Bergoglio, Ouellet and Dolan. But it’s hard to know where they can be getting this information from.”
So what was actually going on while the clock was striking once, twice, thrice? A post-election report, published in La Repubblica, claims that Scola received approximately 35 votes in the first vote, to 20 for Bergoglio and 15 for Ouellet. National Catholic Reporter also reports that there was some support for Scherer: “After two rounds of voting Wednesday morning, it had become clear that neither Scola nor Scherer were likely to cross the finish line and gain the 77 votes needed for election … The fourth ballot, the first of Wednesday afternoon, saw Bergoglio separate himself from the pack.”
So it appears that Galeazzi’s tweeted reports conformed broadly to what we now understand to have been the case. Somehow, it seems, he knew!
But the markets failed to respond except for a flicker towards Bergoglio on the exchanges after the Guardian Liveblog posted the niche Galeazzi tweets to their wider audience.
So either the new information was not sufficiently believed (for good or bad reason), or it was for the most part overlooked by those trading on the exchanges, or the market was not liquid enough to make a significant return possible, so that most sophisticated traders did not bother to participate.
Whatever the reason, the betting markets did not perform as well as might have been expected in responding to new public information, which subsequently turned out to be accurate, unless the reports were accurate by sheer chance and deserved to be disbelieved. After all, it was ‘Vatican Insider’ itself that declared how “All those who have to access the Holy See during the Conclave are bound to the strictest confidentiality.”
Nor can this be explained by a fog of conflicting signals, as there were no other credible sources issuing conflicting information.
So the ‘Galeazzi anomaly’, as I term it, turns into a mystery, partly because he seemed to know what he shouldn’t have known, but also because hardly anyone seemed to believe him. Giacomo Galeazzi shouted wolf, and there was a wolf! It is a lesson that some, in an efficient market, will now have learned.
Further Reading.
Vaughan Williams, L. and Paton, D., (2015), Forecasting the Outcome of Closed-Door Decisions: Evidence from 500 Years of Betting on Papal Conclaves, Journal of Forecasting, 34 (5), August, 391-404.
The Favourite-Longshot Bias is the well-established tendency in most betting markets for bettors to over-bet ‘longshots’ (events with long odds, i.e. low probability events) and to relatively under-bet ‘favourites’ (events with short odds, i.e. high probability events).
Assume, for example, that Mr. Miller and Mr. Stiller both start with £1,000.
Now Mr. Miller places a level £10 stake on 100 horses quoted at 2 to 1.
Mr. Stiller places a level £10 stake on 100 horses quoted at 20 to 1.
Who is likely to end up with more money at the end?
My Ladbrokes Flat Season Pocket Companion for 1990 provides a nicely laid out piece of evidence here for British flat horse racing between 1985 and 1989. The table conveniently presented in the Companion shows that not one out of 35 favourites sent off at 1/8 or shorter (as short as 1/25) lost between 1985 and 1989. This means a return of between 4% and 12.5% in a couple of minutes, which is an astronomical rate of interest. The point being made is that broadly speaking the shorter the odds, the better the return. The group of ‘white hot’ favourites (odds between 1/5 and 1/25) won 88 out of 96 races for a 6.5% profit. The following table looks at other odds groupings.
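| Odds | Wins | Runs | Profit to £1 level stakes | Profit (%) |
|---|---|---|---|---|
| 1/5-1/2 | 249 | 344 | +£1.80 | +0.52 |
| 4/7-5/4 | 881 | 1780 | -£82.60 | -4.64 |
| 6/4-3/1 | 2187 | 7774 | -£629 | -8.09 |
| 7/2-6/1 | 3464 | 21681 | -£2237 | -10.32 |
| 8/1-20/1 | 2566 | 53741 | -£19823 | -36.89 |
| 25/1-100/1 | 441 | 43426 | -£29424 | -67.76 |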
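A rough answer to the Miller and Stiller question can be read off the table, under the simplifying assumption that each man's horses perform like the average of the band containing their odds: 2/1 sits in the 6/4-3/1 band and 20/1 in the 8/1-20/1 band. The profit figures are consistent with £1 level stakes on every run (profit divided by runs gives the percentage column), so each band's profit per £1 staked is simply profit divided by runs. A minimal Python sketch (names are illustrative):

```python
# Profit per £1 level stake in each odds band, taken from the 1985-1989 table
band_profit_rate = {
    "6/4-3/1": -629 / 7774,      # band containing 2/1 (Mr. Miller's horses)
    "8/1-20/1": -19823 / 53741,  # band containing 20/1 (Mr. Stiller's horses)
}


def expected_balance(start: float, stake: float, bets: int, rate: float) -> float:
    """Starting bank plus the expected profit from `bets` level stakes at profit rate `rate`."""
    return start + stake * bets * rate


miller = expected_balance(1000, 10, 100, band_profit_rate["6/4-3/1"])
stiller = expected_balance(1000, 10, 100, band_profit_rate["8/1-20/1"])
print(f"Mr. Miller:  £{miller:.2f}")
print(f"Mr. Stiller: £{stiller:.2f}")
```

On these historical averages Mr. Miller would expect to end with roughly £919 and Mr. Stiller with roughly £631, although any single run of 100 bets could of course deviate a long way from the average.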
An interesting argument advanced by the Strathclyde-based statistician Dr. Robert Henery in 1985 is that the favourite-longshot bias is a consequence of bettors discounting a fixed fraction of their losses, i.e. they underweight their losses compared to their gains.
This argument also explains an observed link between the sum of bookmakers’ prices and the number of runners in a race. The ‘prices’ being summed here are the win probabilities implied by the odds: odds of a/b against imply a probability of b/(a + b). If, for example, odds of 3/1 (against) are offered about each of the five horses in a race, the implied probability of winning for each horse is ¼ and the sum of prices is 5/4.
In this context, an ‘over-round’ is defined as the excess of the sum of prices over 1, in this case ¼.
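The over-round calculation is mechanical enough to sketch in Python: convert each fractional price a/b (against) into an implied win probability b/(a + b), sum them, and subtract 1. The function name over_round is an illustrative choice, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def over_round(prices: list[tuple[int, int]]) -> Fraction:
    """Sum of implied win probabilities minus 1, for fractional odds (a, b) meaning a/b against."""
    return sum(Fraction(b, a + b) for a, b in prices) - 1


# Five runners, each offered at 3/1 against: 5 x 1/4 - 1
print(over_round([(3, 1)] * 5))
# Six runners at 5/1, i.e. each priced at its true 1 in 6 chance
print(over_round([(5, 1)] * 6))
```

The first call gives 1/4, the over-round in the five-runner example; the second gives 0, a book with no margin at all.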
The rationale behind Henery’s hypothesis is that bettors will tend to explain away and therefore discount losses as atypical, or unrelated to the judgment of the bettor.
This is consistent with contemporaneous work on the psychology of gambling, such as Gilovich in 1983 and Gilovich and Douglas in 1986.
These studies demonstrate how gamblers tend to discount their losses, often as ‘near wins’ or the outcome of ‘fluke’ events, while bolstering their wins.
Let’s look more closely at how the Henery odds transformation works.
If the true probability of a horse losing a race is q, then the true odds against winning are q/(1-q).
For example, if the true probability of a horse losing a race (q) is ¾, the chance that it will win the race is ¼, i.e. 1 - ¾. The odds against it winning are: q/(1-q) = (3/4)/(1 - 3/4) = (3/4)/(1/4) = 3/1.
Henery now applies a transformation whereby the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is ½ (q = 1/2), then the bettor will rate subjectively the chance of the horse losing as Q = fq.
So Q = ½ × ¾ = 3/8, i.e. a subjective chance of winning of 5/8.
So a horse whose true chance of losing is 50% (true odds of evens, i.e. q = 1/2) is perceived to be at odds of 3/5, i.e. odds-on, which corresponds to a subjective chance of winning of 5/8 (62.5%).
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = (3/8)/(1 - 3/8) = (3/8)/(5/8) = 3/5
If the true probability of a horse losing a race is 80%, so that the true odds against winning are 4/1 (q = 0.8), then the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is 4/5 (q = 0.8), then the bettor will rate subjectively the chance of the horse losing as Q = fq.
So Q = 3/4 × 4/5 = 12/20, i.e. a subjective chance of winning of 8/20 (2/5).
So a horse whose true chance of losing is 80% (true odds of 4 to 1 against, i.e. q = 0.8) is perceived to be at odds of 6/4 against, a subjective chance of winning of 40%.
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = (12/20)/(1 - 12/20) = (12/20)/(8/20) = 12/8 = 6/4
To take this to the limit, if the true probability of a horse losing a race is 100%, so that the true odds against winning are ∞ to 1 against (q = 1), then the bettor will again assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is 100% (q=1), then the bettor will rate subjectively the chance of the horse losing as Q = fq.
So Q = 3/4 × 1 = 3/4, i.e. a subjective chance of winning of 1/4.
So a horse whose true chance of losing is 100% (true odds of infinity to 1 against, i.e. q = 1) is perceived to be at odds of 3/1 against, a subjective chance of winning of 25%.
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = 3/4 / (1/4) = 3/1
Similarly, if the true probability of a horse losing a race is 0%, so that the true odds against winning are 0 to 1 against (q = 0), then the bettor will assess the chance of losing not as q, but as Q which is equal to fq, where f is the fixed fraction of losses undiscounted by the bettor.
If, for example, f = ¾, and the true chance of a horse losing is 0% (q=0), then the bettor will rate subjectively the chance of the horse losing as Q = fq.
So Q = 3/4 × 0 = 0, i.e. a subjective chance of winning of 1.
So a horse whose true chance of losing is 0% (true odds of 0 to 1, i.e. q = 0) is also perceived to be at odds of 0/1.
This is derived as follows:
Q/(1-Q) = fq/(1-fq) = 0 / 1 = 0/1
This can all be summarised in a table.
| Objective odds (against) | Subjective odds (against), f = ¾ |
|---|---|
| Evens | 3/5 |
| 4/1 | 6/4 |
| Infinity to 1 | 3/1 |
| 0/1 | 0/1 |
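The four rows of the table can be reproduced with a few lines of Python implementing Henery's transformation as described above (Q = fq, subjective odds against = Q/(1 - Q)). This is a minimal sketch using exact fractions; the function name is illustrative. Note that Python reduces 6/4 to 3/2 and prints 3/1 as plain 3.

```python
from fractions import Fraction


def henery_subjective_odds(q: Fraction, f: Fraction) -> Fraction:
    """Subjective odds against winning when the bettor counts only a fraction f
    of the true probability of losing, q: Q = f * q, odds = Q / (1 - Q)."""
    Q = f * q
    return Q / (1 - Q)


f = Fraction(3, 4)
# True losing probabilities for the rows: evens, 4/1, infinity to 1, 0/1
for q in [Fraction(1, 2), Fraction(4, 5), Fraction(1), Fraction(0)]:
    print(f"q = {q}: subjective odds against = {henery_subjective_odds(q, f)}")
```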
We can now use these stylised examples to establish the bias.
In particular, the implication of the Henery odds transformation is that, for a given f of ¾, 3/5 is perceived as fair odds for a horse with a 1 in 2 chance of winning.
In fact, £100 wagered at 3/5 yields £160 (3/5 x £100, plus stake returned) half of the time (true odds = evens), i.e. an expected return of £80.
£100 wagered at 6/4 yields £250 (6/4 x £100, plus the stake back) one fifth of the time (true odds = 4/1), i.e. an expected return of £50.
£100 wagered at 3/1 would yield £400 (3/1 x £100, plus the stake back), but it never wins (true odds = Infinity to 1), i.e. an expected return of £0.
It can be shown that the higher the odds, the lower the expected rate of return on the stake, even though the subjective probability of losing remains a fixed fraction, f, of the objective probability throughout.
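The claim can be checked directly. A bet of £1 struck at the subjective odds Q/(1 - Q) returns 1/(1 - Q) when it wins, which happens with probability 1 - q, so the expected return per £1 is (1 - q)/(1 - fq). Its derivative with respect to q is (f - 1)/(1 - fq)², which is negative whenever f < 1, so the expected return falls steadily as the true odds lengthen. A short Python sketch reproducing the £80, £50 and £0 figures (the function name is illustrative):

```python
from fractions import Fraction


def expected_return(stake, q: Fraction, f: Fraction) -> Fraction:
    """Expected return (stake included) on a bet struck at Henery's subjective odds.
    A winning bet at odds Q/(1 - Q) returns stake / (1 - Q); it wins with probability 1 - q."""
    Q = f * q
    return stake * (1 - q) / (1 - Q)


f = Fraction(3, 4)
for q in [Fraction(0), Fraction(1, 2), Fraction(4, 5), Fraction(1)]:
    print(f"q = {q}: expected return on £100 = £{float(expected_return(100, q, f)):.2f}")
```

The q = 0 case returns exactly the £100 stake, consistent with the 0/1 row of the table.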
Now on to the over-round.
The same simple assumption about bettors’ behaviour can explain the observed relationship between the over-round (sum of win probabilities minus 1) and the number of runners in a race, n.
If each horse is priced according to its true win probability, then the over-round = 0. So in a six-horse race, where each horse has a 1 in 6 chance, each would be priced at 5 to 1, and none of the losing probability is shaded by the bookmaker. Here the over-round = (6 × 1/6) - 1 = 0.
If only a fixed fraction of losses, f, is counted by bettors, the subjective probability of losing on any horse is fq_i, where q_i is the objective probability of losing for horse i, and the odds will reflect this bias, i.e. they will be shorter than the true probabilities would imply. The subjective win probabilities in this case are 1 - fq_i, and the sum of these minus 1 gives the over-round.
Where there is no discounting of the odds, the over-round (OR) = 0, i.e. n times the correct win probability of 1/n, minus 1. Assume now that f = ¾, i.e. ¾ of losses are counted by the bettor.
If there is discounting, then the odds will reflect this, and the more runners the bigger will be the over-round.
So in a race with 5 equally matched runners, q is 4/5, but fq = 3/4 × 4/5 = 12/20, so subjective win probability = 1 - fq = 8/20, not 1/5. So OR = (5 × 8/20) - 1 = 1.
With 6 runners, fq = ¾ × 5/6 = 15/24, so subjective win probability = 1 - fq = 9/24. OR = (6 × 9/24) - 1 = (54/24) - 1 = 1¼.
With 7 runners, fq = ¾ × 6/7 = 18/28, so subjective win probability = 1 - fq = 10/28. OR = (7 × 10/28) - 1 = (70/28) - 1 = 42/28 = 1½.
If there is no discounting, then the subjective win probability equals the actual win probability, so in a 5-horse race each horse has a win probability of 1/5. Here, OR = (5 × 1/5) - 1 = 0. In a 6-horse race, with no discounting, subjective probability = 1/6, and OR = (6 × 1/6) - 1 = 0.
Hence, the over-round is linearly related to the number of runners, assuming that bettors discount a fixed fraction of losses (the ‘Henery Hypothesis’).
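For n equally matched runners the working above collapses to a closed form: OR = n(1 - f(n - 1)/n) - 1 = (1 - f)(n - 1), which rises by (1 - f) for each extra runner, a quarter when f = ¾. A short Python sketch checks this against the 5-, 6- and 7-runner cases; the equal-chances assumption is the one used in the examples above, and the function name is illustrative.

```python
from fractions import Fraction


def henery_over_round(n: int, f: Fraction) -> Fraction:
    """Over-round for n equally matched runners when bettors count only a fraction f of losses."""
    q = Fraction(n - 1, n)        # true probability that any one runner loses
    subjective_win = 1 - f * q    # 1 - fq
    return n * subjective_win - 1


f = Fraction(3, 4)
for n in range(5, 8):
    print(n, henery_over_round(n, f), (1 - f) * (n - 1))
```

Setting f = 1 (no discounting) makes every over-round zero, matching the no-discounting case.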
If the Henery Hypothesis is correct as a way of explaining the favourite-longshot bias, the bias can be explained as the natural outcome of bettors’ pre-existing perceptions and preferences.
This is quite consistent with a market efficiently processing the information available to it.
Are there other explanations for the favourite-longshot bias, and the observed link between over-round and runners, which do not rely on the Henery Hypothesis? Any coherent theory of the favourite-longshot bias should be able to explain both observed regularities. That is a topic for another time.
How large should a randomly chosen group of people be, to make it more likely than not that at least two of them share a birthday?
For convenience, assume that all dates in the calendar are equally likely as birthdays, and ignore the leap-year special case of February 29th.
The first thing to look at is the likelihood that two randomly chosen people would share the same birthday.
Let’s call them Fred and Felicity. Say Felicity’s birthday is May 1st. What is the chance that Fred shares this birthday with Felicity? Well, there are 365 days in the year, only one of these is May 1st, and we are assuming that all dates in the calendar are equally likely as birthdays.
So, the probability that Fred’s birthday is May 1st, and hence the chance he shares a birthday with Felicity, is 1/365.
So what is the probability that Fred’s birthday is not May 1st? It is 364/365. This is the probability that Fred doesn’t share a birthday with Felicity.
More generally, for any randomly chosen group of two people, the probability that the second person has a different birthday to the first is 364/365.
With 3 people, the chance that all three are different is the chance that the first two are different (364/365) multiplied by the chance that the third birthday is different (363/365).
So, the probability that 3 people have different birthdays = 364/365 x 363/365
This can be written as $(364)_2 / 365^2$, where $(364)_2 = 364 \times 363$ is a falling factorial (two terms, counting down from 364).
Similarly, the probability that 5 people have different birthdays = $(364)_4 / 365^4$
$= 364 \times 363 \times 362 \times 361 / 365^4$
So far, the chance of no matches is very high. But by the tenth person the probability of no matches is:
$\frac{364 \times 363 \times 362 \times 361 \times 360 \times 359 \times 358 \times 357 \times 356}{365^9} = 0.8831$
More generally, for n people, the probability that they all have different birthdays is
$(364)_{n-1} / 365^{n-1}$
For 23 people, the probability of all different birthdays = $(364)_{22} / 365^{22} = 0.4927$
For 22 people, the probability of all different birthdays = $(364)_{21} / 365^{21} = 0.5243$
So, in a group of 23 people, there is a (1 - 0.4927) = 0.5073 chance that at least two of the group share a birthday.
So how large should a randomly chosen group of people be, to make it more likely than not that at least two of them share a birthday? The answer is 23.
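The product above is easy to evaluate for any group size. A minimal Python sketch, under the same assumptions (365 equally likely days, no February 29th); the function names are illustrative. Starting the product at i = 0 contributes a harmless factor of 365/365, so it equals $(364)_{n-1} / 365^{n-1}$.

```python
from math import prod


def prob_all_different(n: int, days: int = 365) -> float:
    """Probability that n people all have different birthdays, every day equally likely."""
    return prod((days - i) / days for i in range(n))


def smallest_group_for_shared_birthday(days: int = 365) -> int:
    """Smallest n for which a shared birthday is more likely than not."""
    n = 1
    while prob_all_different(n, days) > 0.5:
        n += 1
    return n


for n in (10, 22, 23):
    print(n, round(prob_all_different(n), 4))
print("smallest group:", smallest_group_for_shared_birthday())
```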
The intuition behind this is quite straightforward if we recognise just how many pairs of people there are in a group of 23 people, any pair of which could share a birthday.
In a group of 23 people there are, according to the standard formula, $\binom{23}{2}$ (read “23 choose 2”) pairs of people.
Generally, the number of ways k things can be chosen from n is:
$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$
Thus, $\binom{23}{2} = \frac{23!}{21!\,2!} = \frac{23 \times 22}{2} = 253$.
So, in a group of 23 people, there are 253 pairs of people to choose from.
Therefore, a group of 23 people generates 253 chances, each of size 1/365, of having at least two people in the group sharing the same birthday.
These chances have some overlap: if A and B have a common birthday, and A and C have a common birthday, then inevitably so do B and C. So the probability of at least two people sharing a birthday in a group of 23 is less than 253/365 (69.3%). It is, as shown previously, 50.73%.
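The three numbers in this argument can be compared side by side. A short Python sketch: the naive sum of 253 chances, an approximation that treats the 253 pairs as if they were independent of one another (an assumption made only for illustration, since they are not), and the exact answer.

```python
from math import comb, prod

pairs = comb(23, 2)                     # 23 choose 2 = 23 * 22 / 2
naive_sum = pairs / 365                 # adds the 253 chances, double-counting overlaps
# Approximation: pretend the 253 pairs were independent of each other
pairwise_approx = 1 - (364 / 365) ** pairs
# Exact: one minus the probability that all 23 birthdays differ
exact = 1 - prod((365 - i) / 365 for i in range(23))

print(pairs, round(naive_sum, 4), round(pairwise_approx, 4), round(exact, 4))
```

The independent-pairs approximation lands close to 50 per cent, which is one way of seeing why 253 pairs are just about enough.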
To conclude, the next time you see two football teams line up, include the referee. It is now more likely than not that two of those on the pitch share the same birthday. Strange, but true!
Let’s suppose Bill and Ben each toss separate coins. Let A represent the variable “Bill’s coin toss outcome”, and B represent the variable “Ben’s coin toss outcome”. Both A and B have two possible values (Heads and Tails). It would be uncontroversial to assume that A and B are independent. Evidence about B will not change our belief in A. In other words, the fact that Ben’s coin lands heads does not affect the likelihood that Bill will throw heads. What happens to Bill’s coin and Ben’s coin are unrelated. They are independent.
Now suppose both Bill and Ben toss the same coin. Again let A represent the variable “Bill’s coin toss outcome”, and B represent the variable “Ben’s coin toss outcome”. Assume also that there is a possibility that the coin is biased towards heads but we do not know this for certain. In this case A and B are not independent. Observing that Ben’s coin has landed heads might cause us to increase our belief that Bill will throw a Heads.
In the second example, the variables A and B are both dependent on a separate variable C, “the coin is biased towards Heads” (which has the values True or False). Although in this case A and B are not independent, it turns out that once we know for certain the value of C then any evidence about B cannot change our belief about A.
In such a case we say that A and B are conditionally independent given C.
In many real life situations variables which are believed to be independent are actually only independent conditional on some other variable. Let’s take an example. Suppose that Ted and Ned live on opposite sides of the city and come to work by completely different means. Let’s say Ted arrives by train while Ned drives to work. Let A represent the variable “Ted late” (which has values true or false) and similarly let B represent the variable “Ned late”. At first glance, it might seem that A and B are independent. However, even if Ted and Ned lived and worked in different countries there may be factors (such as an international fuel shortage) which could affect both Ted and Ned. In that case, A and B are not independent. Again, it doesn’t seem reasonable to exclude the possibility that both Ted and Ned may be affected by a rail strike (C). Clearly the likelihood that Ted will arrive late to work will increase if the rail strike takes place; but the likelihood that Ned will arrive late to work might also increase, indirectly, because of the additional traffic on the roads caused by the rail strike. ‘Ted to be late’ and ‘Ned to be late’ are in this case conditionally independent GIVEN the rail strike.
Two events, A and B, are defined to be conditionally independent given some other event, C, if the probability of both A and B occurring, given C, is equal to the probability of A occurring given C multiplied by the probability of B occurring given C.
In notation: $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$
In the example we have just considered, the probability that Ted and Ned are both late to work given the rail strike equals the probability that Ted is late given the strike multiplied by the probability that Ned is late given the strike.
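The shared-coin example can be made concrete with a small joint distribution. A minimal Python sketch with hypothetical numbers not taken from the text: the coin is biased (C = True) with probability 1/2, a biased coin lands Heads 9 times in 10, a fair one half the time, and the two tosses are generated separately once C is fixed. It checks the definition above by brute force over all eight outcomes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers for the shared-coin example
P_C = {True: Fraction(1, 2), False: Fraction(1, 2)}         # C = "coin is biased towards Heads"
P_HEADS = {True: Fraction(9, 10), False: Fraction(1, 2)}    # P(Heads | C)


def p_toss(heads: bool, c: bool) -> Fraction:
    return P_HEADS[c] if heads else 1 - P_HEADS[c]


# Joint distribution over (C, A, B), where A = "Bill throws Heads", B = "Ben throws Heads"
joint = {(c, a, b): P_C[c] * p_toss(a, c) * p_toss(b, c)
         for c, a, b in product([True, False], repeat=3)}


def prob(event) -> Fraction:
    """Total probability of all outcomes (c, a, b) satisfying `event`."""
    return sum(p for outcome, p in joint.items() if event(*outcome))


# Unconditionally, A and B are dependent ...
p_a = prob(lambda c, a, b: a)
p_b = prob(lambda c, a, b: b)
p_ab = prob(lambda c, a, b: a and b)
print("P(A and B) =", p_ab, " P(A)P(B) =", p_a * p_b)

# ... but once the value of C is known, they factorise
for v in (True, False):
    p_c = prob(lambda c, a, b: c == v)
    lhs = prob(lambda c, a, b: c == v and a and b) / p_c
    rhs = (prob(lambda c, a, b: c == v and a) / p_c) * (prob(lambda c, a, b: c == v and b) / p_c)
    print(f"C = {v}: {lhs} == {rhs}")
```

With these numbers, observing Ben's Heads raises the chance of Bill's Heads from 7/10 to 53/70, while within each value of C the product rule holds exactly. The same output also illustrates that conditional independence does not imply unconditional independence.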
This takes us to a new question.
Does conditional independence, given C, imply unconditional independence?
Say, for example, Jack is playing Jill at snooker. Jack and Jill know nothing about each other’s ability at snooker.
Now suppose Jill wins her first 5 games. This provides evidence for her to assess the strength of her opponent, Jack, and vice-versa.
But the games may be conditionally independent (Jill is equally likely to win the fifth game as the second given Jack and Jill’s relative skill at snooker).
Even so, they are not independent (that would mean that winning the first five games tells you nothing about the likelihood of winning the sixth).
So the answer to the latest question is No. Conditional independence does not imply unconditional independence.
Finally, does unconditional independence imply conditional independence?
To answer this, let’s imagine an event with multiple causes.
Let A be the event that the fire alarm goes off.
Now suppose this could be caused by a genuine fire (F) or someone making popcorn (P), which sets off a false alarm.
Now let’s suppose that a fire is completely independent of someone making popcorn. Suppose also that, if the alarm goes off and nobody is making popcorn, the probability that it is indicating a real fire is 100 per cent.
So fire and popcorn-making are independent of each other, yet once the alarm has gone off, the probability that it’s a genuine fire depends on whether someone is making popcorn (you can be sure it’s a genuine fire if nobody is making popcorn).
So, does unconditional independence imply conditional independence? The answer is No.
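The fire alarm argument can be checked by brute-force enumeration. This is a minimal sketch with hypothetical numbers that are not in the text (a 1/100 chance of fire, a 1/5 chance of popcorn) and the assumption that the alarm sounds if and only if there is a fire or someone is making popcorn. The helper names `worlds`, `prob` and `is_fire` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers, not from the text.
P_FIRE = Fraction(1, 100)
P_POPCORN = Fraction(1, 5)

def worlds():
    """Yield (fire, popcorn, alarm, weight) for all four combinations.

    Fire and popcorn are independent by construction; the alarm is assumed
    to sound if and only if there is a fire or someone is making popcorn.
    """
    for fire, popcorn in product([True, False], repeat=2):
        w = (P_FIRE if fire else 1 - P_FIRE) * (P_POPCORN if popcorn else 1 - P_POPCORN)
        yield fire, popcorn, fire or popcorn, w

def prob(event, given=lambda fire, popcorn, alarm: True):
    """P(event | given), summing weights over the joint distribution."""
    num = sum(w for f, pc, a, w in worlds() if event(f, pc, a) and given(f, pc, a))
    den = sum(w for f, pc, a, w in worlds() if given(f, pc, a))
    return num / den

def is_fire(fire, popcorn, alarm):
    return fire

# Unconditionally, knowing about popcorn tells us nothing about fire.
fire = prob(is_fire)
fire_given_popcorn = prob(is_fire, lambda f, pc, a: pc)

# Once the alarm has sounded, popcorn matters a great deal.
fire_given_alarm_no_popcorn = prob(is_fire, lambda f, pc, a: a and not pc)
fire_given_alarm_popcorn = prob(is_fire, lambda f, pc, a: a and pc)

print(fire, fire_given_popcorn)                                 # equal
print(fire_given_alarm_no_popcorn, fire_given_alarm_popcorn)    # 1 vs 1/100
```

Fire and popcorn come out independent on their own, but given the alarm, learning that nobody is making popcorn raises the probability of a genuine fire to 1.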
So, in summary, events may be independent or they may be conditionally independent. Conditional independence does not, however, imply unconditional independence, and unconditional independence does not imply conditional independence.
One of the most celebrated pieces of correspondence in the history of probability and gambling, and one of which I am particularly fond, involves an exchange of letters between the greatest diarist of all time, Samuel Pepys, and the greatest scientist of all time, Sir Isaac Newton.
The six letters exchanged between Pepys in London and Newton in Cambridge related to a problem posed to Newton by Pepys about gambling odds. The interchange took place between November 22 and December 23, 1693. The ostensible reason for Mr. Pepys’ interest was to encourage the thirst for truth of his young friend, Mr. Smith. Whether Sir Isaac believed that tale or not we shall never know. The real reason, however, was later revealed in a letter written to a confidante by Pepys indicating that he himself was about to stake 10 pounds, a considerable sum in 1693, on such a bet. Now we’re talking!
The first letter to Newton introduced Mr. Smith as a fellow with a “general reputation…in this towne (inferiour to none, but superiour to most) for his maistery [of]…Arithmetick”.
What emerged has come down to us as the aptly named Newton-Pepys problem.
Essentially, the question came down to this:
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one ‘6’ appears.
B. 12 fair dice are tossed independently and at least two ‘6’s appear.
C. 18 fair dice are tossed independently and at least three ‘6’s appear.
Pepys was convinced that C had the highest probability and asked Newton to confirm this.
Newton chose A as the highest probability, then B, then C, and produced his calculations for Pepys, who wouldn’t accept them.
So who was right? Newton or Pepys?
Well, let’s see.
The first problem is the easiest to solve.
What is the probability of A?
Probability that one throw of a die produces a ‘6’ = 1/6
So probability that one throw of a die does not produce a ‘6’ = 5/6
So probability that six independent throws of a die produce no ‘6’ = $(5/6)^{6}$
So probability of AT LEAST one ‘6’ in six throws = $1 - (5/6)^{6} = 0.6651$
So far, so good.
The probability of problem B and probability of problem C are more difficult to calculate and involve use of the binomial distribution, though Newton derived the answers from first principles, by his method of ‘Progressions’.
Both methods give the same answer, but using the more modern binomial distribution is easier.
So let’s do it, introducing along the way the idea of so-called ‘Bernoulli trials’.
The nice thing about a Bernoulli trial is that it has only two possible outcomes.
Each outcome can be framed as a ‘yes’ or ‘no’ question (success or failure).
Let probability of success = p.
Let probability of failure = 1-p.
Each trial is independent of the others and the probability of the two outcomes remains constant for every trial.
An example is tossing a coin. Will it land heads?
Another example is rolling a die. Will it come up ‘6’?
Yes = success (S); No = failure (F).
Let probability of success, P (S) = p; probability of failure, P (F) = 1-p.
So the question: How many Bernoulli trials are needed to get to the first success?
This is straightforward, as the only way to need exactly five trials, for example, is to begin with four failures, i.e. FFFFS.
Probability of this = $(1-p)(1-p)(1-p)(1-p)\,p = (1-p)^{4} p$
Similarly, the only way to need exactly six trials is to begin with five failures, i.e. FFFFFS.
Probability of this = $(1-p)(1-p)(1-p)(1-p)(1-p)\,p = (1-p)^{5} p$
More generally, the probability that the first success occurs on trial number n is:
$(1-p)^{n-1} p$
This is a geometric distribution. This distribution deals with the number of trials required for a single success.
But what is the chance that the first success takes AT LEAST some number of trials, say 12 trials?
One method is to add the probability of needing exactly 12 trials to the probability of needing exactly 13 trials, 14 trials, 15 trials, and so on indefinitely.
Easier method: The only time you will need at least 12 trials is when the first 11 trials are all failures, i.e. (1-p)^{11}
In a sequence of Bernoulli trials, the probability that the first success takes at least n trials is (1-p)^{n-1}
Let’s take a couple of examples.
Probability that the first success (heads on a coin toss) takes at least three trials (tosses of the coin) = $(1-0.5)^{2} = 0.25$
Probability that the first success (heads on a coin toss) takes at least four trials (tosses of the coin) = $(1-0.5)^{3} = 0.125$
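The geometric-distribution results above can be checked with a short Python sketch using exact fractions. The function names are illustrative. The last part compares the shortcut $(1-p)^{n-1}$ with the “long method” of adding up the probabilities for 12, 13, 14, … trials, truncated (arbitrarily) at 199 trials.

```python
from fractions import Fraction

def p_first_success_on(n, p):
    """Geometric distribution: P(first success on trial n) = (1-p)**(n-1) * p."""
    return (1 - p) ** (n - 1) * p

def p_at_least_n_trials(n, p):
    """P(first success needs at least n trials): the first n-1 trials all fail."""
    return (1 - p) ** (n - 1)

coin = Fraction(1, 2)
print(p_at_least_n_trials(3, coin))   # 1/4
print(p_at_least_n_trials(4, coin))   # 1/8

# Long method: add exact probabilities for 12, 13, ..., 199 trials (a truncated sum).
die = Fraction(1, 6)
long_method = sum(p_first_success_on(k, die) for k in range(12, 200))
shortcut = p_at_least_n_trials(12, die)
print(float(shortcut), float(long_method))   # agree to many decimal places
```

The two differ only by the tail that the truncated sum leaves out, which is exactly $(5/6)^{199}$.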
But so far we have only learned how to calculate the probability of one success in so many trials.
What if we want to know the probability of two, or three, or however many successes?
To take an example, what is the probability of exactly two ‘6’s in five throws of the die?
To determine this, we need to calculate the number of ways two ‘6’s can occur in five throws of the die, and multiply that by the probability of each of these ways occurring.
So, probability = number of ways something can occur multiplied by probability of each way occurring.
How many ways can we throw two ‘6’s in five throws of the die?
Where S = Success in throwing a ‘6’, F = Fail in throwing a ‘6’, we have:
SSFFF; SFSFF; SFFSF; SFFFS; FSSFF; FSFSF; FSFFS; FFSSF; FFSFS; FFFSS
So there are 10 ways of throwing two ‘6’s in five throws of the die.
More formally, we are seeking to calculate how many ways 2 things can be chosen from 5. This is known as ‘5 Choose 2’, written as:
${}^{5}C_{2} = 10$
More generally, the number of ways k things can be chosen from n is:
${}^{n}C_{k} = \frac{n!}{(n-k)!\,k!}$
n! (known as n factorial) = n (n-1) (n-2) … 1
k! (known as k factorial) = k (k-1) (k-2) … 1
Thus, ${}^{5}C_{2} = \frac{5!}{3!\,2!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = \frac{5 \times 4}{2 \times 1} = \frac{20}{2} = 10$
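As a quick check, the factorial formula can be coded directly and compared with Python’s built-in `math.comb` (available from Python 3.8). The brute-force listing reproduces the ten S/F sequences written out above. The helper name `n_choose_k` is illustrative.

```python
import math
from itertools import combinations

def n_choose_k(n, k):
    """Number of ways to choose k things from n: n! / ((n-k)! k!)."""
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

print(n_choose_k(5, 2))   # 10
print(math.comb(5, 2))    # 10 (standard-library equivalent)

# Brute force: place two successes (S) among five throws and list every pattern.
patterns = ["".join("S" if i in positions else "F" for i in range(5))
            for positions in combinations(range(5), 2)]
print(len(patterns), patterns)
```

The patterns come out in the same order as the hand-written list: SSFFF, SFSFF, …, FFFSS.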
So what is the probability of throwing exactly two ‘6’s in five throws of the die, in each of these ten cases? p is the probability of success. 1-p is the probability of failure.
In each case, the probability = $p \cdot p \cdot (1-p) \cdot (1-p) \cdot (1-p)$
$= p^{2} (1-p)^{3}$
Since there are ${}^{5}C_{2}$ such sequences, the probability of exactly 2 ‘6’s =
$10\, p^{2} (1-p)^{3}$
Generally, in a fixed sequence of n Bernoulli trials, the probability of exactly r successes is:
${}^{n}C_{r} \times p^{r} (1-p)^{n-r}$
This is the binomial distribution. Note that it requires that the probability of success on each trial be constant. It also requires only two possible outcomes.
So, for example, what is the chance of exactly 3 heads when a fair coin is tossed 5 times?
${}^{5}C_{3} \times (1/2)^{3} \times (1/2)^{2} = 10/32 = 5/16$
And what is the chance of exactly 2 sixes when a fair die is rolled five times?
${}^{5}C_{2} \times (1/6)^{2} \times (5/6)^{3} = 10 \times 1/36 \times 125/216 = 1250/7776 = 0.1608$
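The binomial formula translates almost word for word into Python. This minimal sketch (the name `binomial_pmf` is illustrative) uses exact fractions so the two worked examples can be compared with the hand calculations.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """P(exactly r successes in n independent trials with success probability p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

three_heads = binomial_pmf(3, 5, Fraction(1, 2))
two_sixes = binomial_pmf(2, 5, Fraction(1, 6))

print(three_heads)                       # 5/16
print(two_sixes, float(two_sixes))       # 625/3888 is 1250/7776 in lowest terms
```

Summing `binomial_pmf(r, 5, p)` over r = 0 to 5 gives exactly 1, a handy sanity check on any implementation.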
So let’s now use the binomial distribution to solve the Newton-Pepys problem.
- What is the probability of obtaining at least one six with 6 dice?
- What is the probability of obtaining at least two sixes with 12 dice?
- What is the probability of obtaining at least three sixes with 18 dice?
First, what is the probability of no sixes with 6 dice?
In general, $P(x \text{ sixes with } n \text{ dice}) = {}^{n}C_{x} \cdot (1/6)^{x} \cdot (5/6)^{n-x}$, for $x = 0, 1, 2, \dots, n$
Where x is the number of successes.
So, probability of no successes (no sixes) with 6 dice =
$\frac{n!}{(n-x)!\,x!} \cdot (1/6)^{x} \cdot (5/6)^{n-x} = \frac{6!}{(6-0)!\,0!} \cdot (1/6)^{0} \cdot (5/6)^{6-0} = \frac{6!}{6!} \times 1 \times (5/6)^{6} = (5/6)^{6}$
Note that: $0! = 1$
Here’s the proof: $n! = n \cdot (n-1)!$
At $n = 1$: $1! = 1 \cdot (1-1)!$
So $1 = 0!$
So, where x is the number of sixes, probability of at least one six is equal to ‘1’ minus the probability of no sixes, which can be written as:
$P(x \geq 1) = 1 - P(x = 0) = 1 - (5/6)^{6} = 0.665$ (to three decimal places).
i.e. probability of at least one six = 1 minus the probability of no sixes.
That is a formal solution to Part 1 of the Newton-Pepys Problem.
Now on to Part 2.
Probability of at least two sixes with 12 dice is equal to ‘1’ minus the probability of no sixes minus the probability of exactly one six.
This can be written as:
$P(x \geq 2) = 1 - P(x = 0) - P(x = 1)$
$P(x = 0)$ in 12 throws of the dice $= (5/6)^{12}$
$P(x = 1)$ in 12 throws of the dice $= {}^{12}C_{1} \cdot (1/6)^{1} \cdot (5/6)^{11}$, where ${}^{n}C_{k} = \frac{n!}{(n-k)!\,k!}$
So ${}^{12}C_{1} = \frac{12!}{(12-1)!\,1!} = \frac{12!}{11!\,1!} = 12$
So $P(x \geq 2) = 1 - (5/6)^{12} - 12 \cdot (1/6) \cdot (5/6)^{11}$
$= 1 - 0.112156654 - 2 \times 0.134587985 = 0.887843346 - 0.26917597$
$= 0.618667376 = 0.619$ (to 3 decimal places)
This is a formal solution to Part 2 of the Newton-Pepys Problem.
Now on to Part 3.
Probability of at least three sixes with 18 dice is equal to ‘1’ minus the probability of no sixes minus the probability of exactly one six minus the probability of exactly two sixes.
This can be written as:
$P(x \geq 3) = 1 - P(x = 0) - P(x = 1) - P(x = 2)$
$P(x = 0)$ in 18 throws of the dice $= (5/6)^{18}$
$P(x = 1)$ in 18 throws of the dice $= {}^{18}C_{1} \cdot (1/6)^{1} \cdot (5/6)^{17}$
where ${}^{n}C_{k} = \frac{n!}{(n-k)!\,k!}$
So ${}^{18}C_{1} = \frac{18!}{(18-1)!\,1!} = 18$
So $P(x = 1) = 18 \cdot (1/6)^{1} \cdot (5/6)^{17}$
$P(x = 2) = {}^{18}C_{2} \cdot (1/6)^{2} \cdot (5/6)^{16}$
${}^{18}C_{2} = \frac{18!}{(18-2)!\,2!} = \frac{18!}{16!\,2!} = 18 \cdot (17/2) = 153$
So $P(x = 2) = 18 \cdot (17/2) \cdot (1/6)^{2} \cdot (5/6)^{16}$
So $P(x \geq 3) = 1 - P(x = 0) - P(x = 1) - P(x = 2)$
$P(x = 0) = (5/6)^{18} = 0.0375610365$
$P(x = 1) = 18 \cdot (1/6) \cdot 0.0450732438 = 0.135219731$
$P(x = 2) = 18 \cdot (17/2) \cdot (1/36) \cdot 0.0540878926 = 0.229873544$
So $P(x \geq 3) = 1 - 0.0375610365 - 0.135219731 - 0.229873544$
$= 0.597345689 = 0.597$ (to 3 decimal places)
This is a formal solution to Part 3 of the Newton-Pepys Problem.
So, to re-state the Newton-Pepys problem.
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one ‘6’ appears.
B. 12 fair dice are tossed independently and at least two ‘6’s appear.
C. 18 fair dice are tossed independently and at least three ‘6’s appear.
Pepys was convinced that C had the highest probability and asked Newton to confirm this.
Newton chose A, then B, then C, and produced his calculations for Pepys, who wouldn’t accept them.
So who was right? Newton or Pepys?
According to our calculations, what is the probability of A? 0.665
What is the probability of B? 0.619
What is the probability of C? 0.597
So Sir Isaac’s solution was right. Samuel Pepys was wrong, a wrong compounded by his refusal to accept Newton’s solution. How much he lost gambling on his misjudgement is lost in the mists of history. The Newton-Pepys Problem, by contrast, lives on, and continues to tease our brains to this very day.
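All three parts of the Newton-Pepys problem can be reproduced in a few lines of Python with exact fractions. This is a minimal sketch (the function name `p_at_least_sixes` is illustrative): it sums the binomial probabilities of 0, 1, …, k-1 sixes and subtracts the total from 1, exactly as in Parts 1 to 3.

```python
from fractions import Fraction
from math import comb

def p_at_least_sixes(k, n, p=Fraction(1, 6)):
    """P(at least k sixes with n fair dice) = 1 - sum over x < k of nCx p^x (1-p)^(n-x)."""
    return 1 - sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k))

# Proposition A: k = 1 with 6 dice; B: k = 2 with 12 dice; C: k = 3 with 18 dice.
results = {label: p_at_least_sixes(k, 6 * k) for label, k in (("A", 1), ("B", 2), ("C", 3))}
for label, prob in results.items():
    print(label, round(float(prob), 3))
```

Rounded to three places this prints 0.665, 0.619 and 0.597, confirming Newton’s ordering. The exact values may differ slightly from the hand-worked figures in the last few decimal places, because those used rounded intermediate values.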
Further Reading and Links
http://datagenetics.com/blog/february12014/index.html
Zeno of Elea was a Greek philosopher of the 5th century BC, best known for his paradoxes of motion, described by Aristotle in his ‘Physics’. Of these perhaps the best known is his paradox of the tortoise and Achilles, in its various forms. In a modern version, the antelope starts 100 metres ahead of the cheetah and moves at half the speed of the cheetah. Will the cheetah ever catch the antelope, assuming they don’t slow down?
Zeno’s paradox relies on the fact that when the cheetah reaches the starting position of the antelope, the antelope will have travelled 50 metres further. When the cheetah arrives at that point, the antelope will have travelled a further 25 metres, and so on. Zeno argued that this was an infinite process, and so does not have a final, finite step. So how can the cheetah ever catch the antelope?
There is a mathematical solution to the paradox, which goes like this:
Let S be the distance the cheetah runs, measured in units of 100 metres (so 1 = 100 metres).
So S = 1 + ½ + ¼ + 1/8 + 1/16 + 1/32 …..
½ S = ½ + ¼ + 1/8 + 1/16 + 1/32 …..
Therefore, S – ½ S = 1 (every term after the first cancels)
Therefore, S = 2
So the cheetah catches the antelope in 200 metres.
So an infinite process, with no final step, has a finite conclusion.
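The convergence of the cheetah’s distance can be watched numerically. This minimal Python sketch (the helper name `distance_after` is illustrative) adds up the first few legs of the chase exactly and then checks the catch-up point directly, assuming constant speeds as in the text: the cheetah closes the 100-metre gap at half its own speed, so it covers 100 / (1 - 1/2) = 200 metres before drawing level.

```python
from fractions import Fraction

def distance_after(legs):
    """Cheetah's distance, in units of 100 m, after `legs` legs: 1 + 1/2 + ... + 1/2**(legs-1)."""
    return sum(Fraction(1, 2**k) for k in range(legs))

for legs in (1, 2, 5, 10, 20):
    d = distance_after(legs)
    print(f"{legs:>2} legs: {float(d) * 100:.6f} m, still {float(2 - d) * 100:.6f} m short of 200 m")

# Direct check: gap of 100 m closed at half the cheetah's speed.
catch_point = 100 / (1 - Fraction(1, 2))
print("catch-up point:", catch_point, "m")
```

Every partial sum falls short of 200 metres by exactly the length of the last leg run, which is why the infinite sum lands on 200 metres precisely.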
That’s the mathematical solution, but does that solve the intuitive paradox? How can an infinite process, with no final step, come to an end? I understand the mathematical solution, but somehow it is as unsatisfying as the wrapper of a chocolate bar. To me, the real chocolate remains untouched. Such paradoxes I refer to as ‘chocolate paradoxes.’ What they have in common is that they can be solved mathematically without really being solved at all.
For those who might differ with me, the Thomson’s Lamp thought experiment offers a related challenge. Devised by philosopher James F. Thomson in 1954, it goes like this. Think of a lamp with a switch. You flick the switch to turn the light on. At the end of one minute exactly you flick it off. At the end of a further half minute, you turn it on again. At the end of a further quarter minute you turn it off. And so on. The time between each turning on and off the lamp is always half the duration of the time before. Assume you have the superpower to do each turning on and turning off instantaneously.
Adding these up gives: 1 minute plus half a minute plus a quarter of a minute ….
1 + ½ + ¼ + 1/8 + 1/16 + 1/32 + … = 2.
In other words, all of these infinitely many time intervals add up to exactly two minutes.
So here’s the question. At the end of two minutes, is the lamp on or off?
And here’s a second question. Say the lamp starts out being off and you turn it on after one minute, then off after a further half minute and so on. Does this make any difference to your answer?
Thomson claimed there was no solution, and that the problem led to a contradiction.
“It seems impossible to answer this question. It cannot be on, because I did not ever turn it on without at once turning it off. It cannot be off, because I did in the first place turn it on, and thereafter I never turned it off without at once turning it on. But the lamp must be either on or off. This is a contradiction.”
While considering the relationship between the infinite and the finite, consider in conclusion the following.
Can a number of infinite length be represented by a line of finite length? Solution below.
Spoiler Alert (Solution)
The square root of 2 is an irrational number, so its decimal expansion never terminates or repeats. In other words, it goes on for ever: 1.4142135623730950488… for ever.
So can a line with a finite length exactly equal to this infinitely long number be drawn?
Draw a right-angled triangle whose vertical side (a) and horizontal side (b) are each of length 1.
Then the length of the hypotenuse of the triangle, c, can be derived from the lengths of the adjacent (a) and opposite (b) sides, using Pythagoras’ Theorem.
$a^{2} + b^{2} = c^{2}$
So $1^{2} + 1^{2} = c^{2}$
So $c^{2} = 2$
$c = \sqrt{2}$
This is a line of finite length, representing a number of infinite length. So the answer to the question is yes. Strange? Indeed. Another of those tantalising ‘chocolate paradoxes.’
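As a small numerical illustration (not a proof of irrationality), Python can compute the hypotenuse with `math.hypot` and then, using the `decimal` module, produce as many digits of √2 as requested. Asking for more precision simply produces more digits; the expansion never comes to an end.

```python
import math
from decimal import Decimal, getcontext

# Pythagoras with both shorter sides of length 1.
a = b = 1
c = math.hypot(a, b)       # sqrt(a**2 + b**2), in double precision
print(c)                   # only about 16 significant digits are available here

# Asking for more digits just produces more digits.
for digits in (20, 40, 60):
    getcontext().prec = digits   # number of significant digits for Decimal arithmetic
    print(Decimal(2).sqrt())
```

The finite line in the triangle holds the whole number; any written-out decimal is only ever a truncation of it.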
Further Reading and Links
http://numberphile.com/videos/zeno_paradox.html
Thomson, James F. (1954). ‘Tasks and Super-Tasks’, Analysis, 15 (1), 1-13.
The famed correspondence between two titans of 17th-century French intellectual thought, Blaise Pascal (Pascal’s Wager) and Pierre de Fermat (Fermat’s Last Theorem), was to mark the foundation of modern probability theory. But it was sparked off by a question posed to Pascal by the legendary French gambler of the time, Antoine Gombaud, better known as the Chevalier de Méré.
The question related to a new dice game the Chevalier had invented. According to the rules of the game, he asked for even money odds that a pair of dice, when rolled 24 times, will come up with a double-6 at least once. His reasoning seemed impeccable. If the chance of a 6 on one roll of the die = 1/6, then the chance of a double-6 when two dice are thrown = 1/6 x 1/6 (as they are independent events) = 1/36.
So, he reasoned, the chance of at least one double-6 in 24 throws is: 24/36 = 2/3. So this should be a profitable game for the Chevalier. When it didn’t turn out that way, he asked the great philosopher and mathematician Blaise Pascal to look into it, as you do.
Pascal derived the correct probabilities as follows:
Probability of a double-6 in one throw of a pair of dice = 1/6 x 1/6 = 1/36.
So probability of NO double-6 in one throw of a pair of dice = 35/36.
So, probability of no double-6 in 24 throws of a pair of dice = 35/36 x 35/36 x … (24 times) = 35/36 to the power of 24, i.e. $(35/36)^{24} = 0.5086$.
So, probability of at least one double-6 = 1 – 0.5086 = 0.4914
So the Chevalier was betting at even money on a game which he lost (albeit marginally) more often than he won, which is why he was losing over time.
What if he changed the game to give himself 25 throws?
Now, the probability of throwing at least one double-6 in 25 throws of a pair of dice is:
1 – (35/36)^{25} = 0.5055.
These odds, at even money, are in favour of the Chevalier, but this probability is still lower than the probability of obtaining at least one ‘6’ in four throws of a single die.
In the single-die game, the Chevalier has an edge of 51.77% – 48.23% = 3.54%.
In the ‘pair of dice’ game (24 throws), the Chevalier’s edge =
49.14% – 50.86% = -1.72%
In the ‘pair of dice’ game (25 throws), the Chevalier’s edge =
50.55% – 49.45% = 1.10%
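The Chevalier’s sums can be checked with a short Python sketch using exact fractions; the helper names are illustrative. It also searches for the smallest number of throws that makes the double-6 bet favourable at even money, which agrees with the move to 25 throws. The printed edges can differ from the figures above in the last digit, because those subtract percentages that have already been rounded.

```python
from fractions import Fraction

def p_at_least_one(p_single, throws):
    """P(at least one success in `throws` independent throws) = 1 - (1 - p)^throws."""
    return 1 - (1 - p_single) ** throws

def edge(p_win):
    """Edge on an even-money bet: P(win) - P(lose)."""
    return p_win - (1 - p_win)

DOUBLE_SIX = Fraction(1, 36)
SIX = Fraction(1, 6)

naive = 24 * DOUBLE_SIX                     # the Chevalier's reasoning: 24/36 = 2/3
pair_24 = p_at_least_one(DOUBLE_SIX, 24)
pair_25 = p_at_least_one(DOUBLE_SIX, 25)
single_4 = p_at_least_one(SIX, 4)

print("naive estimate:", naive)
for name, p in [("pair, 24 throws", pair_24), ("pair, 25 throws", pair_25),
                ("single die, 4 throws", single_4)]:
    print(f"{name}: P(win) = {float(p):.4f}, edge = {float(edge(p)):+.2%}")

# Smallest number of throws that makes the double-6 bet favourable at even money.
n = 1
while p_at_least_one(DOUBLE_SIX, n) <= Fraction(1, 2):
    n += 1
print("break-even throws for double-6:", n)
```

The naive 24/36 treats the chances of each throw as if they simply added up, which overlooks the possibility of more than one double-6 in the same run of throws.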
A better game for the Chevalier would have been to offer even money that he could toss ten heads in a row at least once in 1024 attempts, each attempt being a fresh set of ten tosses of a coin, so that the attempts are independent of one another. The derivation of this probability is similar in method to the dice problem.
First, we need to determine the probability of 10 heads in 10 tosses of a fair coin.
The probability is: ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½
Probability = $(1/2)^{10} = 1/1024$, i.e. odds of 1023/1 against.
Based on this, what is the probability of at least one run of 10 heads in 1024 attempts? Is it 0.5? No, because although you can expect ONE run of 10 heads on average, you could obtain zero, 2, 3, 4, etc.
So what is the probability of NO RUN of 10 heads in 1024 attempts?
This is: $(1 - 1/1024)^{1024}$
The probability of NO RUNS OF TEN HEADS = $(1023/1024)^{1024}$ = 37%
So probability of AT LEAST one run of 10 heads = 63%.
Now assume you have already made 234 of the 1024 attempts without a run of 10 heads. What is your chance now of getting 10 heads?
Probability of NO RUNS OF TEN HEADS in the remaining 790 attempts = $(1023/1024)^{790}$ = 46%
So probability of at least one success = 54%.
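The percentages above can be checked with a minimal Python sketch. Like the calculation in the text, it treats the 1024 chances as independent attempts, each succeeding with probability 1/1024; the function name `p_at_least_one_run` is illustrative. It also shows that $(1 - 1/N)^{N}$ is close to $1/e$ for large N, which is why the answer lands near 63%.

```python
import math

P_TEN_HEADS = 0.5 ** 10        # probability that one attempt of ten tosses is all heads (1/1024)

def p_at_least_one_run(attempts, p=P_TEN_HEADS):
    """P(at least one all-heads attempt) when the attempts are independent."""
    return 1 - (1 - p) ** attempts

full_game = p_at_least_one_run(1024)
remaining = p_at_least_one_run(790)     # after 234 unsuccessful attempts

print(f"1024 attempts: {full_game:.1%}")
print(f" 790 attempts: {remaining:.1%}")
# (1 - 1/N)**N approaches 1/e as N grows, so 1024 attempts give roughly 1 - 1/e.
print(f"1 - 1/e:       {1 - 1 / math.e:.1%}")
```

Because the attempts are independent, the 234 failures already behind you do not count against you: only the 790 remaining attempts matter.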
The Chevalier could have played either of these games and expected to come out ahead. But the game would have taken a long time. He preferred the shorter game, which produced the longer loss.
Until he was put right by Monsieur Pascal.
Most importantly, though, the Chevalier’s question led to a correspondence, most of which has survived, which led to the foundations of modern probability theory.
I will examine just one of the conclusions of this correspondence today, and it relates to the infamous ‘Gambler’s Ruin’ problem.
This is an idea set in the form of a problem by Pascal for Fermat, subsequently published by Christiaan Huygens (‘On reasoning in games of chance’, 1657) and formally solved by Jacobus Bernoulli (‘Ars Conjectandi’, 1713).
One way of stating the problem is as follows. If you play any gambling game long enough, will you eventually go bankrupt, even if the odds are in your favour, if your opponent has unlimited funds?
Example: You and your opponent toss a coin, where the loser pays the winner £1. The game continues until either you or your opponent has all the money. Suppose you have £10 to start and your opponent has £20. What are the probabilities that a) you and b) your opponent, will end up with all the money?
The answer is that the player who starts with more money has more chance of ending up with all of it. For a fair game like this one, the formula is:
$P_{1} = n_{1} / (n_{1} + n_{2})$
$P_{2} = n_{2} / (n_{1} + n_{2})$
Where $n_{1}$ is the amount of money that player 1 starts with, $n_{2}$ is the amount of money that player 2 (your opponent) starts with, and $P_{1}$ and $P_{2}$ are the probabilities that player 1 or player 2, respectively, ends up with all the money.
In this case, you start with £10 of the £30 total, and so have a 10/(10+20) = 10/30 = 1/3 chance of winning the £30; your opponent has a 2/3 chance of winning the £30. But even if you do win this game, and you play the game again and again, against different opponents, or the same one who has borrowed more money, eventually you will lose your entire bankroll, since in a fair game against an opponent with unlimited funds ruin is certain. Even if the odds are in your favour, the risk of ruin never disappears: a long-enough bad streak can still bankrupt you.
In other words, in a fair game, unlimited capital will eventually overcome a finite bankroll. This is the ‘Gambler’s Ruin’ problem, and many gamblers over the years have been ruined because of their unawareness of it.
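The fair-game formula can be compared with a simple simulation of the £10 versus £20 coin-tossing example. This is a minimal Python sketch: the function names, the number of simulated games and the random seed are arbitrary choices for illustration, not part of the original problem.

```python
import random

def p_win_all(n1, n2):
    """Fair-game formula: P1 = n1 / (n1 + n2)."""
    return n1 / (n1 + n2)

def simulate_win_all(n1, n2, games=10_000, seed=2024):
    """Monte Carlo estimate of the same probability.

    Each game: a fair coin is tossed repeatedly, the loser pays the winner 1,
    and play stops when either player has all n1 + n2.
    """
    rng = random.Random(seed)        # fixed seed so the run is reproducible
    total = n1 + n2
    wins = 0
    for _ in range(games):
        money = n1
        while 0 < money < total:
            money += 1 if rng.random() < 0.5 else -1
        if money == total:
            wins += 1
    return wins / games

estimate = simulate_win_all(10, 20)
print("formula:   ", p_win_all(10, 20))
print("simulation:", estimate)
```

With 10,000 simulated games the estimate should sit within a percentage point or so of the formula’s 1/3.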
So how can we avoid falling victim to the problem of ‘Gambler’s Ruin?’
‘Never bet more than you can afford to lose’.
‘When the Fun Stops, Stop!’
Now that’s a start.
Further Reading and Links
Letters between Fermat and Pascal on Probability: https://www.york.ac.uk/depts/maths/histstat/pascal.pdf