| | Applicants | Admitted |
|---|---|---|
| Men | 8442 | 44% |
| Women | 4321 | 35% |

This looks pretty damning, until the admission figures are broken down by department. Doing so reveals a paradox.

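| Dept. | Men: Applicants | Men: Admitted | Women: Applicants | Women: Admitted |
|---|---|---|---|---|
| A | 825 | 62% | 108 | 82% |
| B | 560 | 63% | 25 | 68% |
| C | 325 | 37% | 593 | 34% |
| D | 417 | 33% | 375 | 35% |
| E | 191 | 28% | 393 | 24% |
| F | 373 | 6% | 341 | 7% |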

In other words, women were admitted at a higher rate than men in four of the six departments.

So what was going on? Those with statistical training soon realised that this was a simple example of Simpson’s Paradox. Simpson’s Paradox arises when groups of frequency data are combined and the combined data show a different pattern from the one seen in each group separately. Put another way, it is the appearance of a trend within several groups that disappears, or even reverses, when the data for those groups are combined.

In the case of Berkeley, a study published in Science in 1975 by Bickel, Hammel and O’Connell reached the conclusion that women tended to apply to the more competitive departments with low rates of admission, such as the English Department, while men tended to apply to less competitive departments with high rates of admission, such as engineering and chemistry. As such the University was not actively discriminating against women, at least not on the basis of the statistics used to make the charge.
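The mechanism can be checked with a little arithmetic on the six departments in the table. The sketch below pools the department figures into one weighted rate per sex. It is only an illustration: the published percentages are rounded, so the implied admission counts are approximate, and the pooled figures cover just these six departments rather than the university-wide totals quoted at the start.

```python
# Department-level figures from the table above:
# (men applicants, men admit rate, women applicants, women admit rate)
departments = {
    "A": (825, 0.62, 108, 0.82),
    "B": (560, 0.63, 25, 0.68),
    "C": (325, 0.37, 593, 0.34),
    "D": (417, 0.33, 375, 0.35),
    "E": (191, 0.28, 393, 0.24),
    "F": (373, 0.06, 341, 0.07),
}


def pooled_rate(groups):
    """Weighted average admission rate: total admitted / total applicants."""
    admitted = sum(n * rate for n, rate in groups)
    applicants = sum(n for n, _ in groups)
    return admitted / applicants


men_rate = pooled_rate([(m, mr) for m, mr, _, _ in departments.values()])
women_rate = pooled_rate([(w, wr) for _, _, w, wr in departments.values()])

# Departments in which the women's admission rate exceeds the men's
women_ahead = [d for d, (_, mr, _, wr) in departments.items() if wr > mr]

print(f"Pooled over six departments: men {men_rate:.1%}, women {women_rate:.1%}")
print("Departments where the women's rate is higher:", women_ahead)
```

Most of the men's applications went to departments A and B, where admission rates are high, while most of the women's went to C to F, where they are low. That difference in weights is what drags the pooled rate for women below the pooled rate for men.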

Ignorance of the implications of Simpson’s Paradox might also generate false conclusions in the case of medical trials.

Take the following drugs, and their success rate in medical trials over two different days.

| | Drug A | Drug B |
|---|---|---|
| Day 1 | 63/90 = 70% | 8/10 = 80% |
| Day 2 | 4/10 = 40% | 45/90 = 50% |

Overall, Drug A = 67% success rate; Drug B = 53% success rate.

But Drug B performs better on both days.

So which is the better drug? In the medical trials, I would certainly choose to be treated by Drug A. Others might differ, but I doubt they would persuade any reasonable judge of the outcome of the trials.
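The figures for the two drugs can be verified with exact fractions. This is a minimal sketch using only the numbers quoted above; the function names are illustrative.

```python
from fractions import Fraction

# (successes, patients) for Day 1 and Day 2, from the trial figures above
trials = {
    "Drug A": [(63, 90), (4, 10)],
    "Drug B": [(8, 10), (45, 90)],
}


def daily_rates(results):
    """Success rate on each day, as exact fractions."""
    return [Fraction(s, n) for s, n in results]


def pooled_rate(results):
    """Success rate over both days: total successes / total patients."""
    return Fraction(sum(s for s, _ in results), sum(n for _, n in results))


for drug, results in trials.items():
    days = ", ".join(f"{float(r):.0%}" for r in daily_rates(results))
    print(f"{drug}: by day {days}; pooled {float(pooled_rate(results)):.0%}")
```

Drug A was mostly tested on Day 1, when both drugs did well, and Drug B mostly on Day 2, when both did badly, which is why the pooled rates point the other way from the daily ones.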

Take another example. In this trial, there are two sets of patients: a control group of 240 patients who are given a placebo, such as a sugar pill, which is known to have no effect on the illness under evaluation, and a test group of 240 patients who are given the real drug. Each set of 240 patients is made up of four age groups. Group A is elderly adults, Group B is middle-aged adults, Group C is young adults and Group D is children.

Here are the results, with success rate measured by the proportion recovering from the illness within two days of taking the drug:

*Those taking the placebo.*

Group A: 20; Group B: 40; Group C: 120; Group D: 60

Success rates are:

Group A: 10%; Group B: 20%; Group C: 40%; Group D: 30%

Overall success rate for those taking the placebo = (2 + 8 + 48 + 18) / 240 = 76/240 = 31.7%.

*Those taking the real drug.*

Group A: 120; Group B: 60; Group C: 20; Group D: 40

Success rates are:

Group A: 15%; Group B: 30%; Group C: 60%; Group D: 45%

Overall success rate for those taking the real drug = (18 + 18 + 12 + 18) / 240 = 66/240 = 27.5%.

This compares with an overall success rate for those taking the placebo of 31.7%.

So the placebo, over the whole sample, produced a higher success rate than the real drug.

Breaking the numbers down by group, however, reveals a discrepancy.

**For the placebo**

Group A: 10%; Group B: 20%; Group C: 40%; Group D: 30%

**For the real drug**

Group A: 15%; Group B: 30%; Group C: 60%; Group D: 45%

So, in each individual group (elderly adults, middle-aged adults, young adults, children) the success rate is greater for those taking the real drug, although in the group as a whole, it is less.

How can we resolve the paradox?

The answer lies in the size and age distribution of each group, which differs between those who received the real drug and those who received the placebo. In this study, young adults make up a far larger share of those who received the placebo than of those who received the real drug. This is important because the natural recovery rates from this illness (as defined in the test) are normally higher in this demographic than in the other groups, whether they receive the real drug or the placebo. Conversely, the elderly (whose recovery rates are normally lower than average) are much more heavily represented among those taking the real drug than among those taking the placebo.
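One way to see the effect of the age mix is to recompute each arm's overall rate under a common age distribution. The sketch below does this with the figures from the text. Two things in it are assumptions rather than part of the original example: Group D in the real-drug arm is taken as 40 patients (the size implied by 18 recoveries at 45% and a total of 240), and the common age mix used for comparison is an equal split across the four groups.

```python
# (patients, recovery rate) per age group in each arm, from the text.
# ASSUMPTION: Group D in the real-drug arm is 40 patients, the size implied
# by 18 recoveries at 45% and an arm total of 240.
placebo = {"A": (20, 0.10), "B": (40, 0.20), "C": (120, 0.40), "D": (60, 0.30)}
real_drug = {"A": (120, 0.15), "B": (60, 0.30), "C": (20, 0.60), "D": (40, 0.45)}


def overall_rate(arm):
    """Recovery rate over the whole arm, weighted by its own age mix."""
    recovered = sum(n * rate for n, rate in arm.values())
    total = sum(n for n, _ in arm.values())
    return recovered / total


def standardised_rate(arm, weights):
    """Rate the arm would show if its age mix matched `weights` (illustrative helper)."""
    total = sum(weights.values())
    return sum(weights[g] * arm[g][1] for g in arm) / total


# ASSUMPTION: an equal share of each age group, chosen only for illustration
equal_mix = {group: 1 for group in placebo}

print(f"Placebo:   overall {overall_rate(placebo):.1%}, "
      f"equal mix {standardised_rate(placebo, equal_mix):.1%}")
print(f"Real drug: overall {overall_rate(real_drug):.1%}, "
      f"equal mix {standardised_rate(real_drug, equal_mix):.1%}")
```

Once both arms are given the same age mix, the real drug comes out ahead, in line with the group-by-group comparison.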

Take another example from baseball. In the 1995/96 seasons, fans were divided between those who claimed Derek Jeter as the best performing player and those who claimed that title for David Justice. It is easy to see why. Here are their batting averages.

| | 1995 | 1996 | Combined |
|---|---|---|---|
| Derek Jeter | 12/48 (.250) | 183/582 (.314) | 195/630 (.310) |
| David Justice | 104/411 (.253) | 45/140 (.321) | 149/551 (.270) |

Here we see that Jeter has the better overall batting average but Justice records a better average in each of the two years making up that overall average. To anyone conversant with Simpson’s Paradox this is nothing weird. It is certainly possible in theory for one player to score a better batting average in successive years than another, yet record a worse batting average overall. The case of Jeter and Justice is an example where the theory clearly shows up in practice.

Indeed, move forward to 1997 and the paradox grows even stronger. In that year, Jeter averaged 0.291 (190/654), while Justice recorded a better average of 0.329 (163/495). So, in three successive years, Justice recorded a better average than Jeter. Over the whole period, though, the batting average for Derek Jeter was 0.300 (385/1284), superior to that of David Justice, on 0.298 (312/1046).
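The three-season comparison is easy to confirm from the hits and at-bats quoted above. A short sketch, with illustrative names:

```python
from fractions import Fraction

# (hits, at-bats) per season, from the figures quoted above
seasons = {
    "Derek Jeter": {1995: (12, 48), 1996: (183, 582), 1997: (190, 654)},
    "David Justice": {1995: (104, 411), 1996: (45, 140), 1997: (163, 495)},
}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def career_average(player):
    """Batting average over all listed seasons: total hits / total at-bats."""
    hits = sum(h for h, _ in seasons[player].values())
    at_bats = sum(ab for _, ab in seasons[player].values())
    return Fraction(hits, at_bats)


justice_wins_each_year = all(
    average(*seasons["David Justice"][year]) > average(*seasons["Derek Jeter"][year])
    for year in (1995, 1996, 1997)
)
jeter_wins_overall = career_average("Derek Jeter") > career_average("David Justice")

print("Justice ahead in every season:", justice_wins_each_year)
print("Jeter ahead over the three seasons:", jeter_wins_overall)
```

The reversal comes from the weights: Jeter's weakest season (1995) contributes only 48 at-bats to his total, while Justice's weakest season contributes 411 to his.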

For those more familiar with cricket than baseball, let’s take the following example of two mythical matches played by Harold Larwood and Bill Voce.

*First Match*:

Harold Larwood takes 3 wickets while bowling but concedes 60 runs off his bowling (an average of 20 runs conceded per wicket).

Bill Voce takes 2 wickets while bowling but concedes 68 runs (an average of 34 runs conceded per wicket).

*Second Match:*

Harold Larwood takes 1 wicket and concedes 8 runs (an average of 8 runs conceded per wicket).

Bill Voce takes 6 wickets and concedes 60 runs (an average of 10 runs conceded per wicket).

Here, Larwood has the superior performance in both matches (20 runs conceded per wicket compared to Voce’s 34 per wicket, and 8 runs conceded per wicket compared to Voce’s 10 per wicket). Over the two matches combined, however, Larwood took 4 wickets for 68 runs (1 for 17) while Voce did slightly better, taking 8 wickets for 128 runs (1 for 16).
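The bowling figures work the same way, except that a lower average is better. A brief sketch using the numbers from the two mythical matches:

```python
from fractions import Fraction

# (wickets, runs conceded) per match, from the example above
figures = {
    "Larwood": [(3, 60), (1, 8)],
    "Voce": [(2, 68), (6, 60)],
}


def bowling_average(wickets, runs):
    """Runs conceded per wicket; lower is better."""
    return Fraction(runs, wickets)


def combined_average(spells):
    """Bowling average over all matches: total runs / total wickets."""
    return Fraction(sum(r for _, r in spells), sum(w for w, _ in spells))


for bowler, spells in figures.items():
    per_match = [float(bowling_average(w, r)) for w, r in spells]
    print(bowler, per_match, float(combined_average(spells)))
```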

So who is the better baseball player? Who is the better bowler? Was the University of California, Berkeley, discriminating on the basis of gender? Which is the better drug? Each of these questions shows how Simpson’s Paradox can complicate the answer.

**Reference and links**

P.J. Bickel, E.A. Hammel and J.W. O’Connell (1975), Sex Bias in Graduate Admissions: Data from Berkeley, Science, 187, 398-404.


Now, according to her father’s will, a suitor must choose the casket containing the portrait to win Portia’s hand in marriage. The first suitor, the Prince of Morocco, must choose one of the three caskets. Each is engraved with a cryptic inscription. The gold casket reads, “Who chooseth me shall gain what many men desire.” The silver casket reads, “Who chooseth me shall get as much as he deserves.” The lead casket reads, “Who chooseth me must give and hazard all he hath”. He chooses the gold casket, hoping to find “an angel in a golden bed.” Instead, he finds a skull and a scroll inserted into the skull’s “empty eye.” The message he reads on the scroll says, “All that glisters is not gold.” The Prince makes a hasty exit. “A gentle riddance”, says Portia. The next suitor is the Prince of Arragon. “Who chooseth me shall get as much as he deserves”, he reads on the silver casket. “I’ll assume I deserve the very best”, he declares, and opens the casket. Inside he finds a picture of a fool with a sharp, dismissive note which says “With one fool’s head I came to woo, But I go away with two.”

Now let us think about a plot twist where Portia must open one of the other caskets and give Arragon a chance to switch choice of caskets if he wishes. She is not allowed to indicate where the portrait is and in this case must open the gold casket (she knows it is in the lead casket so can’t open that) and show it is not in there. She now asks the Prince whether he wants to stick with his original choice of the silver casket or switch to the lead casket.

Let us imagine that he believes that Portia has no better idea than he has of which casket contains the prize. In that case, should he switch from his original choice of the silver casket to the lead casket? Well, if Portia had no knowledge of the location of the portrait, she might have inadvertently opened the casket containing it. The fact that the gold casket turned out to be empty is then new information, but it favours neither of the remaining caskets: silver and lead are left equally likely, so there is nothing to gain by switching. But if he knows that she is aware of the location of the portrait, her decision to open the gold casket and not the lead casket has doubled the chance that the lead casket contains the portrait compared to his original choice, other things equal. This is because there was just a one third chance that his original choice (silver) was correct and a two thirds chance that one of the other choices (gold, lead) was correct. She is forced to eliminate the losing casket of the two (in this case, gold), so the two thirds chance converges on the lead casket.

So should he switch to the lead casket or stay with the silver? It depends whether things actually are equal. In particular, it depends on how valuable any information contained in the inscriptions is. If he has little faith in the inscriptions to arbitrate, he should definitely switch and improve his chance of winning fair Portia’s hand from 1/3 to 2/3. If he thinks, however, that he has unlocked the secret from the inscriptions, the decision is more difficult. If so, he might stick with his choice in good conscience.

In summary, the key to the problem is the new information Portia introduced by opening a casket which she knew did not contain the portrait. By acting on this new information, the Prince can potentially improve his chance of correctly predicting which casket will reveal the portrait from 1 in 3 to 2 in 3, by switching caskets when given the chance. That holds unless he has other information which makes the opening probabilities different to 1/3 for each casket, such as those cryptic inscriptions. If this information is potentially valuable, or at least if the Prince thinks so, that complicates matters!
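The difference between a Portia who knows where the portrait is and one who does not can be checked by enumerating every case exactly. The sketch below is an illustration under stated assumptions: the suitor treats the three caskets as equally likely beforehand (that is, he sets the inscriptions aside), and a knowing Portia who has a free choice between two empty caskets opens either with equal probability. The function name is hypothetical.

```python
from fractions import Fraction

CASKETS = ("gold", "silver", "lead")
CHOICE = "silver"  # the suitor's original pick


def switch_win_probability(portia_knows):
    """Exact P(switching wins | Portia opens another casket and it is empty).

    ASSUMPTIONS: each casket is equally likely a priori; Portia picks at
    random among the caskets she is allowed to open.
    """
    others = [c for c in CASKETS if c != CHOICE]
    revealed_empty = Fraction(0)
    switch_wins = Fraction(0)
    for portrait in CASKETS:
        # A knowing Portia never opens the portrait's casket
        options = [c for c in others if c != portrait] if portia_knows else others
        for opened in options:
            weight = Fraction(1, len(CASKETS)) * Fraction(1, len(options))
            if opened == portrait:
                continue  # portrait exposed by accident: not the situation observed
            revealed_empty += weight
            remaining = next(c for c in others if c != opened)
            if remaining == portrait:
                switch_wins += weight
    return switch_wins / revealed_empty


print("Portia knows:        ", switch_win_probability(True))
print("Portia does not know:", switch_win_probability(False))
```

Under these assumptions switching wins two times in three when Portia knows, and only one time in two when she opens a casket blindly and it happens to be empty.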


She next comes to the telephone and tells you she has been charged with smashing the shop window, based on the evidence of a police officer who positively identified her as the culprit. She claims mistaken identity. You must evaluate the probability that she did commit the offence before deciding how to advise her. So the condition is that she has been charged with criminal damage; the hypothesis you are interested in evaluating is the probability that she did it. Bayes’ Theorem, of course, helps to answer this type of question.

There are three things to estimate. The first is the Bayesian prior probability (which we represent as ‘a’). This is the probability you assign to the hypothesis being true before you become aware of the new information. In this case, it means the probability you would assign to your friend breaking the shop window immediately before you got the new information from her on the telephone that she had been charged on the basis of the witness evidence.

The second is the probability that the new evidence would have arisen if the hypothesis was true (which we represent as ‘b’). In this case, you need to estimate the probability of the police officer identifying your friend if your friend actually did break the window.

The third is to estimate the probability that the new evidence would have arisen if the hypothesis was false (which we represent as ‘c’). In this case, you need to estimate the probability of the police officer identifying your friend if your friend did NOT break the window.

According to Bayes’ Theorem, Posterior probability = **ab/ [ab+c(1-a)]**

So let’s apply Bayes’ Theorem to the case of the shattered shop window. Let’s start with a. Well, you have known her for years, and it is totally out of character, although she does live just a stone’s throw from the shop, and it is her day off work, so she could in principle have done it. Let’s say 5% (0.05). Assigning the prior probability is fraught with problems, however, as awareness of the new information might easily affect the way you assess the prior information. You need to make every effort to estimate this probability as it would have been before you received the new information. You also have to be precise as to the point in the chain of evidence at which you establish the prior probability.

What about b? This is the probability of the new evidence if the hypothesis was true. What is the hypothesis? That your friend broke the window. What is the new evidence? That the police officer has identified your friend as the person who smashed the window. So b is an estimate of the probability that the police officer would have identified your friend if she was indeed guilty. If she threw the brick, it’s easy to imagine how she came to be identified by the police officer. Still, he wasn’t close enough to catch the culprit at the time, which should be borne in mind. Let’s say that the probability he would identify her, given that she is guilty, is 80% (0.8).

Let’s move on to c. This is the probability of the new evidence if the hypothesis was false. What is the hypothesis again? That your friend broke the window. What is the new evidence again? That the police officer has identified your friend as the person who did it. So c is an estimate of the probability that the police officer would have identified her if she was not the guilty party, i.e. a false identification. If your friend didn’t shatter the window, how likely is the police officer to have wrongly identified her when he saw her in the street later that day? It is possible that he would see someone of similar age and appearance, wearing similar clothes, and jump to the wrong conclusion, or he may just want to identify someone to advance his career. Let us estimate the probability as 15% (0.15).

Once we’ve assigned these values, Bayes’ theorem can now be applied to establish a posterior probability. This is the number that we’re interested in. It is the measure of how likely it is that your friend broke the window, given that she’s been identified as the culprit by the police officer and charged on the basis of this evidence.

Given these estimates, we can use Bayes’ Theorem to update our probability that our friend is guilty to 21.9%, despite assigning a reliability of 80% to the police officer’s identification.

The most interesting takeaway from this application of Bayes’ Theorem is the relatively low probability you should assign to the guilt of your friend even though you were 80% sure that the police officer would identify her if she was guilty, and assigned only a small 15% chance that he would falsely identify her. The clue to the intuitive discrepancy is in the prior probability (or ‘prior’) you would have attached to the guilt of your friend before you were confronted with the charge based on the evidence of the police officer. If a new piece of evidence now emerges (say a second witness), you should again apply Bayes’ Theorem to update to a new posterior probability, gradually converging, based on more and more pieces of evidence, ever nearer to the truth.
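The update can be written as a short function in the a, b, c notation used here. The first call uses the values from the shop-window example. The second call is purely hypothetical: it imagines a second, independent witness to whom the same b and c are assigned, simply to show how the posterior from one update becomes the prior for the next.

```python
def posterior(a, b, c):
    """Bayes' Theorem in the a, b, c notation: ab / [ab + c(1 - a)]."""
    return (a * b) / (a * b + c * (1 - a))


# Values from the shop-window example: prior 5%, b = 80%, c = 15%
after_officer = posterior(a=0.05, b=0.8, c=0.15)

# HYPOTHETICAL second witness, assumed independent and given the same b and c.
# The previous posterior is used as the new prior.
after_second_witness = posterior(a=after_officer, b=0.8, c=0.15)

print(f"After the police officer's identification: {after_officer:.1%}")
print(f"After a hypothetical second witness:       {after_second_witness:.1%}")
```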

It is, of course, all too easy to dismiss the implications of this hypothetical case on the grounds that it was just too difficult to assign reasonable probabilities to the variables. But that is what we do implicitly when we don’t assign numbers. Bayes’ Theorem is not at fault for this in any case. It will always correctly update the probability of a hypothesis being true whenever new evidence is identified, based on the estimated probabilities. In some cases, such as the crime case illustrated here, that is not easy, though the approach you adopt to revising your estimate will always be better than using intuition to steer a path to the truth.

In many other cases, we do know with precision what the key probabilities are, and in those cases we can use Bayes’ Theorem to identify with precision the revised probability based on the new evidence, often with startlingly counter-intuitive results. In seeking to steer the path from ignorance to knowledge, the application of Bayes is always the correct method.

**Appendix**

The calculation and the simple algebraic expression that we have identified in this setting is:

ab/[ab+c(1-a)]

a is the prior probability of the hypothesis (she’s guilty) being true. This is more traditionally represented by the notation P(H). In the example, a = 0.05.

b is the probability the police officer identifies her conditional on the hypothesis being true, i.e. she’s guilty. This is more traditionally represented by the notation P(E|H), i.e. the probability of E (the evidence) given that the hypothesis H is true. In the example, b = 0.8.

c is the probability the police officer identifies her conditional on the hypothesis not being true, i.e. she’s not guilty. This is more traditionally represented by the notation P(E|H’), i.e. the probability of E (the evidence) given that the hypothesis is false (H’). In the example, c = 0.15.

In our example, a = 0.05, b = 0.8, c = 0.15

Using Bayes’ Theorem, the updated (posterior) probability that the friend is guilty is:

ab/[ab+c(1-a)] = 0.04/(0.04+ 0.1425) = 0.04/0.1825

Posterior probability = 0.219 = 21.9%


Since only 5 per cent of the common beetles bear the distinctive pattern and 98 per cent of the rare beetles do, intuition would tell you that you have come across a rare insect when you espy the pattern. Bayes’ Theorem tells you something quite different.

To calculate just how likely the beetle is to be rare given that we see the pattern on its back, we apply Bayes’ Theorem.

Posterior probability = ab/ [ab+c(1-a)]

a is the prior probability of the hypothesis (beetle is rare) being true. b is the probability we observe the pattern given that the beetle is rare (hypothesis is true). c is the probability we observe the pattern given that the beetle is not rare (hypothesis is false).

In this case, a = 0.001 (0.1%); b = 0.98 (98%); c = 0.05 (5%).

So, updated probability = ab/ [ab+c(1-a)] = 0.0192. So there is just a 1.92 per cent chance that the beetle is rare when the entomologist spots the distinctive pattern on its back.

Why the counterintuitive result? Because so few of the population of all beetles are rare. Specifically, the prior probability that the beetle is rare is very small and it would take a lot more evidence than that acquired to make a reasonable case for the beetle being rare.

So what is the probability that the beetle is rare given that we observe the distinctive pattern? In other words, what is the probability that the hypothesis (the beetle is rare) is true given the evidence (the pattern)? That is 1.92 per cent. What is the probability that we will observe the distinctive pattern if the beetle is rare? In other words, what is the probability of observing the evidence (the pattern) if the hypothesis (the beetle is rare) is true? That is 98 per cent.

To conflate these, to believe these two concepts are the same, is to commit the classic Prosecutor’s Fallacy, i.e. to falsely equate the probability that the defendant is guilty given the observed evidence with the probability of observing the evidence given that the defendant is guilty. It’s a potentially very dangerous fallacy to commit, especially when you happen to be the defendant and the jury has never heard of the Reverend Thomas Bayes!
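Counting beetles instead of multiplying probabilities makes the gap between the two quantities easier to see. The sketch below uses the figures from the text; the population of 100,000 is an assumption chosen only so that every count is a whole number.

```python
# Natural-frequency view of the beetle example, using the figures in the text:
# 0.1% of beetles are rare, 98% of rare beetles and 5% of common beetles
# carry the pattern.
POPULATION = 100_000  # ASSUMED size, chosen only to give whole numbers

rare = POPULATION // 1000                # 0.1% of all beetles
common = POPULATION - rare
rare_patterned = rare * 98 // 100        # 98% of rare beetles
common_patterned = common * 5 // 100     # 5% of common beetles

# P(rare | pattern): of all patterned beetles, what share is rare?
p_rare_given_pattern = rare_patterned / (rare_patterned + common_patterned)

# P(pattern | rare): of all rare beetles, what share is patterned?
p_pattern_given_rare = rare_patterned / rare

print(f"Patterned beetles: {rare_patterned} rare, {common_patterned} common")
print(f"P(rare | pattern) = {p_rare_given_pattern:.2%}")
print(f"P(pattern | rare) = {p_pattern_given_rare:.0%}")
```

The patterned common beetles outnumber the patterned rare ones by roughly fifty to one, which is why seeing the pattern says so little about rarity.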

**Appendix**

We can also solve the Beetle problem using the traditional notation version of Bayes’ Theorem.

P(H|E) = P(E|H) × P(H) / [P(E|H) × P(H) + P(E|H’) × P(H’)]

In this case, P(H) = 0.001 (0.1%); P(E|H) = 0.98 (98%); P(E|H’) = 0.05 (5%).

So, P(H|E) = 0.98 × 0.001 / [0.98 × 0.001 + 0.05 × 0.999] = 0.00098 / (0.00098 + 0.04995) = 0.00098 / 0.05093 = 0.0192. So there is just a 1.92 per cent chance that the beetle is rare when the entomologist spots the distinctive pattern on its back.

Note also that P(H|E) = 0.0192, while P(E|H) = 0.98. The Prosecutor’s Fallacy is to conflate these two expressions.


So does that mean that the mornings should start to get lighter after today (earlier sunrise), as well as the evenings (later sunset)? Not so, and there’s a simple reason for that. The length of a solar day, i.e. the period of time between the solar noon (the time when the sun is at its highest elevation in the sky) on one day and the next, is not 24 hours in December, but about 30 seconds longer than that.

Because each solar day in December is about 30 seconds longer than 24 hours, the gap accumulates, so that over the course of the month a standard 24-hour clock drifts roughly 15 minutes relative to solar time.

Let’s say just for a moment that the hours of sunlight (the time difference between sunrise and sunset) stayed constant through December. If a 24-hour clock timed sunset at 3.50pm one day, then 24 hours later, at 3.50pm on the clock, the sun would still have about 30 seconds to go, because the solar day is about 30 seconds longer than the clock day. So the sun would not set the next day till 3.50pm and 30 seconds. After ten days the sun would not set till 3.55pm according to the 24-hour clock. So the sunset would actually get later through all of December. For the same reason, the sunrise would get later through the whole of December.
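The accumulation is simple arithmetic, sketched below. It assumes, as the thought experiment does, that the hours of sunlight stay constant and that every solar day in December exceeds 24 hours by the same approximate 30 seconds. The starting date is an arbitrary placeholder; only the time of day matters.

```python
from datetime import datetime, timedelta

# Approximate December figure quoted above; the true excess varies day to day
SOLAR_DAY_EXCESS = timedelta(seconds=30)


def clock_sunset(first_sunset, days_later):
    """Clock time of sunset `days_later` days on, ASSUMING constant daylight hours.

    Only the time of day is meaningful; the date part is left unchanged.
    """
    return first_sunset + days_later * SOLAR_DAY_EXCESS


start = datetime(2000, 12, 1, 15, 50)  # 3.50pm; the date is a placeholder
for days in (1, 10, 31):
    print(f"After {days:2d} days: sunset at {clock_sunset(start, days):%H:%M:%S}")
```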

In fact, the sunset doesn’t get progressively later through all of December because the hours of sunlight shorten for about the first three weeks. The effect of this is that the sun would set earlier and rise later.

These two things (the shortening hours of sunlight and the extended solar day) pull the sunset in opposite directions, but push the sunrise in the same direction. The overall effect is that the sun starts to set later from a week or so before the shortest day, but doesn’t start to rise earlier till about a week or so after the shortest day.

So the old adage that the evenings will start to draw out after the end of the third week of December or so, and the mornings will get lighter, is false. The evenings have already been drawing out for several days before the shortest day, and the mornings will continue to grow darker for several days more.

There’s one other curious thing. The solar noon coincides with noon on our 24-hour clocks just four times a year. One of those days is Christmas Day! So set your clock to noon on December 25th, look up to the sky and you will see the sun at its highest point. Just perfect!


This wasn’t the kind of shock that occurred in 2016, when the EU referendum tipped to Brexit and the US presidential election to Donald Trump. Nor the type that followed the 2015 and 2017 UK general elections, which produced a widely unexpected Conservative majority and a hung parliament respectively.

On those occasions, the polls, pundits and prediction markets got it, for the most part, very wrong, and confidence in political forecasting took a major hit. The shock on this occasion was of a different sort – surprise related to just how right most of the forecasts were.

Take the FiveThirtyEight political forecasting methodology, most closely associated with Nate Silver, famed for the success of his 2008 and 2012 US presidential election forecasts.

In 2016, even that trusted methodology failed to predict Trump’s narrow triumph in some of the key swing states. This was reflected widely across other forecasting methodologies, too, causing a crisis of confidence in political forecasting. And things only got worse when much academic modelling of the 2017 UK general election was even further off target than it had been in 2015.

So what happened in the 2018 US midterm elections? This time, the FiveThirtyEight “Lite” forecast, based solely on local and national polls weighted by past performance, predicted that the Democrats would pick up a net 38 seats in the House of Representatives. The “Classic” forecast, which also includes fundraising, past voting and historical trends, predicted that they would pick up a net 39 seats. They needed 23 to take control.


With almost all results now declared, it seems that those forecasts were pretty near spot on: the projected tally is a net gain of 40 seats for the Democrats. In the Senate, meanwhile, the Republicans were forecast to hold control by 52 seats to 48. The final count is likely to be 53-47. There is also an argument that the small error in the Senate forecast can be accounted for by poor ballot design in Florida, which disadvantaged the Democrat in a very close race.

Some analysts currently advocate looking at the turnout of “early voters”, broken down by party affiliation, who cast their ballot before polling day. They argue this can be used as an alternative or supplementary forecasting methodology. This year, a prominent advocate of this methodology went with the Republican Senate candidate in Arizona, while FiveThirtyEight chose the Democrat. The Democrat won. Despite this, the jury is still out over whether “early vote” analysis can add any value.

There has also been research into the forecasting efficiency of betting/prediction markets compared to polls. This tends to show that the markets have the edge over polls in key respects, although they can themselves be influenced by and overreact to new poll results.

There are a number of theories to explain what went wrong with much of the forecasting prior to the Trump and Brexit votes. But looking at the bigger picture, which stretches back to the US presidential election of 1868 (in which Republican Ulysses S Grant defeated Democrat Horatio Seymour), forecasts based on markets (with one notable exception, in 1948) have proved remarkably accurate, as have other forecasting methodologies. To this extent, the accurate forecasting of the 2018 midterms is a return to the norm.

But what do the results mean for politics in the US more generally? The bottom line is that there was a considerable swing to the Democrats across most of the country, especially among women and in the suburbs, such that the Republican advantage of almost 1% in the House popular vote in 2016 was turned into a Democrat advantage of about 8% this time. If reproduced in a presidential election, it would be enough to provide a handsome victory for the candidate of the Democratic Party.

The size of this swing, and the demographics underpinning it, were identified with a good deal of accuracy by the main forecasting methodologies. This success has clearly restored some confidence in them, and they will now be used to look forward to 2020. Useful current forecasts for the 2020 election include PredictIt, OddsChecker, Betfair and PredictWise.

Taken together, they indicate that the Democratic candidate for the presidency will most likely come from a field including Senators Kamala Harris (the overall favourite), Bernie Sanders, Elizabeth Warren, Amy Klobuchar, Kirsten Gillibrand and Cory Booker. Outside the Senate, the frontrunners are former vice-president, Joe Biden, and the recent (unsuccessful) candidate for the Texas Senate, Beto O’Rourke.

Whoever prevails is most likely to face sitting president, Donald Trump, who is close to even money to face impeachment during his current term of office. If Trump isn’t the Republican nominee, the vice-president, Mike Pence, and former UN ambassador Nikki Haley are attracting the most support in the markets. The Democrats are currently about 57% to 43% favourites over the Republicans to win the presidency.

With the midterms over, our faith in political forecasting, at least in the US, has been somewhat restored. The focus now turns to 2020, and whether the forecasters will accurately predict the next leader of the free world, or be left floundering by the unpredictable forces of a new world politics.

Common sense would seem to indicate that it is either alive or dead, but we don’t know until we open the box. Traditional quantum theory suggests otherwise. The cat is both alive, with a certain probability, and dead, with a certain probability, until we open the box and find out, when it has to become one or the other with a probability of 100 per cent. In quantum terminology, the cat is in a superposition (two states at the same time) of being alive and dead, which only collapses into one state (dead or alive) when the cat is observed. This might seem absurd when applied to a cat. After all, surely it was either alive or dead before we opened the box and found out. It was simply that we didn’t know which. That may be true, when applied to cats. But when applied to the microscopic quantum world, such common sense goes out the window as a description of reality. For example, photons (the smallest measure of light) can exist simultaneously in both wave and particle states, and travel in both clockwise and anti-clockwise directions at the same time. Each state exists in the same moment. As soon as the photon is observed, however, it must settle on one unique state. In other words, the common sense that we can apply to cats we cannot apply to photons or other particles at the quantum level.

So what is going on? The traditional explanation as to why the same quantum particle can exist in different states simultaneously is known as the Copenhagen Interpretation. First proposed by Niels Bohr in the early twentieth century, the Copenhagen interpretation states that a quantum particle does not exist in any one state but in all possible states at the same time, with various probabilities. It is only when we observe it that it must in effect choose which of these states it exists as. At the sub-atomic level, then, particles seem to exist in a state of what is called ‘coherent superposition’, in which they can be two things at the same time, and only become one when they are forced to do so by the act of being observed. The total of all possible states is known as the ‘wave function.’ When the quantum particle is observed, the superposition ‘collapses’ and the object is forced into one of the states that make up its wave function.

The problem with this explanation is that all these different states exist. By observing the object, it might be that it reduces down to one of these states, but what has happened to the others? Where have they disappeared to?

This question lies at the heart of the so-called ‘Quantum Suicide’ thought experiment.

It goes like this. A man (not a cat) sits down in front of a gun which is linked to a machine that measures the spin of a quantum particle (a quark). If it is measured as spinning clockwise, the gun will fire and kill the man. If it is measured as spinning anti-clockwise, it will not fire and the man will survive to undergo the same experiment again.

The question is – will the man survive, and how long will he survive for? This thought experiment, proposed by Max Tegmark, has been answered in different ways by quantum theorists depending on whether or not they adhere to the Copenhagen Interpretation. In that interpretation, the gun will go off with a certain probability, depending on which way the quark is spinning. Eventually, by the laws of chance, the man will be killed, probably sooner rather than later. A growing number of theorists believe something else, however. They see both states (the particle is spinning clockwise and spinning anti-clockwise) as equally real, so there are two real outcomes. In one world, the man dies and in the other he lives. The experiment repeats, and the same split occurs. In one world there will exist a man who survives an indefinite number of rounds. In the other worlds, he is dead.
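On the single-world reading, "sooner rather than later" can be made concrete. The sketch below assumes each measurement is independent and that the gun fires with probability one half per round; the text only says "a certain probability", so the one half is an illustrative assumption.

```python
from fractions import Fraction


def survival_probability(rounds, p_fire=Fraction(1, 2)):
    """Chance of surviving `rounds` independent trials in a single world.

    ASSUMPTION: the gun fires with probability `p_fire` on every round
    (one half by default, chosen for illustration).
    """
    return (1 - p_fire) ** rounds


for n in (1, 5, 10, 20):
    print(f"{n:2d} rounds: {float(survival_probability(n)):.6%}")
```

On the Many Worlds reading the same number describes the shrinking share of branches in which the man is still alive, rather than the chance that a single man survives.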

The difference between these alternative approaches is critical. The Copenhagen approach is to propose that the simultaneously existing states (for example, the quark that is spinning both clockwise and anti-clockwise simultaneously) exist in one world, and collapse into one of these states when observed. Meanwhile, the other states mysteriously disappear. The other approach is to posit that these simultaneously existing states are real states, and neither magically disappears, but branch off into different realities when observed. What is happening is that in one world, the particle is observed spinning clockwise (in the Quantum Suicide thought experiment, the man dies) and in the other world the particle is observed spinning the other way (and the man lives). Crucially, according to this interpretation both worlds are real. In other words, they are not notional states of one world but alternative realities. This is the so-called ‘Many Worlds Theory.’

Where is the burden of proof in trying to determine which interpretation of reality is correct? This depends on whether we take as our default the one world that we can observe, or the full set of possible states represented in the mathematics of the wave function. Adherents to the Many Worlds position argue that the default is to go with what is described in the mathematics underpinning quantum theory: that the wave function represents all of reality. According to this argument, the minimal mathematical structure needed to make sense of quantum mechanics is the existence of many worlds which branch off, each of which contains an alternative reality. Moreover, these worlds are real. To say that our world, the one that we are observing, is the only real one, despite all the other possible worlds or measurement outcomes, has been likened to the old belief that the Earth was at the centre of the universe. There is no real justification, according to this interpretation, for saying that our branch of all possible states is the only real one, and that all other branches are non-existent or are ‘disappeared worlds.’ Put another way, the mathematics of quantum mechanics describes these different worlds. Nothing in the maths says that this world that we observe is more real than another world. So the burden of proof is on those who say it is. The viewpoint of the Copenhagen school is diametrically opposite. They argue that the hard evidence is of the world we are in, and the burden of proof is on those positing other worlds containing other branches of reality.

Which default position we choose to adopt will determine whether we are adherents of the Copenhagen or the ‘Many Worlds’ school.

For me personally, the logic of the argument points to the Many Worlds school. But to believe that they are right, and the Copenhagen school is wrong, seems kind of crazy, and totally counter-intuitive. In another world, of course, I’m probably saying the exact opposite.

The argument around simulation goes like this. One of the following three statements must be correct.

a. That civilisations at our level of development always or almost always disappear before becoming technologically advanced enough to create these simulations.

b. That the proportion of these technologically advanced civilisations that wish to create these simulations is zero or almost zero.

c. That we are almost sure to be living in such a simulation.

To see this, let’s examine each proposition in turn.

a. Suppose that the first is not true. In that case, a significant proportion of civilisations at our stage of technology go on to become technologically advanced enough to create these simulations.

b. Suppose that the second is not true. In this case, a significant proportion of these civilisations run such simulations.

c. If both of the above propositions are not true, then there will be countless simulated minds indistinguishable to all intents and purposes from ours, as there is potentially no limit to the number of simulations these civilisations could create. The number of such simulated minds would almost certainly be overwhelmingly greater than the number of minds that created them. Consequently, we would be quite safe in assuming that we are almost certainly inside a simulation created by some form of advanced civilisation.

For the first proposition to be untrue, civilisations must be able to go through the phase of being able to wipe themselves out, either deliberately or by accident, carelessness or neglect, and never or almost never do so. This might perhaps seem unlikely based on our experience of this world, but becomes more likely if we consider all other possible worlds.

For the second proposition to be true, we would have to assume that virtually all civilisations that were able to create these simulations would decide not to do so. This again is possible, but would seem unlikely.

If we consider both propositions, and we think it is unlikely that no civilisations survive long enough to achieve what Bostrom calls ‘technological maturity’, and that it is unlikely that hardly any would create ‘ancestor simulations’ if they could, then anyone considering the question is left with a stark conclusion. They really are living in a simulation.

To summarise. An advanced ‘technologically mature’ civilisation would have the capability of creating simulated minds. Based on this, at least one of three propositions must be true.

a. The proportion of civilisations at our level of development that survive to become technologically mature is zero or close to zero.

b. The proportion of these advanced civilisations that wish to run these simulations is close to zero.

c. The proportion of those consciously considering the question who are living in a simulation is close to one.

If the first of these propositions is true, we will almost certainly not survive to become ‘technologically mature.’ If the second proposition is true, virtually no advanced civilisations are interested in using their power to create such simulations. If the third proposition is true, then conscious beings considering the question are almost certainly living in a simulation.

Through the veil of our ignorance, it might seem sensible to assign equal credence to all three, and to conclude that unless we are currently living in a simulation, descendants of this civilisation will almost certainly never be in a position to run these simulations.

Strangely indeed, the probability that we are living in a simulation increases as we draw closer to the point at which we are able and willing to create such simulations ourselves. At the point that we would be ready to create our own simulations, we would paradoxically be at the very point when we were almost sure that we ourselves were simulations. Only by refraining from doing so could we in a certain sense make it less likely that we were simulated, as it would show that at least one civilisation that was able to create simulations refrained from doing so. Once we took the plunge, we would know that we were almost certainly only doing so as simulated beings. And yet there must have been someone or something that created the first simulation. Could that be us, we would be asking ourselves? In our simulated hearts and minds, we would already know the answer!

Now the probability of the game ending on the first toss is ½; of it ending on the second toss is (1/2)^2 = ¼; on the third, (1/2)^3 = 1/8, etc., so your expected win from playing the game = (1/2 x £1) + (1/4 x £2) + (1/8 x £4) + …, i.e. £0.50 + £0.50 + £0.50 + … = infinity. It follows that you should be willing to pay any finite amount for the privilege of playing this game. Yet it seems irrational to pay very much at all.
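The way this sum runs away can be checked with a short Python sketch. It assumes the payoffs used in this post (£1 if the game ends on the first toss, £2 on the second, £4 on the third, and so on) and caps the game at a chosen number of tosses; the cap and the function name are illustrative assumptions, not part of the original game.

```python
from fractions import Fraction

def truncated_expected_value(max_tosses: int) -> Fraction:
    """Expected payout in pounds if the game is capped at max_tosses tosses.

    Assumption: the game ends on toss n with probability (1/2)**n and then
    pays 2**(n - 1) pounds. Runs that would go past the cap are ignored,
    i.e. treated as paying nothing.
    """
    return sum(Fraction(1, 2**n) * 2 ** (n - 1) for n in range(1, max_tosses + 1))

for cap in (1, 10, 100, 1000):
    print(cap, truncated_expected_value(cap))
```

However high the cap is set, the capped sum keeps growing in step with it and never settles on a finite value, which is what is meant by saying the expected win is infinite.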

According to this reasoning, any finite stake is justified because the eventual payout increases infinitely through time, so you must end up with a profit whenever the game ends. Yet most people are only willing to pay a few pounds, or at least not much more than this. So is this yet further evidence of our intuition letting us down?

That depends on why most people are not willing to pay much. There have been very many explanations proposed over the years, some more satisfying than others, but none has been universally accepted as getting near to a convincing explanation.

The best attempt, and the one which I find the most convincing, is to address the issue of infinity. It is true, of course, that if you play an infinite number of rounds of the game you will win an infinite amount. But what happens in the real, finite world? Here is the problem. The fact that playing to infinity pays an infinite amount does not mean that the game can be relied upon to pay out in finite time. The key question in finite time is WHEN does the game turn profitable? The answer depends on the size of the stake per round. If this stake is £2, and you repeat the game over and over again, you are likely to make a lot of money very quickly. As the stake size increases, the number of rounds it takes to turn a profit grows ever longer. Take the example of a stake of £4. In this case, you only make a profit if you throw three heads in a row, which is a 1 in 8 chance. You now need to factor in the losses you made in rounds where you didn’t throw three heads in a row. This extends the number of rounds it will take to turn a profit. So the game is not profitable at every stake size unless we are willing and able to play an infinite number of rounds. It is, in theoretical terms, profitable at any stake size, however large, but it will take forever to guarantee a profit. In a world of finite rounds and time scales, however, the winnings generated by the game are easily outweighed by a large enough stake.

So what is the optimal stake size for playing the St. Petersburg game?

This depends on how many rounds you are willing to play and how likely you wish to be to come out ahead in that timescale.

This has been modelled empirically, using a computer program to calculate the outcome at different staking levels. What does it show? Well, if you stake a pound a round, you have a better than even chance of being in profit after just three rounds. If you pay £2 a round, the even-money chance of coming out ahead takes rather more rounds – about seven. At £3 a round we are looking at more than 20 rounds, at £4 approaching 100 rounds and £5 more than 300 rounds. By the time we are staking £10 a go, more than 350,000 rounds are needed to give you more than an even chance of being ahead of the game. An approximation that generates the 50-50 point to any staking level is 4 to the power of the stake, divided by 2.9. So what’s a reasonable spend per round to play the game? That depends on the person and the exact configuration of the game. Either way, it’s not that high.
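The rule of thumb quoted above (4 to the power of the stake, divided by 2.9) is easy to tabulate. The sketch below simply evaluates that formula; the function name is an assumption, and since the formula is only an approximation it will not reproduce the quoted round counts exactly, particularly at low stakes.

```python
def rounds_to_even_chance(stake_pounds: float) -> float:
    """Approximate number of rounds for a 50-50 chance of being ahead.

    Uses the approximation quoted in the text: 4 ** stake / 2.9.
    """
    return 4**stake_pounds / 2.9

for stake in (2, 3, 4, 5, 10):
    print(f"stake of {stake} pounds: about {rounds_to_even_chance(stake):,.0f} rounds")
```

Each extra pound on the stake multiplies the number of rounds needed by four, which is why the figure climbs so quickly.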

Perhaps the median (the mean of the two middle values of the series), rather than the mean, offers a pretty good approximation to the way most people think about this.

Let’s say that the game as proposed is run 1000 times. In this case, 500 of the values result in tails on the first toss with a return of £1. The next 25% of values result in tails on the second toss with a return of £2. The rest of the values are not then relevant. The 500th value is £1 and the 501st value is £2. The median is the mean of £1 and £2, i.e. £1.50.
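The median calculation can be checked in a few lines of Python. The list below is an idealised set of 1000 results, assumed for illustration: the 125 games that run beyond the third toss are all entered as £8, since their exact values have no effect on the median.

```python
from statistics import median

# Idealised 1000 games: half pay 1 pound, a quarter pay 2, an eighth pay 4.
# The remaining 125 pay at least 8; their exact values do not move the median.
payouts = [1] * 500 + [2] * 250 + [4] * 125 + [8] * 125

print(median(payouts))  # 1.5
```

The 500th and 501st values in sorted order are £1 and £2, so the median is £1.50 however large the rare big wins turn out to be.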

Whichever of the two ways proposed here we look at it, the solution is much closer to most people’s intuitive answer than it is to the answer implied by the classic formulation of the St. Petersburg problem.

Reading

Koelman, J. Statistical Physics Attacks St. Petersburg: Paradox Resolved.

Fine, T.A. The Saint Petersburg Paradox is a Lie.

https://medium.com/@thomasafine/the-saint-petersburg-paradox-is-a-lie-62ed49aeca0b

Hayden, B.Y. and Platt, M.L. (2009), The mean, the median, and the St. Petersburg Paradox. Judgment and Decision Making, 4 (4), June, 256-272.

Let us build a thought experiment around Portia’s quest for love, in which she meets her suitors in turn. Her problem is when to stop looking and start choosing. To make the problem more general, let’s say she has 100 suitors to choose from. Each will be presented to her in random order and she has twenty minutes to decide whether he is the one for her. If she turns someone down there is no going back, but the good news is that she is guaranteed not to be turned down by anyone she selects. If she comes to the end of the line and has still not chosen a partner, she will have to take whoever is left, even if he is the worst of the hundred. All she has to go on in guiding her decision are the relative merits of the pool of suitors.

Let’s say that the first presented to her, whom we shall call No.1, is perfectly charming but she has some doubts. Should she choose him anyway, in case those to follow will be worse? With 99 potential matches left, it seems more than possible that there will be at least one who is a better match than No.1. The problem facing Portia is that she knows that if she dismisses No. 1, he will be gone forever, to be betrothed to someone else.

She decides to move on. The second suitor turns out to be far worse than the first, as do the third and fourth. She starts to think that she may have made a mistake in not accepting the first. Still, there are potentially 96 more to see. This goes on until she sees No. 20, whom she actually prefers to No. 1. Should she now grasp her opportunity before it is too late? Or should she wait for someone even better?

She is looking for the best of the hundred, and this is the best so far. But there are still 80 suitors left, one of whom might be better than No. 20. Should she take a chance? What is Portia’s optimal strategy in finding Mr. Right?

This is an example of an ‘Optimal Stopping Problem’, which has come to be known as the ‘Secretary Problem.’ In this variation, you are interviewing for a secretary, with your aim being to maximise your chance of hiring the single best applicant out of the pool of applicants. Your only criterion to measure suitability is their relative merits, i.e. who is better than whom. As with Portia’s Problem, you can offer the post to any of the applicants at any time before seeing any more candidates, but you lose the opportunity to hire that applicant if you decide to move on to the next in line.

This sort of stopping strategy can be extended to anything including the search for a place to live, a place to eat, the choice of a used car, and so on.

In each of these cases, there are two ways you can fail to meet your goal of finding the best option out there. The first is by stopping too early, and the second is by stopping too late. By stopping too early, you leave the best option out there. By stopping too late, you have waited for a better option that turns out not to exist. So how do you find the right balance?

Let’s consider the intuition. Obviously, the first option is the best yet, and the second option (assuming we are taking the options in a random order) has a 50% chance of being the best yet. Likewise, the tenth option has a 10% chance of being the best to that point. It follows logically that the chance of any given option being the best to that point declines as the number of options there have been before increases. So a ‘best yet’ option turns up less and less often as we go through the process.

To see how we might best approach the problem, let’s go back to Portia and her suitors and look at her best strategy when faced with different-sized pools of suitors. Can she do better using some strategy other than choosing at some random position in the order of presentation to her? It can be shown mathematically that she can certainly expect to do better, given that there are more than two to choose from.

Let’s return to the original play where there are three suitors. If she chooses No. 1, she has no information with which to compare the relative merits of her suitors. On the other hand, by the time she reaches No. 3, she must choose him, even if he’s the worst of the three. In this way, she has maximum information but no choice. In the case of No. 2, she has more information than she did when she saw No. 1, as she can compare the two. She also has more control over her choice than she will if she leaves it until she meets No. 3.

So she turns down No. 1 to give herself more information about the relative merits of those available. But what if she finds that No. 2 is worse than No. 1? What should she do? It can in fact be shown that she should wait and take the risk of ending up with No. 3, as she must do if she leaves it to the last. On the other hand, if she finds that she prefers No. 2 to No. 1, she should choose him on the spot and forgo the chance that No. 3 will be a better match.

It can also be shown that in the three-suitor scenario, she will succeed in finding her best available match exactly half the time by selecting No. 2 if he is better than No. 1. If she chooses No. 1 or No. 3, on the other hand, she will only have met that aim one time in three.

If there are four suitors, Portia should use No. 1 to gain information on what she should be measuring her standards against, and select No. 2 if he is a better choice than No. 1. If he is not, she should do the same with No. 3. If he is still not better than No. 1, she should go to No. 4 and hope for the best. The same strategy can be applied to any number of people in the pool.
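For pools this small, the claims can be checked by listing every possible order of presentation. The Python sketch below does that. The names `picks_best` and `success_rate` are assumptions, and suitors are represented simply by their ranks, with the highest rank being the best match.

```python
from itertools import permutations

def picks_best(order, skip):
    """Skip the first `skip` suitors, then take the first who beats them all.

    `order` holds ranks in order of presentation (higher is better). If no one
    beats the benchmark, the last suitor is taken by default.
    """
    benchmark = max(order[:skip], default=0)
    for rank in order[skip:]:
        if rank > benchmark:
            return rank == max(order)
    return order[-1] == max(order)

def success_rate(n, skip):
    """Share of all n! presentation orders in which the very best is chosen."""
    orders = list(permutations(range(1, n + 1)))
    return sum(picks_best(order, skip) for order in orders) / len(orders)

for n in (3, 4):
    for skip in range(n):
        print(n, skip, success_rate(n, skip))
```

With three suitors, skipping one and then taking the first improvement succeeds in three of the six possible orders, against two in six for simply taking No. 1 or holding out for No. 3. With four suitors, skipping one again does best, at 11 orders in 24.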

So, in the case of a hundred suitors, how many should she see to gain information before deciding to choose someone? It can, in fact, be demonstrated mathematically that the optimal number to see before turning looking into leaping (her ‘stopping strategy’) is 37. She should meet with 37 of the suitors, then choose the first of those to come after who is better than the best of the first 37. By following this rule, she will find the best of the princely bunch of a hundred with a probability, strangely enough, of 37 per cent. By choosing randomly, on the other hand, she has a chance of 1 in 100 (1%) of settling upon the best.

This stopping rule of 37% applies to any similar decision, such as the secretary problem or looking for a house in a fast-moving market. It doesn’t matter how many options are on the table. You should always use the first 37% as your baseline, and then select the first of those coming after that is better than any of the first 37 per cent.

The mathematical proof is based on the mathematical constant, e (sometimes known as Euler’s number) and specifically 1/e, which can be shown to be the stopping point along a range from 0 to 1, after which it is optimal to choose the first option that is better than any of those before. The value of e is approximately equal to 2.71828, so 1/e is about 0.36788 or 36.788%. This has simply been rounded up to 37 per cent in explaining the stopping rule. It can also be shown that the chance that implementing this stopping rule will yield the very best outcome is also equal to 1/e, i.e. about 37 per cent.
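For a pool of a hundred, the 37 figure can be checked directly rather than taken on trust. The sketch below uses the standard closed-form expression for the chance of success when the first `skip` options are passed over. That formula is not given in the text and is brought in here as an assumption, as are the function and variable names.

```python
import math

def success_probability(n: int, skip: int) -> float:
    """Chance of ending up with the best of n when the first `skip` are passed over.

    Standard result for this rule: (skip / n) * sum of 1 / (i - 1) for
    i = skip + 1 .. n. With skip = 0 the first option is simply taken.
    """
    if skip == 0:
        return 1 / n
    return skip / n * sum(1 / (i - 1) for i in range(skip + 1, n + 1))

# Try every possible cut-off for a pool of 100 and keep the best one.
best_skip = max(range(100), key=lambda skip: success_probability(100, skip))
print(best_skip, success_probability(100, best_skip), 1 / math.e)
```

On this formula the best cut-off for a hundred options is 37, with a chance of success of about 0.371, very close to 1/e.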

If there is a chance that your selection might actually become unavailable, the rule can be adapted to give a different stopping rule, but the principle remains. For example, if there is a 50% chance that your selection might turn out to be unavailable, then the 37% rule is converted into a 25% rule. The rest of the strategy remains the same. By doing this, however, you will have only a 25% chance of finding the best of the options, compared to a 37% chance if you always get to make the final choice. This is still a lot better than the 1 per cent chance of selecting the best out of a hundred options if you choose randomly. The lower percentage here (25% compared to 37%) reflects the additional variable (your choice might not be final) which adds uncertainty into the mix. There are other variations on the same theme, where it is possible to go back with a given probability that the option you initially passed over is no longer available. Take the case, for example, where an immediate proposal will certainly be accepted but a belated proposal is accepted half of the time. The cut-off proportion in one such scenario rises to 61% as the possibility of going back becomes real.

There is also a rule-of-thumb which can be derived when the aim is to maximise the chance of selecting a good option, if not the very best. This strategy has the advantage of reducing the chance of ending up with one of the worst options. It is the square root rule, which simply replaces the 37% criterion with the square root of the number of options available. In the case of Portia’s choice, she would meet the first ten of the hundred (instead of 37) and choose the first of the remaining 90 who is better than the best of those ten. Whatever variation you adopt, the numbers will change but the principle stays the same.

All this assumes that we lack some objective standard against which we can measure each of our options, without needing to compare which option is better than which. For example, Portia might simply be interested in choosing the richest of the suitors and she knows the distribution of wealth of all potential suitors. This ranges evenly from the bankrupt suitors to those worth 100,000 ducats.

This means that the upper percentile of potential suitors in the whole population is worth upwards of 99,000 ducats. The lowest percentile is worth up to 1,000 ducats. The 50th percentile is worth between 49,000 and 50,000 ducats.

Now Portia is presented with a hundred out of this population of potential suitors, and let’s assume that the suitors presented to her are representative of this population. Say now that the first to be presented to her is worth 99,500 ducats. Since wealth is her only criterion, and he is in the upper percentile in terms of wealth, her optimal decision is to accept his proposal of marriage. It is possible that one of the next 99 is worth more than 99,500 ducats but that isn’t the way to bet.

On the other hand, say that the first suitor is worth 60,000 ducats. Since there are 99 more to come, it is a good bet that at least one of them will be worth more than this. If, however, she has turned down every suitor before him and a suitor of this wealth is the 99th to be presented, her optimal decision now is to accept him. In other words, Portia’s decision as to whether to accept the proposal comes down to how many potential matches she has left to see. When down to the last two, she should choose him if he is above the 50th percentile, in this case 50,000 ducats. The more there are to come, the higher the percentile of wealth at which she should accept. She can set a higher threshold. She should never accept anyone who is below the average unless she is out of choices. In this version of the stopping problem, the probability that Portia will end up with the wealthiest of the available suitors turns out to be 58 per cent. More information, of course, increases the chance of success. Indeed, any criterion that provides information on where an option is relative to the relevant population as a whole will increase the probability of finding the best choice of those available. As such, it seems that if Portia is only interested in the money, she is more likely to find it than if she is looking for love.
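The betting logic in these two cases can be made concrete with a small calculation. The sketch below assumes that each remaining suitor’s wealth is an independent draw, spread evenly between 0 and 100,000 ducats as described above. The function name is an assumption, and the calculation shows only the chance that someone richer is still to come, not the full optimal rule.

```python
def chance_of_someone_richer(wealth, remaining, max_wealth=100_000):
    """Probability that at least one of `remaining` suitors is richer.

    Assumption: each remaining suitor's wealth is an independent draw,
    spread evenly between 0 and max_wealth ducats.
    """
    share_poorer = wealth / max_wealth
    return 1 - share_poorer**remaining

print(chance_of_someone_richer(99_500, 99))  # first suitor, 99 still to come
print(chance_of_someone_richer(60_000, 99))  # first suitor, 99 still to come
print(chance_of_someone_richer(60_000, 1))   # 99th suitor, one still to come
```

On these assumptions there is only about a 39% chance that any of the next 99 beats a suitor worth 99,500 ducats, so accepting him is the way to bet. A suitor worth 60,000 ducats is almost certain to be beaten when 99 are still to come, but has a 60% chance of holding up when only one remains.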

And who did fair Portia choose in the original play? Well, there are no spoilers here. But I can reveal that it was the best of the three.
