
The Kelly Criterion – in a nutshell.

How much should we bet when we believe the odds are in our favour? The answer to this question was first formalised in 1956 by daredevil pilot, recreational gunslinger and physicist John L. Kelly, Jr. at Bell Labs. The so-called Kelly Criterion is a formula for determining the optimal size of a series of bets when we have the advantage, in other words when the odds favour us. It takes account of the size of our edge over the market as well as the adverse impact of volatility. In other words, even when we have the edge, we can still go bankrupt along the way if we stake too much on any individual wager or series of wagers.

Essentially, the Kelly strategy is to wager a proportion of our capital equivalent to our advantage at the available odds. So if we are offered even money, we back heads, and we are certain that the coin will come down heads, we have a 100% advantage, and the recommended wager is the whole of our capital. If there is a 60% chance of heads and a 40% chance of tails, our advantage is now 20%, and we are advised to stake accordingly. This is a simplified representation of the literature on Kelly, Half-Kelly and other derivatives of the criterion, but the bottom line is clear: it is just as important to know how much to stake as it is to gauge when we have the advantage. And neither is easy unless we can accurately identify that advantage.

Put more technically, the Kelly criterion is the fraction of capital to wager to maximise the compounded growth of capital. The problem it seeks to address is that, even when there is an edge, beyond some threshold larger bets will result in a lower compounded return because of the adverse impact of volatility. The Kelly criterion defines that threshold, indicating the fraction, F, that should be wagered to maximise compounded return over the long run:

F = Pw – (Pl/W)

where

F = Kelly criterion fraction of capital to bet

W = Amount won per amount wagered (i.e. win size divided by loss size)

Pw = Probability of winning

Pl = Probability of losing

When win size and loss size are equal, W = 1, and the formula reduces to:

F = Pw – Pl

For example, if a trader loses £1,000 on losing trades and gains £1,000 on winning trades, and 60 per cent of all trades are winning trades, the Kelly criterion indicates an optimal trade size equal to 20 per cent of capital (0.60 – 0.40 = 0.20). As another example, if a trader wins £2,000 on winning trades and loses £1,000 on losing trades, and the probabilities of winning and losing are both equal to 50 per cent, the Kelly criterion indicates an optimal trade size equal to 25 per cent of capital: 0.50 – (0.50/2) = 0.25.

In other words, Kelly argues that, in the long run, we should wager a percentage of our bankroll equal to the expected profit divided by the amount we would receive if we win.
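The formula and both worked examples can be checked with a few lines of code (a minimal sketch; the function and argument names are ours, not from any standard library):

```python
def kelly_fraction(p_win: float, win_per_unit: float) -> float:
    """Kelly criterion: F = Pw - (Pl / W), where W is the amount won
    per amount wagered (win size divided by loss size) and Pl = 1 - Pw."""
    p_lose = 1.0 - p_win
    return p_win - p_lose / win_per_unit

# The two worked examples from the text:
print(kelly_fraction(0.60, 1.0))  # even money, 60% winners -> 0.20
print(kelly_fraction(0.50, 2.0))  # 2-to-1 payoff, 50% winners -> 0.25
```

Note that W here is the win size divided by the loss size, as defined above, so W = 1 corresponds to an even-money bet.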

Proportional over-betting is more harmful than under-betting. For example, betting half the Kelly criterion will reduce compounded return by 25 per cent, while betting double the Kelly criterion will eliminate 100 per cent of the gain. Betting more than double the Kelly criterion will result in an expected negative compounded return, regardless of the edge on any individual bet. The Kelly criterion implicitly assumes that there is no minimum bet size. This assumption prevents the possibility of total loss. If there is a minimum trade size, as is the case in most practical investment and trading situations, then ruin is possible if the amount falls below the minimum possible bet size.

So should we bet the full amount recommended by the Kelly criterion? In fact, betting the full amount recommended by the Kelly formula may be unwise for a number of reasons. Notably, accurate estimation of the advantage of the bets is critical; if we overestimate the advantage by more than a factor of two, which is easily done, Kelly betting will cause a negative rate of capital growth. So, full Kelly betting may be a rough ride, and a fractional Kelly betting strategy might be substituted, i.e. a strategy wherein we bet some fraction of the recommended Kelly bet, such as a half or a third.

Ironically, John Kelly himself died in 1965, never having used his own criterion to make money.

So that’s the Kelly criterion. In a nutshell, the advice is only to bet when you believe you have the edge, and to do so using a stake size related to the size of the edge. Mathematically, it means betting a fraction of your capital equal to the size of your advantage. So, if you have a 20% edge at the odds, bet 20% of your capital. In the real world, however, we need to allow for errors that can creep in, like uncertainty as to the true edge, if any, that we have at the odds. So, unless we’re happy to risk a very bumpy ride, and we have total confidence in our judgment, a preferred strategy may be to stake a defined fraction of that amount, known as a fractional Kelly strategy.

Exercise

If a trader is offered even money on a heads/tails bet, and knows that the chance of heads is 70%, the Kelly criterion indicates an optimal trade size equal to x per cent of capital. Calculate x.

 

References and Links

The Kelly Criterion. LessWrong. 15 October, 2018. https://www.lesswrong.com/posts/BZ6XaCwN4QGgH9CxF/the-kelly-criterion

Kelly Criterion. Wikipedia. https://en.wikipedia.org/wiki/Kelly_criterion

Half-Kelly. CapitalIdeasOnline.com https://www.capitalideasonline.com/wordpress/half-kelly/

The Pascal-Fermat ‘Problem of Points’ – in a nutshell.

What is the fair division of stakes in a game which is interrupted before its conclusion? This was the problem posed in 1654 by French gambler the Chevalier de Méré to philosopher and mathematician Blaise Pascal, of Pascal’s Triangle and Pascal’s Wager fame, who shared it in now-famous correspondence with mathematician Pierre de Fermat (best known these days perhaps for Fermat’s Last Theorem). It has come to be known as the Problem of Points.

The question had first been formally addressed by Franciscan Friar and Leonardo Da Vinci collaborator, Luca Bartolomeo de Pacioli, father of the double-entry system of bookkeeping. Pacioli’s method was to divide the stakes in proportion to the number of rounds won by each player to that point. There is an obvious problem with this method, however. What happens, for example, if only a single round of many has been played? Should the entire pot be allocated to the winner of that single round? In the mid-1500s, Venetian mathematician and founder of the theory of ballistics, Niccolo Fontana Tartaglia, proposed basing the division on the ratio between the size of the lead and the length of the game. But this method is not without its own problems. For example, it would split the stakes in the same proportion whether one player was ahead by 40-30 or by 99-89, although the latter situation is hugely more advantageous than the former.

The solution adopted by Pascal and Fermat defied prevailing intuition by basing the division of the stakes not on the history of the interrupted game to that point but on the possible ways the game might have continued were it not interrupted. In this method, a player leading by 6-4 in a game to 10 would have the same chance of winning as a player leading by 16-14 in a game to 20, so that an interruption at either point should lead to the same division of stakes. As such, what is important in the Pascal-Fermat solution is not the number of rounds each player has already won but the number of rounds each player still needs to win.

Take another example. Suppose that two players agree to play a game of coin-tossing repeatedly for £32, and the winner is the first player to win four times.

If the game is interrupted when one of the players is ahead by two games to one, how should the £32 be divided fairly between the players?

In Fermat’s method, imagine playing another four games, the maximum that could be required to settle the match, regardless of whether it is decided sooner. The outcomes of each game are equally likely: P (won by Player 1) or Q (won by Player 2).

The possible outcomes of the next four games are as follows:

PPPP; PPPQ; PPQP; PPQQ

PQPP; PQPQ; PQQP; PQQQ

QPPP; QPPQ; QPQP; QPQQ

QQPP; QQPQ; QQQP; QQQQ

Player 1, who leads 2-1 and needs two more wins, wins the match in the 11 sequences containing at least two Ps. The probability that Player 1 would have won is therefore 11/16 = 68.75%.

The probability that Player 2 would have won is 5/16 = 31.25%.

The method can be generalised to any game of chance which ends before the game is complete.
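Fermat’s counting argument is easy to replicate by brute force, enumerating every equally likely continuation (a sketch; the function and variable names are illustrative):

```python
from itertools import product

def fermat_share(needed_1: int, needed_2: int) -> float:
    """Fraction of the stakes due to Player 1, found by enumerating every
    equally likely way the remaining games could have been played out."""
    horizon = needed_1 + needed_2 - 1          # games that settle it for certain
    outcomes = product("PQ", repeat=horizon)   # P = Player 1 wins that game
    wins = sum(1 for o in outcomes if o.count("P") >= needed_1)
    return wins / 2 ** horizon

# Interrupted at 2-1 in a first-to-four match: Player 1 needs 2 more wins,
# Player 2 needs 3 more.
print(fermat_share(2, 3))  # 0.6875, i.e. 11/16
```

Swapping the arguments gives Player 2’s share, 5/16.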

 

Appendix

Pascal proposed an alternative method which dispenses with the need to consider possible steps after the game had already been won. In doing so, he was able to devise a relatively simple formula which would solve all possible Problems of Points, without needing to go beyond the point at which the game resolves in favour of one or other of the players, based on Pascal’s Triangle, demonstrated below.

1                                 =   1
1   1                             =   2
1   2   1                         =   4
1   3   3   1                     =   8
1   4   6   4   1                 =  16
1   5  10  10   5   1             =  32
1   6  15  20  15   6   1         =  64
1   7  21  35  35  21   7   1     = 128

Each of the numbers in Pascal’s Triangle is the sum of the two adjacent numbers immediately above it; the totals on the right are the sums of each row, each a power of two.

If the game is interrupted, as above, at 2-1 after three games in a first-to-four match, at most four more games remain, so we use the row totalling 16 (1, 4, 6, 4, 1). The stakes are divided (1+4+6)/16 to (4+1)/16, i.e. 11/16 to Player 1 and 5/16 to Player 2.

More generally, Pascal’s method establishes the modern method of expected value when reasoning about probability. To show this, consider the probability that Player 1 would win if leading 3-2 in a game in which the first player to win four games is the outright winner. If Player 1 wins the next coin toss, he goes ahead 4-2 and wins outright (value = 1). There is a 50% chance of this. There is a 50% chance of player 2 winning the coin toss, however, in which case the game is level (3-3). If the game is level, there is a 50% chance of player 1 winning (and a 50% chance of player 2 winning).

So the expected chance of player 1 winning when leading 3-2 = 50% x 1 + 50% x 50% = 50% + 25% = 75%. Expected chance of player 2 winning = 25%.

Now consider the probability that Player 1 would win if leading 3-1 in a game in which the first player to win four games is the outright winner. If Player 1 wins the next coin toss, he goes ahead 4-1 and wins outright (value = 1). There is a 50% chance of this. There is a 50% chance of player 2 winning the coin toss, however, in which case the game goes to 3-2. We know that the expected chance of player 1 winning if ahead 3-2 is 75% (derived above).

So the expected chance of player 1 winning when leading 3-1 = 50% x 1 + 50% x 75% = 50% + 25% = 87.5%. Expected chance of player 2 winning = 12.5%.

Now consider the question that we solved using Fermat’s method, i.e. the probability that Player 1 would win if leading 2-1 in a game in which the first player to win four games is the outright winner. If Player 1 wins the next coin toss, he goes ahead 3-1, and has an expected chance of winning of 87.5% (derived above). There is a 50% chance of this. There is a 50% chance of player 2 winning the coin toss, however, in which case the game goes to 2-2. We know that the expected chance of player 1 winning if the game is tied is 50%.

So the expected chance of player 1 winning when leading 2-1 = 50% x 87.5% + 50% x 50% = 43.75% + 25% = 68.75% (i.e. 11/16). Expected chance of player 2 winning = 31.25% (i.e. 5/16).

So both Fermat’s method and Pascal’s method yield the same solution, by different routes, and will always do so in determining the correct division of stakes in an interrupted game of this nature.
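Pascal’s expected-value method amounts to a simple recursion on the number of wins each player still needs, which can be sketched as follows (the names are ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p1_wins(needed_1: int, needed_2: int) -> float:
    """Chance that Player 1 wins the match when each player still needs
    the given number of game wins, and each game is a fair coin toss."""
    if needed_1 == 0:
        return 1.0  # Player 1 has already won
    if needed_2 == 0:
        return 0.0  # Player 2 has already won
    # 50% chance the next game goes to each player:
    return 0.5 * p1_wins(needed_1 - 1, needed_2) + 0.5 * p1_wins(needed_1, needed_2 - 1)

# The three positions worked through above (first to four games):
print(p1_wins(1, 2))  # leading 3-2 -> 0.75
print(p1_wins(1, 3))  # leading 3-1 -> 0.875
print(p1_wins(2, 3))  # leading 2-1 -> 0.6875, i.e. 11/16
```

The recursion reproduces Fermat’s enumeration exactly, as the text argues it must.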

Exercise

Suppose that two players agree to play a game of coin-tossing repeatedly for £32, and the winner is the first player to win four times.

If the game is interrupted when one of the players is ahead by two games to zero, determine how the £32 should be divided fairly between the players.

 

References and Links

Pascal – 17th Century Mathematics. https://www.storyofmathematics.com/17th_pascal.html

Problem of Points. Wikipedia. https://en.wikipedia.org/wiki/Problem_of_points

The Problem of Points. http://www.wwu.edu/teachingmathhistory/docs/psfile/probpoints1-student.pdf

What happened when the Butch challenged the Brain to a game of dice?

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

This is a true story about New York gambling-house operator, The Butch, who made his fortune booking dice games. In 1952 he was famously challenged by a big-time gambler known as The Brain to a simple wager. The bet was an even-money proposition that the Butch could throw a double-six in 21 rolls of two dice. We can assume symmetry – the dice were not loaded or biased in any way. All faces were equally likely to come up, so the probability of any number appearing on a given roll of either one of the dice is 1/6.

On the face of it, the edge seems to be with the Butch. After all, there are 36 possible combinations that could come up when throwing two dice, from 1-1, 1-2, 1-3, through to 6-4, 6-5, 6-6. Intuition might suggest, therefore, that 18 throws should give you a 50-50 chance of throwing any one of these combinations, including a double-six. In 21 throws, the chance of a double-six should, therefore, be better than 50-50. On this basis, the Butch accepted the even-money bet at $1,000 a roll. After twelve hours of rolling, the Brain was $49,000 up, at which point the Butch called it a day, sensing that something was wrong with his strategy.

The Brain had in fact profited from a classic probability puzzle known as the Chevalier’s Dice problem, which can be traced to the 17th-century French gambler and bon vivant, Antoine Gombaud, better known as the Chevalier de Méré. The Chevalier would agree even money odds that in four rolls of a single die he would get at least one six. His logic seemed impeccable. The Chevalier reasoned that since the chance that a 6 will come up in any one roll of the die is 1 in 6, the chance of getting a 6 in four rolls is 4/6, or 2/3, which is a good bet at even money. If the probability was a half, he would break even at even money. For example, in 300 games, at 1 French franc a game, he would stake 300 francs and expect to win 150 times, receiving 1 franc of profit for each win with his stake returned on each occasion (a total of 300 francs). With a probability of 2/3, he would expect to win 200 times, yielding a good profit.

In fact, it is straightforward to show that this reasoning is faulty, for if it were correct, then we would calculate the chance of a 6 in five rolls of the die as 5/6, the chance of a 6 in six rolls as 6/6 = 100%, and in seven rolls, 7/6(!). Something is therefore clearly wrong here.

Still, even though his reasoning was faulty, he continued to make a profit by playing the game at even money. To see why, we need to calculate the true probability of getting a 6 in four rolls of the die. The key idea here is that the number that comes up on each roll is independent of any other rolls, i.e. dice have no memory. Since each event is independent, we can (according to the laws of probability) multiply the probabilities.

So the probability of a 6 followed by a 6, followed by a 6, followed by a 6, is: 1/6 x 1/6 x 1/6 x 1/6 = 1/1296.

So what is the chance of getting at least one six in four rolls of the die?

Since the probability of getting a 6 in any one roll of the die = 1/6, the probability of NOT getting a 6 in any one roll of the die = 5/6.

So the chance of NOT getting a 6 in four rolls of the die is:

5/6 x 5/6 x 5/6 x 5/6 = 625/1296

So the chance of getting at least one 6 is 1 minus this, i.e. 1 – (625/1296) = 671/1296 = 0.5177, which is greater than 0.5.

So, the odds are still in favour of the Chevalier, since he is agreeing even money odds on an event with a probability of 51.77%.

This was all very well as long as it lasted, but eventually the Chevalier decided to branch out and invent a new, slightly modified game. In the new game, he asked for even money odds that a pair of dice, when rolled 24 times, will come up with a double-6 at least once. His reasoning was the same as before, and quite similar to the reasoning employed by the Butch. If the chance of a 6 on one roll of the die is 1/6, then the chance of a double-6 when two dice are thrown = 1/6 x 1/6 (as they are independent events) = 1/36.

So, reasoned the Chevalier, the chance of at least one double-6 in 24 throws is: 24/36 = 2/3.

So this is a very profitable game for the Chevalier. Or is it? No it isn’t, and this time Monsieur Gombaud paid for his faulty reasoning. He started losing. In desperation, he consulted the mathematician and philosopher, Blaise Pascal. Pascal derived the correct probabilities as follows:

The probability of a double-6 in one throw of a pair of dice = 1/6 x 1/6 = 1/36.

So the probability of NO double-6 in one throw of a pair of dice = 35/36.

So, the probability of no double-6 in 24 throws of a pair of dice = 35/36 x 35/36 … 24 times = 35/36 to the power of 24, i.e. (35/36)^24 = 0.5086.

So probability of at least one double-6 is 1 minus this, i.e. 1 – 0.5086 = 0.4914, i.e. less than 0.5. Under the terms of the new game, the Chevalier was betting at even money on a game which he lost more often than he won. It was an error that the Butch was to repeat almost 300 years later!

What if the Chevalier had changed the game to give himself 25 throws?

Now, the probability of throwing at least one double-6 in 25 throws of a pair of dice is:

1 – (35/36)^25 = 0.5055.

These odds, at even money, are in favour of the Chevalier, but this probability is still lower than the probability of obtaining one ‘6’ in four throws of a single die.

In the single-die game, the Chevalier has a house edge of 51.77% – 48.23% = 3.54%.

In the ‘pair of dice’ game (24 throws), the Chevalier’s edge =

49.14% – 50.86% = –1.72%

In the ‘pair of dice’ game (25 throws), the Chevalier’s edge =

50.55% – 49.45% = 1.1%
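All three probabilities, and the edges at even money, can be verified with the complement rule used above (a sketch; the names are illustrative):

```python
def p_at_least_one(p_single: float, trials: int) -> float:
    """P(at least one success) = 1 - P(no success in any of the trials)."""
    return 1 - (1 - p_single) ** trials

single_die = p_at_least_one(1 / 6, 4)   # one 6 in four rolls     ~0.5177
pair_24 = p_at_least_one(1 / 36, 24)    # double-6 in 24 throws   ~0.4914
pair_25 = p_at_least_one(1 / 36, 25)    # double-6 in 25 throws   ~0.5055

# At even money the edge is p - (1 - p) = 2p - 1:
for p in (single_die, pair_24, pair_25):
    print(f"{p:.4f}  edge = {2 * p - 1:+.4f}")
```

Only the 24-throw version of the pair-of-dice game has a negative edge, which is exactly where the Chevalier (and, much later, the Butch) came unstuck.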

A better game for the Chevalier would have been to offer even money that at least one of 1024 separate sequences of ten coin tosses would come down all heads. (Note that the sequences must be independent; the calculation below does not apply to overlapping runs within a single string of 1024 tosses.) The derivation of this probability is similar in method to the dice problem.

First, we need to determine the probability of 10 heads in 10 tosses of a fair coin.

The probability is: ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ = (1/2)^10 = 1/1024, i.e. odds of 1023 to 1 against.

Based on this, what is the probability that at least one of the 1024 sequences comes up all heads? Is it 0.5? No, because although you can expect ONE all-heads sequence on average, you could obtain zero, 2, 3, 4, etc.

So what is the probability of NO all-heads sequence among the 1024?

Since the sequences are independent, this is: (1 – 1/1024)^1024

The probability of NO ALL-HEADS SEQUENCE = (1023/1024)^1024 = 37%

So the probability of AT LEAST one all-heads sequence = 63%.

Now assume that 234 of the 1024 sequences have already been played without success. What is the chance now of at least one all-heads sequence?

Probability of NO ALL-HEADS SEQUENCE in the remaining 790 = (1023/1024)^790 = 46%

So probability of at least one success = 54%.

The Chevalier could have played either of these games and expected to come out ahead. But the game would have taken a long time. He preferred the shorter game, which produced the longer loss.

Until he was put right by Monsieur Pascal.

Most importantly, though, the Chevalier’s question sparked a correspondence, most of which has survived, and which laid the foundations of modern probability theory.

Out of this correspondence emerged quite a few jewels, one of which has become known as the ‘Gambler’s Ruin’ problem.

This is an idea set in the form of a problem by Pascal for Fermat, subsequently published by Christiaan Huygens (‘On Reasoning in Games of Chance’, 1657) and formally solved by Jacob Bernoulli (‘Ars Conjectandi’, 1713).

One way of stating the problem is as follows. If you play any gambling game long enough, will you eventually go bankrupt, even if the odds are in your favour, if your opponent has unlimited funds?

For example, say that you and your opponent toss a coin, where the loser pays the winner £1. The game continues until either you or your opponent has all the money. Suppose you have £10 to start and your opponent has £20. What are the probabilities that a) you and b) your opponent, will end up with all the money?

The answer is that the player who starts with more money has more chance of ending up with all of it. The formula is:

P1 = n1 / (n1 + n2)

P2 = n2 / (n1 + n2)

where n1 is the amount of money that player 1 starts with, n2 is the amount of money that player 2 (your opponent) starts with, and P1 and P2 are the respective probabilities that player 1 and player 2 end up with all the money, assuming an even-money game with a fair coin.

In this case, you start with £10 of the £30 total, and so have a 10/(10+20) = 10/30 = 1/3 chance of winning the £30; your opponent has a 2/3 chance of winning the £30. But even if you do win this game, and you play the game again and again, against different opponents, or the same one who has borrowed more money, eventually you will lose your entire bankroll. This is true even if the odds are in your favour: eventually you will meet a bad streak long enough to bankrupt you. In other words, infinite capital will overcome any finite odds against it. This is one version of the ‘Gambler’s Ruin’ problem, and many gamblers over the years have been ruined because of their unawareness of it.
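The formula, and the £10 versus £20 example, can be checked against a direct simulation of the fair coin-tossing game (a sketch; the function names, trial count and seed are illustrative):

```python
import random

def ruin_probability(n1: int, n2: int) -> float:
    """Chance that player 1 ends up with all the money in a fair,
    even-money game: P1 = n1 / (n1 + n2)."""
    return n1 / (n1 + n2)

def simulate(n1: int, n2: int, trials: int, seed: int = 42) -> float:
    """Monte Carlo estimate of the same probability via a random walk."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        bank = n1
        while 0 < bank < n1 + n2:           # play until someone is ruined
            bank += 1 if rng.random() < 0.5 else -1
        wins += bank == n1 + n2             # player 1 took everything
    return wins / trials

print(ruin_probability(10, 20))  # 1/3, as in the £10 vs £20 example
print(simulate(10, 20, 5000))    # close to 1/3
```

The simulated figure converges on the formula as the number of trials grows.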

Exercise

  1. What is the probability of throwing at least one double-six in 26 throws of a pair of dice?
  2. You and your opponent toss a coin, where the loser pays the winner £10. The game continues until either you or your opponent has all the money. Suppose you have £100 to start and your opponent has £400. What are the probabilities that a) you and b) your opponent, will end up with all the money?

References and Links

DeMere’s Paradox. ProofWiki. https://proofwiki.org/wiki/De_M%C3%A9r%C3%A9%27s_Paradox

One gambling problem that launched modern probability theory. Introductory Statistics. https://introductorystats.wordpress.com/2010/11/12/one-gambling-problem-that-launched-modern-probability-theory/

deMere’s Problem. WolframMathWorld. http://mathworld.wolfram.com/deMeresProblem.html

Gambler’s Ruin. WolframMathWorld. http://mathworld.wolfram.com/GamblersRuin.html

The Problem of Existence – Guide Notes.


It shouldn’t be possible for us to exist. But we do. That’s counterintuitive. Take, for example, the ‘Cosmological Constant’. What it represents is a sort of unobserved ‘energy’ in the vacuum of space which possesses density and pressure, and which prevents a static universe from collapsing in upon itself. We know how much unobserved energy there is because we know how it affects the Universe’s expansion. But how much should there be? The easiest way to picture this is to visualise ‘empty space’ as containing ‘virtual’ particles that continually form and then disappear. This ‘empty space’, it turns out, should ‘weigh’ 10 to the power of 93 grams per cubic centimetre. Yet the actual figure differs from that predicted by a factor of 10 to the power of 120. The ‘vacuum energy density’ as predicted is simply 10 to the power of 120 times too big. That’s a 1 with 120 zeros after it. So there is something cancelling out all this energy, to make it 10 to the power of 120 smaller in practice than it should be in theory. In other words, the various components of vacuum energy are arranged so that they essentially cancel out.

Now this is very fortuitous. If the cancellation figure was one power of ten different, 10 to the power of 119, then galaxies could not form, as matter would not be able to condense, so no stars, no planets, no life. So we are faced with the fact that the positive and negative contributions to the cosmological constant cancel to 120-digit accuracy, yet fail to cancel beginning at the 121st digit. In fact, the cosmological constant must be zero to within one part in roughly 10 to the power of 120 (and yet be nonzero), or else the universe either would have dispersed too fast for stars and galaxies to have formed, or else would have collapsed upon itself long ago. How likely is this by chance? Essentially, it is the equivalent of tossing a coin and needing to get heads 400 times in a row and achieving it.
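The coin-toss comparison is just a change of base: a chance of one part in 10 to the power of 120 corresponds to log2 of 10^120 consecutive heads, which a one-line calculation confirms is roughly 400:

```python
import math

# Number of consecutive heads whose probability equals one part in 10^120:
tosses = 120 * math.log2(10)
print(round(tosses, 1))  # 398.6, i.e. roughly 400 coin tosses
```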

Now, that’s just one constant that needs to be just right for galaxies and stars and planets and life to exist. There are quite a few, independent of this, which have to be equally just right, most notably the strength of gravity and of the strong nuclear force relative to electromagnetism and the observed strength of the weak nuclear force. Others include the difference between the masses of the two lightest quarks and the mass of the electron relative to the quark masses, the value of the global cosmic energy density in the very early universe, and the relative amplitude of density fluctuations in the early universe. If any of these constants had been slightly different, stars and galaxies could not have formed.

There is also the symmetry/asymmetry paradox. When symmetry is required of the Universe, for example in a perfect balance of positive and negative charge, conservation of electric charge is critically ensured. If there were an equal number of protons and antiprotons, of matter and antimatter, produced by the Big Bang, they would have annihilated each other, leaving a Universe empty of its atomic building blocks. Fortuitously for the existence of a live Universe, protons actually outnumbered antiprotons by just one part in one billion. If the perfect symmetry of the charge and the almost vanishingly tiny asymmetry of matter and antimatter were reversed, if protons and antiprotons had not differed in number by that one part in a billion, there would be no galaxies, no stars, no planets, no life, no consciousness, no question for us to consider.

In summary, then, if the conditions in the Big Bang which started our Universe had been even a tiniest of a tiniest of a tiny bit different, with regard to a number of independent physical constants, our galaxies, stars and planets would not have been able to exist, let alone lead to the existence of living, thinking, feeling things. So why are they so right?

Let us first tackle those who say that if they hadn’t been right we would not have been able to even ask the question. This sounds a clever point but in fact it is not. For example it would be absolutely bewildering how I could have survived a fall out of an aeroplane from 39,000 feet onto tarmac without a parachute, but it would still be a question very much in need of an answer. To say that I couldn’t have posed the question if I hadn’t survived the fall is no answer at all.

Others propose the argument that since there must be some initial conditions, the conditions which made the Universe and life within it possible were just as likely to prevail as any others, so there is no puzzle to be explained.

But this is like saying that there are two people, Jack and Jill, who are arguing over whether Jill can control whether a fair coin lands heads or tails. Jack challenges Jill to toss the coin 400 times. He says he will be convinced of Jill’s amazing skill if she can toss heads followed by tails 200 times in a row, and she proceeds to do so. Jack could now argue that a head was equally likely as a tail on every single toss of the coin, so this sequence of heads and tails was, in retrospect, just as likely as any other outcome. But clearly that would be a very poor explanation of the pattern that just occurred. That particular pattern was clearly not produced by coincidence. Yet it’s the same argument as saying that it is just as likely that the initial conditions were just right to produce the Universe and life to exist as that any of the other pattern of billions of initial conditions that would not have done so. There may be a reason for the pattern that was produced, but it needs a more profound explanation than proposing that it was just coincidence.

A second example. There is one lottery draw, devised by an alien civilisation. The lottery balls, numbered from 1 to 59, are to be drawn, and the only way that we will escape destruction, we are told, is if the first 59 balls out of the drum emerge as 1 to 59 in sequence. The numbers duly come out in that exact sequence. Now that outcome is no less likely than any other particular sequence, so if it came out that way a sceptic could claim that we were just lucky. That would clearly be nonsensical. A much more reasonable and sensible conclusion, of course, is that the aliens had rigged the draw to allow us to survive!

So the fact that the initial conditions are so fine-tuned deserves an explanation, and a very good one at that. It cannot be simply dismissed as a coincidence or a non-question.

An explanation that has been proposed that does deserve serious scrutiny is that there have been many Big Bangs, with many different initial conditions. Assuming that there were billions upon billions of these, eventually one will produce initial conditions that are right for the Universe to at least have a shot at existing.

In this theory, we are essentially proposing a process statistically along the lines of aliens drawing lottery balls over and over again, countless times, until the numbers come out in the sequence 1 to 59.

On this basis, a viable Universe could arise out of re-generating the initial conditions at the Big Bang until one of the lottery numbers eventually comes up. Is this a simpler explanation of why our Universe and life exist than an explanation based on a primal cause? And in any case, does simplicity matter as a criterion of truth? On the second question, simplicity is usually accepted as a guide in the realm of scientific enquiry: a simpler explanation of known facts is usually accepted as superior to a more complex one.

Of course, the simplest state of affairs would be a situation in which nothing had ever existed. This would also be the least arbitrary, and certainly the easiest to understand. Indeed, if nothing had ever existed, there would have been nothing to be explained. Most critically, it would solve the mystery of how things could exist without their existence having some cause. In particular, while it is not possible to propose a causal explanation of why the whole Universe or Universes exists, if nothing had ever existed, that state of affairs would not have needed to be caused. This is not helpful to us, though, as we know that in fact at least one Universe does exist.

Take the opposite extreme, where every possible Universe exists, underpinned by every possible set of initial conditions. In such a state of affairs, most of these might be subject to different fundamental laws, governed by different equations, composed of different elemental matter. There is no reason in principle, on this version of reality, to believe that each different type of Universe should not exist over and over again, up to an infinite number of times, so even our own type of Universe could exist billions of billions of times, or more, so that in the limit everything that could happen has happened and will happen, over and over again. This may be a true depiction of reality, but it, or anything remotely near it, seems a very unconvincing one. In any case, our sole source of understanding about the make-up of a Universe is a study of our own Universe. On what basis, therefore, can we scientifically propose that the other speculative Universes are governed by totally different equations and fundamental physical laws? They may be, but that is a heroic assumption.

Perhaps the laws are the same, but the constants that determine the relative masses of the elementary particles, the relative strengths of the physical forces, and many other fundamentals differ, while the laws themselves do not. If so, what law governs how these constants vary from Universe to Universe, and where do these fundamental laws come from? From nothing? It has been argued that absolutely no evidence exists that any Universe other than our own exists, and that these unseen Universes are proposed simply to explain the otherwise baffling problem of how our Universe and the life within it can exist. That may well be so, but we can park that for now, as it is still at least possible that they do exist.

So let’s step away from requiring any evidence, and move on to at least admitting the possibility that there are a lot of universes, but not every conceivable universe. One version of this is that the other Universes have the same fundamental laws, are subject to the same fundamental equations, and are composed of the same elemental matter as ours, but differ in the initial conditions and the constants. But this leaves us with the question of why there should be only just so many universes, and no more. A hundred, a thousand, a hundred thousand: whatever number we choose requires an explanation of why just that number. This is again very puzzling. If we didn’t know better, our best ‘a priori’ guess would be that there are no universes, no life. We happen to know that’s wrong, so that leaves our Universe alone; or else a limitless number of universes where anything that could happen has or will, over and over again; or else a limited number of universes, which raises the question: why just that number?

Is it because certain special features have to obtain in the initial conditions before a Universe can be born, and these features are limited in number? Let us assume this is so. That only raises the further question of why these limited features cannot occur more than a limited number of times. If they could, there is no reason to believe the number of universes containing these special features would be less than limitless. So, on this view, our Universe exists because it contains the special features which allow a Universe to exist. But if so, we are back with the problem arising in the conception of all possible worlds, except that in this case it is only our own type of Universe (i.e. one obeying the equations and laws that underpin this Universe) that could exist limitless times. Again, this may be a true depiction of reality, but it seems a very unconvincing one.

The alternative is to adopt an assumption that there is some limiting parameter to the whole process of creating Universes, along the lines of those versions of string theory which claim that there is a limit of 10 to the power of 500 solutions (admittedly a dizzyingly big number) to the equations that make up the so-called ‘landscape’ of reality. That sort of limiting assumption, however realistic or unrealistic it might be, would seem to offer at least a lifeline by which we can cling on to some semblance of common sense.

Before summarising where we have got to, a quick aside on the ‘Great Filter’ idea, which relates to the question of how life of any form could arise out of inanimate matter, and ultimately to human consciousness. From what we know now, observable civilisations seem to be vanishingly rare, possibly arising only once. Indeed, even in a universe that manages to exist, getting from inanimate matter to conscious humans seems to require a series of steps of astonishing improbability. The Filter refers to the causal path from simple inanimate matter to a visible civilisation. The underpinning logic is that almost everything that starts along this path is blocked along the way, whether by one extremely hard step or by many very, very hard steps. Indeed, it is commonly supposed that the path has been completed only once here on earth, traceable so far to LUCA (our Last Universal Common Ancestor). If so, this may be why the universe out there seems for the most part to be quite dead. The biggest filter, so the argument goes, is that the origin of life from inanimate matter is itself very, very, very hard. It is a sort of Resurrection, but an order of magnitude harder, because the ‘dead stuff’ had never been alive, and nor had anything else! And that is just the first giant leap along the way. This is a big problem of its own, but that is for another day, so let’s leave it aside and go back a step, to the origin of the universe. Before we do so, let us, as suggested before our short detour, summarise very quickly.

Here goes. If we didn’t know better, our best guess, the simplest description of all possible realities, is that nothing exists. But we do know better, because we are alive and conscious, and considering the question. But our Universe is far, far, far too fine-tuned, by a factor of billions of billions, to exist by chance if it is the only Universe. So there must be more, if our Universe is caused by the roll of the die, a lot more. But how many more? If there is some mechanism for generating experimental universe upon universe, why should there be a limit to this process, and if there is not, that means that there will be limitless universes, including limitless identical universes, in which in principle everything possible has happened, and will happen, over and over again.

Even if we accept there is some limiter, we have to ask what causes this limiter to exist, and even if we don’t accept there is a limiter, we still need to ask what governs the equations representing the initial conditions to be as they are, to create one Universe or many. What puts life into the equations and makes a universe or universes at all? And why should the mechanism generating life into these equations have infused them with the physical laws that allow the production of any universe at all?

Some have speculated that we can create a universe or universes out of nothing, and that a particle and an anti-particle, for example, could in theory spontaneously be generated out of what is described as a ‘quantum vacuum’. According to this theoretical conjecture, the Universe ‘tunnelled’ into existence out of nothing.

This would be a helpful handle for proposing some rational explanation of the origin of the Universe and of space-time if a ‘quantum vacuum’ were in fact nothingness. But that’s the problem with this theoretical foray into the quantum world. In fact, a quantum vacuum is not empty, or nothing, in any real sense at all. It has a complex mathematical structure, and it is saturated with energy fields and virtual-particle activity. In other words, it is a thing, with structure and with things happening in it. As such, the equations that would form the quantum basis for generating particles, anti-particles, fluctuations, a Universe, actually exist and possess structure. They are not nothingness, not a void.

To be more specific, according to relativistic quantum field theories, particles can be understood as specific arrangements of quantum fields. So one particular arrangement could correspond to there being 28 particles, another to 240, another to no particles at all, and another to an infinite number. The arrangement which corresponds to no particles is known as a ‘vacuum’ state. But these relativistic quantum field theoretic vacuum states are indeed particular arrangements of elementary physical stuff, no less so than our planet or solar system. The only case in which there would be no physical stuff would be if the quantum fields ceased to exist. But that’s the thing. They do exist. There is no something from nothing. And this something, and the equations which infuse it, has somehow had the shape and form to give rise to protons, neutrons, planets, galaxies and us.

So the question is what gives life to this structure, because without that structure, no amount of ‘quantum fiddling’ can create anything. No amount of something can be produced out of nothing. Yes, even empty space is something with structure and potential. More basically, how and why should such a thing as a ‘quantum vacuum’ even have existed, begun to exist, let alone be infused with the potential to create a Universe and conscious life out of non-conscious somethingness?

It is certainly a puzzle, and arguably one without an intuitive solution.

Exercise

If the conditions in the Big Bang which started our Universe had been even the tiniest of a tiniest of a tiny bit different, with regard to a number of independent physical constants, the galaxies, stars and planets would not have been able to exist. But if we didn’t exist, we couldn’t have asked the question as to why they were so right. In any case, since there must be some initial conditions, the conditions which gave rise to the Universe and life, however fortuitous, were just as likely to prevail as any others. So there is, for both reasons, no puzzle to be explained. Is this a convincing rebuttal of the ‘Fine-Tuned’ universe problem? Why? Why not?

Reading and Links

Derek Parfit, ‘Why anything? Why this?’ Part 1. London Review of Books, 20, 2, 22 January 1998, pp. 24-27.

https://www.lrb.co.uk/v20/n02/derek-parfit/why-anything-why-this

Derek Parfit, ‘Why anything? Why this?’ Part 2. London Review of Books, 20, 3, 5 February 1998, pp. 22-25.

https://www.lrb.co.uk/v20/n03/derek-parfit/why-anything-why-this

John Piippo, Giving Up on Derek Parfit, July 22, 2012

http://www.johnpiippo.com/2012/07/giving-up-on-derek-parfit.html

A universe made for me? Physics, fine-tuning and life https://cosmosmagazine.com/physics/a-universe-made-for-me-physics-fine-tuning-and-life

John Horgan, ‘Science will never explain why there’s something rather than nothing’, Scientific American, April 23, 2012.

https://blogs.scientificamerican.com/cross-check/science-will-never-explain-why-theres-something-rather-than-nothing/

David Bailey, What is the cosmological constant paradox, and what is its significance? 1 January 2017. http://www.sciencemeetsreligion.org/physics/cosmo-constant.php

Fine Tuning of the Universe

http://reasonandscience.heavenforum.org/t1277-fine-tuning-of-the-universe

The Great Filter – are we almost past it? http://mason.gmu.edu/~rhanson/greatfilter.html

http://www.overcomingbias.com/2017/12/dragon-debris.html

Fine Tuning in Cosmology. Chapter 2. In: Bostrom, N. Anthropic Bias: Observation Selection Effects in Science and Philosophy. 2002. http://www.anthropic-principle.com/?q=book/chapter_2#2a

Last Universal Common Ancestor (LUCA)

https://en.wikipedia.org/wiki/Last_universal_common_ancestor

David Albert, ‘On the Origin of Everything’, Sunday Book Review, The New York Times, March 23, 2012.

Quantum World Thought Experiments – Guide Notes.

Is it possible to be both alive and dead at the same time? This is the question central to the famous Schrödinger’s Cat thought experiment. In the version posed by Erwin Schrödinger, a cat is placed in an opaque box for an hour with a small piece of radioactive material which has an equal probability of decaying or not in that time period. If some radioactivity is detected by a Geiger counter also placed in the box, a relay releases a hammer which breaks a flask of hydrocyanic acid, killing the cat. If no radioactivity is detected, the cat lives. Before we open the box at the end of the hour, we estimate the chance that the radioactive material will decay and the cat will be dead at 50/50, the same as the chance that it will be alive. Before we open the box, however, is the cat alive (and we don’t know it yet), dead (and we don’t know it yet), or both alive and dead (until we open the box and find out)?

Common sense would seem to indicate that it is either alive or dead, but we don’t know until we open the box. Traditional quantum theory suggests otherwise. The cat is both alive, with a certain probability, and dead, with a certain probability, until we open the box and find out, when it has to become one or the other with a probability of 100 per cent. In quantum terminology, the cat is in a superposition (two states at the same time) of being alive and dead, which only collapses into one state (dead or alive) when the cat is observed. This might seem absurd when applied to a cat. After all, surely it was either alive or dead before we opened the box and found out; it was simply that we didn’t know which. That may be true, when applied to cats. But when applied to the microscopic quantum world, such common sense goes out the window as a description of reality. For example, photons (the smallest units of light) can exist simultaneously in both wave and particle states, and travel in both clockwise and anti-clockwise directions at the same time. Each state exists in the same moment. As soon as the photon is observed, however, it must settle on one unique state. In other words, the common sense that we can apply to cats we cannot apply to photons or other particles at the quantum level.

So what is going on? The traditional explanation as to why the same quantum particle can exist in different states simultaneously is known as the Copenhagen Interpretation. First proposed by Niels Bohr in the early twentieth century, the Copenhagen interpretation states that a quantum particle does not exist in any one state but in all possible states at the same time, with various probabilities. It is only when we observe it that it must in effect choose which of these states it exists as. At the sub-atomic level, then, particles seem to exist in a state of what is called ‘coherent superposition’, in which they can be two things at the same time, and only become one when they are forced to do so by the act of being observed. The total of all possible states is known as the ‘wave function.’ When the quantum particle is observed, the superposition ‘collapses’ and the object is forced into one of the states that make up its wave function.

The problem with this explanation is what happens to all these different states, each of which exists before the observation. By observing the object, we may reduce it to one of these states, but what has happened to the others? Where have they disappeared to?

This question lies at the heart of the so-called ‘Quantum Suicide’ thought experiment.

It goes like this. A man (not a cat) sits down in front of a gun which is linked to a machine that measures the spin of a quantum particle (a quark). If it is measured as spinning clockwise, the gun will fire and kill the man. If it is measured as spinning anti-clockwise, it will not fire and the man will survive to undergo the same experiment again.

The question is – will the man survive, and how long will he survive for? This thought experiment, proposed by Max Tegmark, has been answered in different ways by quantum theorists depending on whether or not they adhere to the Copenhagen Interpretation. In that interpretation, the gun will go off with a certain probability, depending on which way the quark is spinning. Eventually, by the laws of chance, the man will be killed, probably sooner rather than later. A growing number of theorists believe something else, however. They see both states (the particle is spinning clockwise and spinning anti-clockwise) as equally real, so there are two real outcomes. In one world, the man dies and in the other he lives. The experiment repeats, and the same split occurs. In one world there will exist a man who survives an indefinite number of rounds. In the other worlds, he is dead.
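Under the single-history (Copenhagen-style) reading, the arithmetic of the experiment is simple: the chance of surviving n consecutive 50/50 rounds is 0.5 to the power n, which is why the man is "probably killed sooner rather than later". A minimal Monte Carlo sketch in Python confirms this (the function name and trial counts are illustrative choices, not from any source):

```python
import random

def survival_probability(rounds, trials=100_000, seed=42):
    """Monte Carlo estimate of surviving `rounds` consecutive
    50/50 quantum measurements, treating each run as a single
    history in which the gun fires with probability 1/2."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # the man survives only if every round comes up 'anti-clockwise'
        if all(rng.random() < 0.5 for _ in range(rounds)):
            survived += 1
    return survived / trials

for n in (1, 5, 10):
    # estimate vs the exact value 0.5**n
    print(n, survival_probability(n), 0.5 ** n)
```

The estimates track 0.5**n closely, dropping below one in a thousand by round ten. The Many Worlds reading does not dispute this arithmetic; it disputes what the surviving fraction of histories means.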

The difference between these alternative approaches is critical. The Copenhagen approach is to propose that the simultaneously existing states (for example, the quark that is spinning both clockwise and anti-clockwise simultaneously) exist in one world, and collapse into one of these states when observed. Meanwhile, the other states mysteriously disappear. The other approach is to posit that these simultaneously existing states are real states, and neither magically disappears, but branch off into different realities when observed. What is happening is that in one world, the particle is observed spinning clockwise (in the Quantum Suicide thought experiment, the man dies) and in the other world the particle is observed spinning the other way (and the man lives). Crucially, according to this interpretation both worlds are real. In other words, they are not notional states of one world but alternative realities. This is the so-called ‘Many Worlds Theory.’

Where is the burden of proof in trying to determine which interpretation of reality is correct? This depends on whether we take the one world that we can observe as the default position or the wave function of all possible states as represented in the mathematics of the wave function as the reality. Adherents to the Many Worlds position argue that the default is to go with what is described in the mathematics underpinning quantum theory – that the wave function represents all of reality. According to this argument, the minimal mathematical structure needed to make sense of quantum mechanics is the existence of many worlds which branch off, each of which contains an alternative reality. Moreover, these worlds are real. To say that our world, the one that we are observing, is the only real one, despite all the other possible worlds or measurement outcomes, has been likened to when we believed that the Earth was at the centre of the universe. There is no real justification, according to this interpretation, for saying that our branch of all possible states is the only real one, and that all other branches are non-existent or are ‘disappeared worlds.’ Put another way, the mathematics of quantum mechanics describes these different worlds. Nothing in the maths says that this world that we observe is more real than another world. So the burden of proof is on those who say it is. The viewpoint of the Copenhagen school is diametrically opposite. They argue that the hard evidence is of the world we are in, and the burden of proof is on those positing other worlds containing other branches of reality.

Which default position we choose to adopt will determine whether we are adherents of the Copenhagen or the ‘Many Worlds’ school.

For me personally, the logic of the argument points to the Many Worlds school. But to believe that they are right, and the Copenhagen school is wrong, seems kind of crazy, and totally counter-intuitive. In another world, of course, I’m probably saying the exact opposite.

Exercise

Consider the main strength and weakness of the ‘Many Worlds’ interpretation of reality.

References and Links

Do Parallel Universes Really Exist? HowStuffWorks. https://science.howstuffworks.com/science-vs-myth/everyday-myths/parallel-universe.htm

How Quantum Suicide Works. HowStuffWorks. https://science.howstuffworks.com/innovation/science-questions/quantum-suicide.htm

 

Are we living inside the Matrix? Guide Notes.

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

Do we live in a simulation, created by an advanced civilisation, in which we are part of some sophisticated virtual reality experience? For this to be a possibility we can make the obvious assumption that sufficiently advanced civilisations will possess the requisite computing and programming power to create what philosopher Nick Bostrom termed such ‘ancestor simulations’. These simulations would be complex enough for the minds that are simulated to be conscious and able to experience the type of experiences that we do. The creators of these simulations could exist at any stage in the development of the universe, even billions of years into the future.

The argument around simulation goes like this. One of the following three statements must be correct.

  1. That civilisations at our level of development always or almost always disappear before becoming technologically advanced enough to create these simulations.
  2. That the proportion of these technologically advanced civilisations that wish to create these simulations is zero or almost zero.
  3. That we are almost sure to be living in such a simulation.

To see this, let’s examine each proposition in turn.

  1. Suppose that the first is not true. In that case, a significant proportion of civilisations at our stage of technology go on to become technologically advanced enough to create these simulations.
  2. Suppose that the second is not true. In this case, a significant proportion of these civilisations run such simulations.
  3. If both of the above propositions are not true, then there will be countless simulated minds indistinguishable to all intents and purposes from ours, as there is potentially no limit to the number of simulations these civilisations could create. The number of such simulated minds would almost certainly be overwhelmingly greater than the number of minds that created them. Consequently, we would be quite safe in assuming that we are almost certainly inside a simulation created by some form of advanced civilisation.
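The force of the third step is just counting: if even a small fraction of civilisations survive, and a small fraction of those run many simulations each, simulated minds vastly outnumber unsimulated ones. A back-of-envelope sketch in Python, where every input number is a purely hypothetical illustration rather than anything claimed by the argument itself:

```python
def fraction_simulated(f_survive, f_run, sims_per_civ):
    """Fraction of all human-like minds that are simulated.

    f_survive    - fraction of civilisations at our stage that reach
                   technological maturity (hypothetical)
    f_run        - fraction of mature civilisations that run
                   ancestor simulations (hypothetical)
    sims_per_civ - average number of simulated populations each such
                   civilisation runs, each assumed comparable in size
                   to one real population (hypothetical)
    """
    real = 1.0  # one real population per original civilisation
    simulated = f_survive * f_run * sims_per_civ
    return simulated / (real + simulated)

# Even with pessimistic fractions, cheap simulation dominates:
print(fraction_simulated(0.01, 0.01, 1_000_000))  # ≈ 0.99
```

With survival and willingness each at one in a hundred, a million simulations per willing civilisation still puts over 99% of minds inside simulations; only driving one of the first two factors to (almost) zero rescues proposition 1 or 2.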

For the first proposition to be untrue, a significant proportion of civilisations must pass through the phase in which they are able to wipe themselves out, whether deliberately or by accident, carelessness or neglect, without ever doing so. This might seem unlikely based on our experience of this world, but it becomes more likely if we consider all other possible worlds.

For the second proposition to be untrue, we would have to assume that virtually all civilisations that were able to create these simulations would decide not to do so. This again is possible, but would seem unlikely.

If we consider both propositions, and we think it is unlikely that no civilisations survive long enough to achieve what Bostrom calls ‘technological maturity’, and that it is unlikely that hardly any would create ‘ancestor simulations’ if they could, then anyone considering the question is left with a stark conclusion. They really are living in a simulation.

To summarise. An advanced ‘technologically mature’ civilisation would have the capability of creating simulated minds. Based on this, at least one of three propositions must be true.

  1. The proportion of civilisations at our level of development that go on to become these advanced civilisations is zero or close to zero.
  2. The proportion of these advanced civilisations that wish to run these simulations is close to zero.
  3. The proportion of those consciously considering the question who are living in a simulation is close to one.

If the first of these propositions is true, we will almost certainly not survive to become ‘technologically mature.’ If the second proposition is true, virtually no advanced civilisations are interested in using their power to create such simulations. If the third proposition is true, then conscious beings considering the question are almost certainly living in a simulation.

Through the veil of our ignorance, it might seem sensible to assign equal credence to all three, and to conclude that unless we are currently living in a simulation, descendants of this civilisation will almost certainly never be in a position to run these simulations.

Strangely indeed, the probability that we are living in a simulation increases as we draw closer to the point at which we are able and willing to create such simulations ourselves. At the point at which we were ready to create our own simulations, we would paradoxically be at the very point of being almost sure that we ourselves were simulations. Only by refraining from doing so could we, in a certain sense, make it less likely that we were simulated, as it would show that at least one civilisation able to create simulations refrained from doing so. Once we took the plunge, we would know that we were almost certainly only doing so as simulated beings. And yet there must have been someone or something that created the first simulation. Could that be us, we would be asking ourselves? In our simulated hearts and minds, we would already know the answer!

Exercise

With reference to Bostrom’s ‘simulation’ reasoning, generate an estimate as to the probability that we are living in a simulated world.

References and Links

The Simulation Argument. https://www.simulation-argument.com/

Do we live in a computer simulation? Nick Bostrom. New Scientist. 00Month 2006. 8-9. https://www.simulation-argument.com/computer.pdf

Are you living in a computer simulation? Bostrom, N. Philosophical Quarterly (2003), 53, 211, pp. 243-255.


Hempel’s Paradox – in a nutshell.

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

You spot a pink flamingo and wonder to yourself whether all flamingos are pink. What would it take to confirm or disprove the hypothesis? The nice thing about this sort of hypothesis is that it’s testable and potentially falsifiable. All it takes is to find a flamingo that is not pink, and you can conclude that not all flamingos are pink. Just one observation can change your flamingo world view. It doesn’t matter how many pink flamingos you witness, however; no number can prove the hypothesis short of the number of flamingos that potentially exist. Still, the more you see that are pink, the more probable it becomes that all flamingos are actually pink. How probable you consider that is at any given time is related to how probable you thought it was before you saw the latest one. While considering this, you see someone wearing blue tennis shoes. Does this make it more likely that all flamingos are pink? This is one example of a broader paradox first formally identified by Carl Gustav Hempel, sometimes known as Hempel’s paradox or else the Raven Paradox.

The Raven paradox arises from asking whether observing a green apple makes it more likely that all ravens are black, assuming that you don’t know the answer. It would intuitively seem not. Why should seeing a green apple tell you anything about the colour of ravens? The way to answer this is to re-state ‘All ravens are black’ as ‘Everything that is not black is not a raven.’ In fact, these two statements are logically equivalent. To see this, assume there are just two ravens and two tennis shoes (one right-foot, one left-foot) in the whole world. Now you identify the colour of each of these objects. You observe that both tennis shoes are blue and the other two objects are black. So you announce that everything that is not black (each of the tennis shoes) is not a raven. This is identical to saying that all ravens are black. The logic universalises to any number of objects and colours. Assume now we see just one of the tennis shoes and it turns out to be blue. You can now announce that one possible thing that is not black is not a raven. If you see the other tennis shoe and it is blue, that means that there are now two things that are not black that are not a raven. Each time you see something, it is possible that you would not be able to say this – i.e. you would say instead that you have seen something not black and it is a raven. It is like being dealt a playing card from a deck of four which contains only blue or black cards. You are dealt a black card, and it shows a raven. You know that at least one of the other cards is a raven, and it could be a black card or a blue card. You receive a blue card. Now, before you turn it over, what is the chance it is a raven? You don’t know, but whatever it is, the chance that only black cards show ravens improves if you turn the blue card over and it shows a tennis shoe. Each time you turn a blue card over it could show a raven. Each time that it doesn’t makes it more likely that none of the blue cards shows a raven. 
Substitute all non-ravens for tennis shoes and all colours other than black for the blue cards, and the result universalises. Every time you see an object that is not black and is not a raven, it makes it just that tiny, tiny bit more likely that everything that is not black is not a raven, i.e. that all ravens are black. How much more likely? This depends on how observable non-black ravens would be if they exist. If there is no chance that they would be seen even if they exist, because non-black ravens never emerge from the nest, say, it is much more difficult to falsify the proposition that all ravens are black. So when you observe a blue tennis shoe it offers less evidence for the ‘all ravens are black’ hypothesis than when it is just possible that the blue thing you saw would have been a raven and not a tennis shoe. More generally, the more likely a non-black raven is to be observed if it exists, the more evidence observation of a non-black object offers for the hypothesis that all ravens are black.
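The card-by-card update can be written out with Bayes’ rule. Let H be the hypothesis that only black cards show ravens; turning over a blue card that shows a tennis shoe raises our confidence in H, and raises it more the more likely a blue raven would have been to turn up were H false. A sketch in Python, where the 0.5 observation probabilities are illustrative assumptions rather than anything fixed by the story:

```python
def update(prior, p_raven_if_false):
    """Posterior for H = 'only black cards show ravens' after a
    blue card is turned over and shows a tennis shoe.

    p_raven_if_false: probability this blue card would have shown
    a raven were H false (an illustrative assumption).
    """
    p_e_given_h = 1.0                         # H forbids blue ravens
    p_e_given_not_h = 1.0 - p_raven_if_false  # counterexample missed
    evidence = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    return prior * p_e_given_h / evidence

p = 0.5                 # agnostic prior
p = update(p, 0.5)      # first blue card shows a tennis shoe -> 2/3
p = update(p, 0.5)      # second blue card shows a tennis shoe -> 4/5
print(round(p, 3))  # → 0.8
```

This also captures the observability point: set p_raven_if_false close to zero and the update barely moves the prior, which is exactly the case of non-black ravens that "never emerge from the nest".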

So, to summarise, we want to test the hypothesis that all ravens are black. We could go out, find some ravens, and see if they are black. On the other hand, we could simply take the logically equivalent contrapositive of the hypothesis, i.e. that all non-black things are non-ravens. This suggests that we can conduct meaningful research on the colour of ravens from our home or office without observing a single raven, simply by looking at random objects, noting that they are not black, and checking whether they are ravens. As we proceed, we collect data that lends ever more support to the hypothesis that all non-black things are non-ravens, i.e. that all ravens are black. Is there a problem with this approach?

There is no logical flaw in the approach, but in reality there are many more non-black things than there are ravens, so if a non-black raven existed, we would be much more likely to find it by randomly sampling ravens than by sampling non-black things. Therefore, if we sample ravens and fail to find a non-black raven, we can be much more confident in the truth of our hypothesis that “all ravens are black”, simply because the hypothesis had a much higher chance of being falsified by sampling ravens than by sampling random non-black things.
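The sampling asymmetry can be quantified. Suppose exactly one counterexample (a non-black raven) exists somewhere; the chance that a random sample stumbles on it scales inversely with the size of the population being sampled. A toy illustration, with both population sizes invented purely for the example:

```python
def detection_chance(samples, population):
    """Chance that sampling `samples` random members (without
    replacement) of a population containing exactly one
    counterexample finds that counterexample."""
    return min(1.0, samples / population)

ravens = 10_000             # hypothetical number of ravens
non_black_things = 10**12   # hypothetical number of non-black objects

print(detection_chance(100, ravens))           # → 0.01
print(detection_chance(100, non_black_things)) # → 1e-10
```

A hundred raven observations give the hypothesis a 1-in-100 chance of being falsified; a hundred non-black objects give it about 1 in 10 billion. Surviving the first test is therefore far stronger evidence than surviving the second.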

The same goes for pink flamingos. So we have a paradox, traceable to Hempel. I suggest we can resolve it by appeal to a ‘Possibility Theorem’ which I advance here.

Let’s do this by taking the propositions in the thought experiment in turn. Proposition 1: All flamingos are pink. Proposition 2 (logically equivalent to Proposition 1): Everything that is not pink is not a flamingo. Proposition 3 (advanced here as the Possibility Theorem): If something might or might not exist, it is more likely to exist if it is unobservable than if it could have been observed, with some positive probability, but has not been. More generally, the less likely something is to be observed if it exists, the more likely it is to exist when it goes unobserved. So when I see two blue tennis shoes, I am ever so slightly more confident that all flamingos are pink than before I saw them, and especially so if any non-pink flamingos that might be out there would be easy to spot. And I’d still be wrong, but for all the right reasons, until I saw an orange or white flamingo, and then I’d be right, and sure.

Exercise

Does seeing a blue tennis shoe make it more or less likely that all flamingos are pink, or neither?

References and Links

Hempel’s Ravens Paradox. PRIME. http://platonicrealms.com/encyclopedia/Hempels-Ravens-Paradox

Raven Paradox. Wikipedia. https://en.wikipedia.org/wiki/Raven_paradox

The Paradox of Perspective: In a Nutshell

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

Imagine a world created by an external Being based on the toss of a fair coin, or a roll of the dice. It’s a thought experiment sometimes called the ‘God’s Coin Toss Problem.’ In the simplest version, Heads creates a version of the world in which one blue-bearded person is created. Let’s call that World A. Tails creates a version of the world in which a blue-bearded and a black-bearded person are created. Let’s call that World B.

You wake up in the dark in one of these worlds, but you don’t know which, and you can’t see what colour your beard is, though you do know the rule that created the world. What likelihood do you now assign to the hypothesis that the coin landed Tails and you have been born into world B?

This depends on what fundamental assumption you make about existence itself. One way of approaching this is to adopt what has been called the ‘Self Sampling Assumption’. This states that “you should reason as if you are randomly selected from everyone who exists in your reference class.” What do we mean by reference class? As an example, we can reasonably take it that in terms of our shared common existence a blue-bearded person and a black-bearded person belong in the same reference class, whereas a blue-bearded person and a black-coloured ball do not. Looked at another way, we need to ask, “What do you mean by you?” Before the lights came on, you don’t know what colour your beard is. It could be blue or it could be black, unless you mean that it’s part of the “essence” of who you are to have a blue-coloured beard. In other words, that there is no possible state of the world in which you had a black beard but otherwise would have been you.

Using this assumption, we see ourselves simply as a randomly selected bearded person among the reference class of blue- and black-bearded people. The coin could have landed Heads, in which case we are in World A, or it could have landed Tails, in which case we are in World B. There is an equal chance that the coin landed Heads or Tails, so we should assign a credence of 1/2 to being in World A and similarly for World B. In World B the probability is 1/2 that we have a blue beard and 1/2 that we have a black beard.

The light is now turned on and we see that we are sporting a blue beard. What is the probability now that the coin landed Tails and we are in World B? Well, the probability we would sport a blue beard conditional on living in World A is 1, i.e. 100%. This is because we know that the one person who lives in World A has a blue beard. The conditional probability of having a blue beard in World B, on the other hand, is 1/2. The other inhabitant has a black beard. So conditional on finding out that we have a blue beard, it is twice as likely that we live in World A as in World B, i.e. there is a 2/3 chance the coin landed Heads and we live in World A.

Let’s now say you make a different assumption about existence itself. Your worldview in this alternative scenario is based on what has been termed the ‘Self-Indication Assumption.’ It can be stated like this: “Given the fact that you exist, you should (other things equal) favour a hypothesis according to which many observers exist over a hypothesis according to which few observers exist.”

According to this assumption, you note that there is one hypothesis (the World B hypothesis) according to which there are two observers (one blue-bearded and one black-bearded) and another hypothesis (the World A hypothesis) in which there is only one observer (who is blue-bearded). Since there are twice as many observers in World B as World A, then according to the Self-Indication Assumption, it is twice as likely (a 2/3 chance) that you live in World B as World A (a 1/3 chance). This is your best guess while the lights are out. When the lights are turned on, you find out that you have a blue beard. The new conditional probability you attach to living in World B is 1/2, as there is an equal chance that as a blue-bearded person you live in World B as World A.

So, under the Self-Sampling Assumption, your initial credence that you lived in World B was 1/2, which fell to 1/3 when you found out you had a blue beard. Under the Self-Indication Assumption, on the other hand, your initial credence of living in World B was 2/3, which fell to 1/2 when the lights came on.
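These numbers can be checked with a quick Monte Carlo sketch. The sampling schemes below are my own illustrative rendering of the two assumptions: under the SSA we toss the coin and sample an observer from whoever exists in the resulting world; under the SIA we first weight each world by its observer count.

```python
import random

random.seed(1)
N = 200_000

# Self-Sampling Assumption: toss the coin, then sample one observer
# uniformly from whoever actually exists in the resulting world.
ssa_blue = ssa_blue_tails = 0
for _ in range(N):
    tails = random.random() < 0.5
    me = random.choice(["blue", "black"]) if tails else "blue"
    if me == "blue":
        ssa_blue += 1
        ssa_blue_tails += tails

print("SSA: P(World B | blue beard) ~", ssa_blue_tails / ssa_blue)  # ~ 1/3

# Self-Indication Assumption: weight each world by its number of
# observers (two for Tails, one for Heads), then sample an observer.
sia_blue = sia_blue_tails = 0
for _ in range(N):
    tails = random.random() < 2 / 3   # 2 of the 3 possible observers live in World B
    me = random.choice(["blue", "black"]) if tails else "blue"
    if me == "blue":
        sia_blue += 1
        sia_blue_tails += tails

print("SIA: P(World B | blue beard) ~", sia_blue_tails / sia_blue)  # ~ 1/2
```

Dropping the conditioning on a blue beard (counting all trials) recovers the lights-out credences of 1/2 and 2/3 respectively.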

So which is right and what are the wider implications?

Let us first consider the impact of changing the reference class of the ‘companion’ on World B. Instead of this being another bearded person, it is a black ball. In this case, what is the probability you should attribute to living in World B given the Self-Sampling Assumption? While the lights are out, you consider that there is a probability of 1/2 that the coin landed Tails, so the probability that you live in World B is 1/2.

When the lights are turned on, no new relevant information is added as you knew you were blue-bearded. There is one blue-bearded person on World A, therefore, and one on World B. So the chance that you are in World B is unchanged. It is 1/2.

Given the Self-Indication Assumption, the credence you should assign to being in World B, given that your companion is a ball instead of a bearded person, is now 1/2, as the number of relevant observers inhabiting World B is now one, the same as on World A. When the lights come on, you learn nothing new, and the chance the coin landed Tails and you are in World B stays unchanged at 1/2.

Unlike the Self-Indication Assumption (SIA), the Self-Sampling Assumption is dependent on the choice of reference class. The SIA is not dependent on the choice of reference class, as long as the reference class is large enough to contain all subjectively indistinguishable observers. If the reference class is large, SIA will make a many-observer hypothesis more likely, but this is compensated by the much reduced probability that the observer will be any particular observer in the larger reference class.

The choice of underlying assumption has implications elsewhere, most famously in regard to the Sleeping Beauty Problem, which I have addressed from a different angle in a separate blog.

In that problem, Sleeping Beauty is woken either once (on Monday) if a coin lands Heads or twice (on Monday and Tuesday) if it lands Tails. She knows these rules. Either way, on each awakening she is aware only of that immediate awakening; she cannot tell whether she is being woken for the first or the second time. The question is what credence Sleeping Beauty should assign, when she is woken and asked, to the coin having landed Tails.

If she adopts the Self-Sampling Assumption, she will give an answer of 1/2. The coin will have landed Tails with a probability of 1/2 and there is no other observer than her. Only if she is told that this is her second awakening will she change her credence that it landed Tails to 1 and that it landed Heads to 0.

If she adopts the Self-Indication Assumption, she has a different worldview in which there are three observation points. In fact, there are two prevalent propositions which have been called the Self-Indication Assumption, the first of which is stated above, i.e. “Given the fact that you exist, you should (other things equal) favour a hypothesis according to which many observers exist over a hypothesis according to which few observers exist.” The other can be stated thus: “All other things equal, an observer should reason as if they are randomly selected from the set of all possible observers.”

According to this assumption, stated either way, there is one hypothesis (the Heads hypothesis) according to which there is one observer opportunity (Monday awakening) and another hypothesis (the Tails hypothesis) in which there are two observer opportunities (the Monday awakening and the Tuesday awakening). Since there are twice as many observation opportunities in the Tails hypothesis according to the Self-Indication Assumption, it is twice as likely (a 2/3 chance) that the coin landed Tails as that it landed Heads (a 1/3 chance).

Looked at another way, if there is a coin toss that on heads will create one observer, while on tails it will create two, then we have three possible observers (observer on heads, first observer on tails, and second observer on tails), each existing with equal probability, so the Self-Indication Assumption assigns a probability of 1/3 to each. Alternatively, this could be interpreted as saying there are two possible observers (first observer on either heads or tails, second observer on tails), the first existing with probability one and the second existing with probability 1/2. So the Self-Indication Assumption assigns a 2/3 probability to being the first observer and 1/3 to being the second observer, which is the same as before. Whichever way we prefer to look at it, the Self-Indication Assumption gives a 1/3 probability of heads and 2/3 probability of tails in the Sleeping Beauty Problem.
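The observer-counting arithmetic can be sketched by simulating many runs of the experiment and tallying awakenings rather than coin tosses:

```python
import random

random.seed(0)
tails_awakenings = heads_awakenings = 0
for _ in range(100_000):
    if random.random() < 0.5:
        heads_awakenings += 1   # Heads: woken once, on Monday
    else:
        tails_awakenings += 2   # Tails: woken on Monday and on Tuesday

# Among all awakenings, what fraction belong to a Tails run?
share = tails_awakenings / (tails_awakenings + heads_awakenings)
print(share)  # ~ 2/3
```

Counting runs instead of awakenings would give the halfer answer of 1/2; the whole dispute is over which denominator is the right one.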

Depending on which Assumption we adopt, however, very different implications for our wider view of the world obtain.

One of the most well-known of these is the so-called Doomsday Argument, which is explored in another Nutshell.

The argument basically goes like this. Imagine for the sake of simplicity that there are only two possibilities: Extinction Soon and Extinction Late. In one, the human race goes extinct very soon, whereas in the other it spreads and multiplies through the Milky Way. In each case, we can write down the number of people who will ever have existed. Suppose that 100 billion people will have existed in the Extinction Soon case, as opposed to 100 trillion people in the Extinction Late case. So now, say that we’re at the point in history where 100 billion people have lived. If we’re in the Extinction Late situation, then the great majority of the people who will ever have lived will be born after us. We’re in the very special position of being in the first 100 billion humans. Conditional on that, the probability of being in the Extinction Soon case is overwhelmingly greater than that of being in the Extinction Late case. Using Bayes’ Theorem, which I explore in a separate blog, to perform the calculation, we can conclude, for example, that if we view the two cases as equally likely a priori, then after applying the Doomsday reasoning, we’re almost certainly in the Extinction Soon case. For conditioned on being in the Extinction Late case (100 trillion people), we almost certainly would not be in the special position of being amongst the first 100 billion people.
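The Bayes’ Theorem calculation can be made concrete. The sketch below uses the 100 billion / 100 trillion totals from the text, takes the two cases as equally likely a priori, and treats your birth rank as uniform over everyone who will ever live:

```python
from fractions import Fraction

n_soon = 100 * 10**9    # people ever born: Extinction Soon
n_late = 100 * 10**12   # people ever born: Extinction Late
first = 100 * 10**9     # we find ourselves among the first 100 billion

prior = Fraction(1, 2)  # the two cases taken as equally likely

# Likelihood of being among the first 100 billion under each hypothesis.
like_soon = Fraction(min(first, n_soon), n_soon)   # = 1
like_late = Fraction(min(first, n_late), n_late)   # = 1/1000

posterior_soon = prior * like_soon / (prior * like_soon + prior * like_late)
print(posterior_soon)   # 1000/1001: almost certainly Extinction Soon
```

The equal prior is doing no real work here: the thousand-to-one likelihood ratio swamps any prior that is not itself extreme.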

We can look at it another way. If we view the entire history of the human race from a timeless perspective, then all else being equal we should be somewhere in the middle of that history. That is, the number of people who live after us should not be too much different from the number of people who lived before us. If the population is increasing exponentially, it seems to imply that humanity has a relatively short time left. Of course, you may have special information that indicates that you aren’t likely to be in the middle, but that would simply mitigate the problem, not remove it.

A modern form of the Doomsday Argument holds that its resolution depends on how you resolve the Blue Beard or Sleeping Beauty Problems. If you give ⅓ as your answer to the Blue Beard Problem, that corresponds to the Self-Sampling Assumption (SSA). If you make that assumption about how to apply Bayes’ Theorem, then it seems very difficult to escape the early Doom conclusion.

If you want to challenge that conclusion, then you can use the Self-Indication Assumption (SIA). That assumption says that you are more likely to exist in a world with more beings than in one with fewer. You would say in the Doomsday Argument that if the “real” case is the Extinction Late case, then while it’s true that you are much less likely to be one of the first 100 billion people, it’s also true that because there are so many more people, you’re much more likely to exist in the first place. If you make both assumptions, then they cancel each other out, taking you back to your prior assessment of the probabilities of Extinction Soon and Extinction Late.

On this view, the fate of humanity, in probabilistic terms, depends on which Assumption we adopt.

One problem that has been flagged with the SSA is that what applies to the first million people out of a possible trillion applies just as well in principle to the first two people out of billions. This is known as the Adam and Eve problem. According to the SSA, the chance (absent effectively certain prior knowledge) that they are the first two people, as opposed to two out of the countless billions which (it is assumed) would be produced by their offspring, is so vanishingly small that they could act as if it were impossible that they are the potential ancestors of billions. For example, they plan to have a child unless Eve draws the Ace of Spades from a deck of cards, in which case they will go their separate ways. According to the logic of this thought experiment, the card is then almost certainly the Ace of Spades. If it were not, they would be two out of billions of people, which is such a small probability as to be effectively precluded by the SSA, i.e. by the idea that you can reason as if you are simply randomly selected from all humans. In this way, their world would be one in which all sorts of strange coincidences, precognition, psychokinesis and backward causation could occur.

So what is the problem, if any, with the Self-Indication Assumption? Here the Presumptuous Philosopher Problem has been flagged. It goes like this. Imagine that scientists have narrowed the possibilities for a final theory of physics down to two equally likely possibilities. The main difference between them is that Theory 1 predicts that the universe contains a billion times more observers than Theory 2 does. There is a plan to build a state-of-the-art particle accelerator to arbitrate between the two theories. Now, philosophers using the SIA come along and say that Theory 1 is almost certainly correct, to within billion-to-one confidence, since conditional on Theory 1 being correct, we’re a billion times more likely to exist in the first place. So we can save a billion pounds by not building the particle accelerator. Indeed, even if we did build it, and it produced evidence that was a million times more consistent with Theory 2, we should still side with the philosophers sticking to their assertion that Theory 1 is the correct one. Indeed, we should award the Nobel Prize in Physics to them for their “discovery.”
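The arithmetic behind the philosophers’ presumption is just odds multiplication. A sketch, using the billion-fold observer ratio and a million-fold experimental evidence factor as in the example (the variable names are mine):

```python
from fractions import Fraction

prior_odds = Fraction(1, 1)      # the two theories start equally likely
sia_boost = Fraction(10**9, 1)   # SIA favours the observer-rich theory a billion-fold
evidence = Fraction(1, 10**6)    # accelerator data favouring the other theory a million-fold

posterior_odds = prior_odds * sia_boost * evidence
print(posterior_odds)            # 1000: the observer-rich theory is still favoured
```

Even a million-to-one experimental result leaves the observer-rich theory a thousand times more likely on this reasoning, which is precisely what strikes many people as presumptuous.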

So we are left with a choice between the Self-Sampling Assumption which leads to the Doomsday Argument, and the Self-Indication Assumption which leads to the Presumptuous Philosopher Problem. And we need to choose a side.

For reasons given in my Nutshell on the Sleeping Beauty Problem, I identify the answer to the Sleeping Beauty Problem as 1/3, which is consistent with an answer of 2/3 for the Blue Beard Problem. This is all consistent with the Self-Indication Assumption, but not the Self-Sampling Assumption.

Appendix

We can address the God’s Coin Toss problem using Bayes’ Theorem.

We are seeking to calculate the probability, P(Tails | Blue), that the coin landed Tails, given that you have a blue beard. In the problem as posed, if the coin lands Tails there are two people, and you are not more likely, a priori, to be the blue-bearded than the black-bearded person. Now the probability, with a fair coin, of throwing Heads as opposed to Tails is 1/2. Adopting the Self-Sampling Assumption, we sample a person within their world at random.

First, what is the probability that you have a blue beard, P(Blue)?

This is given by: P(Blue) = P(Blue | Heads) × P(Heads) + P(Blue | Tails) × P(Tails) = 1 × 1/2 + 1/2 × 1/2 = 3/4

Since if the coin lands Heads, P(Blue | Heads) = 1; P(Heads) = 1/2.

If the coin lands Tails, P(Blue | Tails) = 1/2; P(Tails) = 1/2.

By Bayes’ Theorem, P(Tails | Blue) = P(Blue | Tails) × P(Tails) / P(Blue) = (1/2 × 1/2) / (3/4) = 1/3

So the probability that the coin landed Tails (World B), given that you have a blue beard, is 1/3.

What assumption needs to be made so that the probability of being in World B, given a blue beard, is 1/2?

You could assume that whenever you exist, you have a blue beard. In that case, P(Blue | Heads) = 1 and P(Blue | Tails) = 1.

Now, P(Blue) = P(Blue | Heads) × P(Heads) + P(Blue | Tails) × P(Tails) = 1 × 1/2 + 1 × 1/2 = 1

Now, by Bayes’ Theorem, P(Tails | Blue) = P(Blue | Tails) × P(Tails) / P(Blue) = (1 × 1/2) / 1 = 1/2

Is there a way, however, to do so without a prior commitment about beard colour?

One approach is to note that there are twice as many people in the Tails world as in the Heads world in the first place. This is known as the Self-Indication Assumption. So you could argue that you are, a priori, twice as likely to exist in a world with twice as many people. In a world with more people, you are simply more likely to be picked at all. Put another way, your own existence is a piece of evidence that you should condition upon.

Now, P (Blue) = P(Blue I Heads world) . P (Heads world) + P (Blue I Tails world) . P (Tails world) = 1. 1/3 + 1/2 . 2/3 = 1/3 + 1/3 = 2/3

So using Bayes’ Theorem, P(Tails world | Blue) = P(Blue | Tails world) × P(Tails world) / P(Blue) = (1/2 × 2/3) / (2/3) = 1/2.
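The three calculations above can be reproduced with one small helper, using exact fractions to avoid rounding (the function name is mine):

```python
from fractions import Fraction

def p_tails_given_blue(p_heads, p_tails, p_blue_heads, p_blue_tails):
    """Bayes' Theorem: P(Tails | Blue) = P(Blue | Tails) P(Tails) / P(Blue)."""
    p_blue = p_blue_heads * p_heads + p_blue_tails * p_tails
    return p_blue_tails * p_tails / p_blue

half = Fraction(1, 2)

# Self-Sampling Assumption: equal priors, P(Blue | Tails) = 1/2.
print(p_tails_given_blue(half, half, 1, half))                      # 1/3

# 'Always blue-bearded' assumption: P(Blue | Tails) = 1.
print(p_tails_given_blue(half, half, 1, 1))                         # 1/2

# Self-Indication Assumption: priors 1/3 Heads, 2/3 Tails.
print(p_tails_given_blue(Fraction(1, 3), Fraction(2, 3), 1, half))  # 1/2
```

Only the inputs change between the three views; the Bayesian machinery itself is identical in each case.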

References and Links

Fun with the Anthropic Principle. Aaronson, S. https://www.scottaaronson.com/democritus/lec17.html

Self-Locating Beliefs in Big Worlds: Cosmology’s Missing Link to Observation. Section VI: An Illustration. http://philsci-archive.pitt.edu/1625/1/Big_Worlds_preprint.PDF

Self-Sampling Assumption. Wikipedia. https://en.m.wikipedia.org/wiki/Self-sampling_assumption

Self-Indication Assumption. Wikipedia. https://en.m.wikipedia.org/wiki/Self-indication_assumption

Benford’s Law – in a nutshell.

Benford’s Law is one of those laws of statistics that defies common intuition. Essentially, it states that if we randomly select a number from a table of real-life data, the leading digit is far more likely to be small than large. For example, the probability that the first digit will be a ‘1’ is about 30 per cent, rather than the intuitive 11 per cent or so, which assumes that all digits from 1 to 9 are equally likely. In particular, Benford’s Law applies to the distribution of leading digits in naturally occurring phenomena, such as the populations of different countries or the heights of mountains. For example, choose a newspaper with a lot of numbers in it, and circle the numbers that occur naturally, such as stock prices. So lengths of rivers and lakes could be included, but not artificial numbers like telephone numbers. About 30 per cent of these numbers will start with a 1, and it doesn’t matter what units they are in. So the lengths of rivers could be denominated in kilometres, miles, feet or centimetres without it making a difference to the distribution frequency of the digits. Empirical support for this distribution can be traced to the man after whom the Law is named, physicist Frank Benford, in a paper he published in 1938 called ‘The Law of Anomalous Numbers.’ In that paper he examined 20,229 sets of numbers, as diverse as baseball statistics, the areas of rivers and numbers in magazine articles, confirming the 30 per cent rule for the digit 1. For information, the chance of throwing up a ‘2’ as first digit is 17.6 per cent, and of a ‘9’ just 4.6 per cent.

This has clear implications for fraud detection. In particular, if declared returns or receipts deviate significantly from the Benford distribution, we have an automatic red flag which those tackling fraud are, or should be, aware of.

To explain the basis of Benford’s Law, take £1 as a base. Assume this now grows at 10 per cent per day.

£1.10, £1.21, £1.33, £1.46, £1.61, £1.77, £1.94, £2.14, £2.35, £2.59, £2.85, £3.13, £3.45, £3.80, £4.18, £4.59, £5.05, £5.56, £6.11, £6.72, £7.40, £8.14, £8.95, £9.84, £10.83, £11.92, £13.11, £14.42, £15.86, £17.45, £19.19, £21.11, £23.22, £25.50, £28.10, £30.91, £34.00, £37.40, £41.14, £45.26, £49.79, £54.74, £60.24, £66.26, £72.89, £80.18, £88.20, £97.02 …

So we see that the leading digit stays a long time at 1 (through the teens), less time at 2 (through the 20s), and so on up to the 90s, and this pattern continues through three digits and so forth. Benford noticed that the probability that a number starts with the digit n is log (n+1) – log (n), where the logs are to base 10, so that:

NB log10 1 = 0; log10 2 = 0.301; log10 3 = 0.4771 … log10 10 = 1.

Leading digit        Probability
1                    30.1%
2                    17.6%
3                    12.5%
4                    9.7%
5                    7.9%
6                    6.7%
7                    5.8%
8                    5.1%
9                    4.6%
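Both the formula and the compound-growth illustration are easy to check. The sketch below recomputes the table from log(n+1) – log(n), then counts the leading digits of a quantity growing at 10 per cent per step; dividing by 10 whenever the value reaches 10 keeps only the mantissa, which leaves the leading digit unchanged.

```python
import math
from collections import Counter

# Benford probabilities from the log formula.
benford = {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}
print({d: round(100 * p, 1) for d, p in benford.items()})
# {1: 30.1, 2: 17.6, 3: 12.5, 4: 9.7, 5: 7.9, 6: 6.7, 7: 5.8, 8: 5.1, 9: 4.6}

# Leading digits of £1 growing at 10 per cent per step.
STEPS = 10_000
counts = Counter()
x = 1.0
for _ in range(STEPS):
    x *= 1.10
    while x >= 10:      # rescale: the leading digit is unaffected
        x /= 10
    counts[int(x)] += 1

print({d: round(counts[d] / STEPS, 3) for d in range(1, 10)})
```

The simulated frequencies land very close to the table, and they would do so for any growth rate whose log is irrational, which is the scale-invariance at the heart of the law.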

References and Links

Benford, F. (1938). The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, 78, 4, 551-572. http://datacolada.org/wp-content/uploads/2018/08/4409-Benford-1938-Law-of-anomalous-numbers.pdf

What is Benford’s Law? StatisticsHowTo. https://www.statisticshowto.datasciencecentral.com/benfords-law/

Benford’s Law. DataGenetics. http://datagenetics.com/blog/march52012/index.html

Benford’s Law. Wikipedia. https://en.wikipedia.org/wiki/Benford%27s_law

The ‘Slower Lane’ Paradox – in a nutshell.

Is the line next to you at the check-in at the airport or the check-out at the supermarket really always quicker than the one you are in? Is the traffic in the neighbouring lane always moving a bit more quickly than your lane? Or does it just seem that way?

One explanation is to appeal to basic human psychology. For example, is it an illusion caused by us being more likely to glance over at the neighbouring lane when we are progressing forward slowly than quickly? Is it a consequence of the fact that we tend to look forwards rather than backwards, so vehicles that are overtaken become forgotten very quickly, whereas those that remain in front continue to torment us? Do we take more notice, or remember for longer the times we are passed than when we pass others? If this is the complete explanation, it seems we should passively accept our lot. On the other hand, perhaps we really are more often than not in the slower lane. If so, there may be a reason. Let me explain using an example.

How big is the smallest fish in the pond? You catch sixty fish, all of which are more than six inches long. Does this evidence add support to a hypothesis that all the fish in the pond are longer than six inches? Only if your net is able to catch fish smaller than six inches. What if the holes in the net allow smaller fish to pass through? This may be described as a selection effect, or an observation bias.

Apply the same principle to your place in the line or the lane.

To understand the effect in this context we need to ask, ‘For a randomly selected person, are the people or vehicles in the next line or lane actually moving faster?’

Well, one obvious reason why we might be in a slower lane is that there are more vehicles in it than in the neighbouring lane. This means that more of our time is spent in the slower lane. In particular, cars travelling at greater speeds are normally more spread out than slower cars, so that over a given stretch of road there are likely to be more cars in the slower lane, which means that more of the average driver’s time is spent in the slower lane or lanes. This is known as an observer selection effect; a key idea in the theory of such effects is that observers should reason as if they were a random sample from the set of all observers. In other words, when making observations of the speed of cars in the next lane, or the progress of the neighbouring line to the cashier, it is important to consider yourself a random observer, and think about the implications of this for your observation.

To put it another way, if you are in a line and think of your present observation as a random sample from all the observations made by all relevant observers, then the probability is that your observation will be made from the perspective that most drivers have, which is the viewpoint of the slower moving queue, as that is where more observers are likely to be. It is because most observers are in the slower lane, therefore, that a typical or randomly selected driver will not only seem to be in the slower lane but actually will be in the slower lane. Let’s put it this way. If there are 20 in the slower lane and 10 in the equivalent section of the fast lane, there is a 2/3 chance that you are in the slow lane.
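That 2/3 figure is just the occupancy share. A sketch, sampling a random driver from the two lanes with the counts used in the example:

```python
import random

random.seed(0)
SLOW, FAST = 20, 10   # vehicles in matched stretches of the two lanes

# You are equally likely to be any one of the 30 drivers on this stretch.
trials = 100_000
in_slow = sum(random.randrange(SLOW + FAST) < SLOW for _ in range(trials))
print(in_slow / trials)   # ~ 20/30 = 2/3
```

The simulation is deliberately trivial: the whole effect is that a randomly chosen driver is a draw weighted by lane occupancy, not a fair coin flip between lanes.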

In other words, you actually are, on average, in the slower lane, because slow lanes are (on average) the ones that have more vehicles in them. So you are more likely to be in these lanes than in the faster-moving ones, which have fewer vehicles in them. You won’t always be in the slower lane, but ‘on the average’ is the key proviso: over all the lanes you join, you will more often find yourself in the crowded ones. That is where most people are, and you are one of those people.

So the next time you think that the other lane is faster, be aware that it very probably is.

References and Links

Cars in the next lane really do go faster. Nick Bostrom. +Plus magazine. December 1, 2000. https://plus.maths.org/content/os/issue17/features/traffic/index