
Who shot Sir Caliban? A Case for the Bayesian Detective.

A murder has been committed. There are five suspects, all of whom we consider equally likely to be guilty at the start of the investigation.

So 20 per cent is the prior probability of guilt for each suspect, before any new evidence is found. The names of the suspects are: Reverend Green, Colonel Mustard, Miss Scarlett, Professor Plum and Mrs. Peacock. The codename for the murder investigation is Operation Cluedo. The victim was Sir Caliban Mackenzie, a famed anthropologist, who was shot in the library while examining a rare first edition of Newton’s Principia.

Four hours into the investigation, evidence turns up which eliminates Reverend Green. He was leading the Holy Communion Service in the chapel at the time of the murder. There are now four remaining suspects, and so the probability that each of the remaining suspects is guilty rises to 25 per cent.

Two hours later, a new clue now arises which casts some doubt on the alibi of Colonel Mustard, whose probability of guilt we now judge to rise from 25 per cent to 40 per cent.

As a result, the total probability that one of the other three suspects is guilty falls by 15 percentage points, from 75 per cent to 60 per cent. Since each of the three is equally likely to be guilty, we can now assign each a probability of guilt of 20 per cent, down from 25 per cent.

After a further 45 minutes, a third clue emerges, which eliminates Mrs. Peacock. She had been spotted by a number of reliable witnesses at the Communion service in the chapel along with Reverend Green.

So the big question is: how should we now adjust the probabilities that Colonel Mustard, Miss Scarlett and Professor Plum pulled the trigger?

In other words, now that Mrs. Peacock has been eliminated, and taking account of the evidence which doubled the original likelihood that Colonel Mustard wielded the murder weapon (to 40 per cent), what is the best estimate of the revised probability that each of Mustard, Scarlett and Plum committed the murder?

 

Spoiler Alert: The Solution

 

One possibility would be to take the 20 per cent probability of guilt we had previously attached to Mrs. Peacock and divide this equally between the three remaining suspects.

But to do so would be wrong, and notably at variance with the toolkit of a Bayesian detective, i.e. a detective who conducts investigations using the Bayesian approach to evidence and probability.

The Bayesian approach to detective work tells us always to consider the prior probability that each suspect is guilty before deducing the probability after some new evidence is brought to bear on it. Applying this method, the correct way to adjust the probabilities attached to the remaining suspects is to do so in a way that is proportional to their prior probability of guilt before Mrs. Peacock was eliminated from the enquiry.

Since Colonel Mustard was the prime suspect, with a probability of guilt of 40 per cent before Peacock’s elimination (compared to 20 per cent for Miss Scarlett and Professor Plum), a good Bayesian needs to increase the probability we assign to his guilt by twice as much as we increase theirs. So we should now raise the estimate of the probability that Colonel Mustard shot Sir Caliban from 40 per cent to 50 per cent, while we should increase the probability we assign to Miss Scarlett and Professor Plum from 20 per cent to 25 per cent.

This is all derived from Bayes’ Theorem, which tells us that to calculate the probability of a hypothesis given new evidence, we multiply the probability of observing that evidence if the hypothesis is true by the prior probability of the hypothesis (its probability before we became aware of the new evidence, here Mrs. Peacock’s elimination from the enquiry), and then rescale so the probabilities sum to one. Since Mrs. Peacock’s elimination is equally likely under each remaining hypothesis, the posteriors are simply the priors renormalised, and the prior is twice as big for Colonel Mustard as for either of the other remaining suspects.
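The renormalisation described above can be checked in a few lines of Python. This is a sketch, using exact fractions to avoid rounding; the suspect names and numbers are those from the text:

```python
from fractions import Fraction

# Priors after the Mustard clue, just before Mrs. Peacock is eliminated.
priors = {"Mustard": Fraction(2, 5), "Scarlett": Fraction(1, 5),
          "Plum": Fraction(1, 5), "Peacock": Fraction(1, 5)}

# Eliminating Peacock sets her probability to zero; Bayes' Theorem then
# rescales the survivors in proportion to their priors.
priors["Peacock"] = Fraction(0)
total = sum(priors.values())
posterior = {name: p / total for name, p in priors.items()}

print(posterior["Mustard"], posterior["Scarlett"], posterior["Plum"])  # 1/2 1/4 1/4
```

Because the remaining priors sum to 80 per cent, dividing each by 0.8 gives 50, 25 and 25 per cent, exactly as in the text.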

Epilogue:

The estimated 50 per cent probability of guilt was more than sufficient to persuade the Crown Prosecution Service to haul the Colonel before a jury of his peers. In the event they convicted him, falling victim to the classic Prosecutor’s Fallacy, which is to confuse the probability that someone is guilty given the evidence with the probability of the evidence arising if someone is guilty. The likelihood of Sir Caliban being shot in the library if the Colonel was guilty of murder was quite high, and this led to his conviction. Unfortunately for the Colonel, the relevant probability (that he was guilty of murder given that Sir Caliban was shot in the library) was rather smaller but bypassed in the jury’s deliberations.

Meanwhile, the actual killer, Miss Scarlett, got away scot-free. She had concealed an incriminating letter in the Principia, thinking it would be safe there, until Sir Caliban unhappily chanced upon it. This left her no option, in her mind, but to use the pistol hidden in the Georgian chest of drawers gracing the back wall of the library.

The Colonel’s appeal was unanimously rejected. He is serving a life sentence. Miss Scarlett is living as a tax exile in Belize.

 

Further Reading and Links

https://selectnetworks.net/

 

 

 

The Nash Equilibrium may be the most important idea in economics. Here’s why!

If there is a set of ‘game’ strategies with the property that no ‘player’ can benefit by changing their strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute what is known as the ‘Nash equilibrium’.

Assume each Player can adopt a ‘Friendly’ or a ‘Hostile’ approach to the game. For example, a Friendly strategy might be to put down the weapon you are carrying in your hand. The Hostile strategy is to keep hold of it.

Now, depending on their respective actions let’s say the game organiser awards monetary payoffs to each player. These are shown below and are known to each player.

                        Player B ‘Friendly’      Player B ‘Hostile’
Player A ‘Friendly’     750 to A; 1000 to B      25 to A; 2000 to B
Player A ‘Hostile’      1000 to A; 50 to B       30 to A; 51 to B

 

What is Player A’s best response to each of Player B’s actions?

If Player B acts ‘Friendly’, Player A’s best response is to act ‘Hostile’, yielding a payoff of 1000. Had he acted ‘Friendly’, he would have earned only 750.

If Player B acts ‘Hostile’, Player A’s best response is again to act ‘Hostile’. He earns 30 instead of the 25 he would earn by acting ‘Friendly’.

In both cases his best response is to act ‘Hostile’. 

What is Player B’s best response to each of Player A’s actions?

If Player A acts ‘Friendly’, player B’s best payoff is if he acts ‘Hostile.’ This yields a payoff of 2000. If he had acted ‘Friendly’ he would have earned a payoff of only 1000.

If Player A acts ‘Hostile’, player B’s best response is if he acts ‘Hostile’. He earns 51 instead of a payoff of 50 if he acted ‘Friendly.’

In both cases his best response is to act ‘Hostile.’

Now, a Nash equilibrium is a pair of strategies in which each player’s strategy is a best response to the other’s. Here, ‘Hostile’ is each player’s best response to either action of his opponent, so the unique Nash equilibrium is for both to act ‘Hostile’, in which case Player A wins 30 and Player B wins 51.

But if both had been able to communicate and reach a joint, enforceable decision, they would both presumably have acted ‘Friendly.’
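The best-response reasoning above can be sketched in a few lines of Python. The payoffs are those in the table; the function name and strategy labels are my own:

```python
# payoffs[(a, b)] = (payoff to A, payoff to B); 'F' = Friendly, 'H' = Hostile.
payoffs = {
    ("F", "F"): (750, 1000), ("F", "H"): (25, 2000),
    ("H", "F"): (1000, 50),  ("H", "H"): (30, 51),
}
other = {"F": "H", "H": "F"}

def is_nash(a, b):
    """A cell is a Nash equilibrium iff neither player gains by switching alone."""
    a_stays = payoffs[(a, b)][0] >= payoffs[(other[a], b)][0]
    b_stays = payoffs[(a, b)][1] >= payoffs[(a, other[b])][1]
    return a_stays and b_stays

equilibria = [cell for cell in payoffs if is_nash(*cell)]
print(equilibria)  # [('H', 'H')] -- both act Hostile
```

The same check, applied cell by cell, reproduces the analysis for the later games as well.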

Let’s now turn to the world of espionage in seeking out a Nash equilibrium. Let’s assume that there are two possible codes, and Agent A can select either of them and so can Agent B. The payoff to selecting non-matching codes is zero.

                          Agent B ‘Uses Code A’    Agent B ‘Uses Code B’
Agent A ‘Uses Code A’     1000 to A; 500 to B      0 to A; 0 to B
Agent A ‘Uses Code B’     0 to A; 0 to B           500 to A; 1000 to B

 

So where is the Nash equilibrium?

Top left: Neither Agent can increase their payoff by choosing a different action to the current one. In other words, there is no incentive for either Agent to switch given the strategy of the other Agent. So this is a Nash equilibrium.

Bottom right: As above, neither Agent can gain by switching, so this too is a Nash equilibrium.

Top right:  By choosing to switch to Code B instead of code A, Agent A obtains a payoff of 500, given Agent B’s actions. Similarly for Agent B, who would gain by switching to code A, given Agent A’s strategy. So top right (Agent A uses code A and Agent B uses Code B) is NOT a Nash equilibrium, as both Agents have an incentive to switch given what the other Agent is doing.

Bottom left is the same as Top right. As above, there are incentives to switch. So it is NOT a Nash equilibrium.

In conclusion, this game has two Nash equilibria, top left (Agent A and Agent B both use Code A) and bottom right (Agent A and Agent B both use Code B).

Let us turn now to the classic Safe/Crash problem. If both drivers drive on the left of the road they will be safe, as they will be if both drive on the right; they will crash if one adheres to one side of the road and the other to the opposite. This is shown in the box diagram below.

                            Driver B ‘Drives on left’    Driver B ‘Drives on right’
Driver A ‘Drives on left’   Safe; Safe                   Crash; Crash
Driver A ‘Drives on right’  Crash; Crash                 Safe; Safe

 

So there are again two Nash equilibria here. Top left and Bottom right. In both these scenarios, there is no incentive for either Driver to switch to the other side of the road given the driving strategy of the other driver.

Now let’s consider the case of two companies who each have the option of using one of two emblems. We shall call the first the Blue Badger emblem and the other the Black Bull emblem.

                             Firm B uses Black Bull    Firm B uses Blue Badger
Firm A uses Black Bull       1000 to A; 500 to B       500 to A; 1000 to B
Firm A uses Blue Badger      500 to A; 1000 to B       1000 to A; 500 to B

 

If we consider each section in turn, we arrive at the following result.

Top left: Firm B gains by switching from the Black Bull to the Blue Badger emblem.

Top right: Firm A gains by switching from the Black Bull to the Blue Badger emblem.

Bottom left: Firm A gains by switching from the Blue Badger to the Black Bull emblem.

Bottom right: Firm B gains by switching from the Blue Badger to the Black Bull emblem.

So this game has no Nash equilibrium in pure strategies.

So we have highlighted examples of games with one, two, and no Nash equilibria.

 

This leads us to the classic ‘Prisoner’s Dilemma’ problem. Are there any Nash equilibria here, and if so how many? In this scenario, two prisoners, linked to the same crime, are offered a discount on their prison terms for confessing if the other prisoner continues to deny it, in which case the other prisoner will receive a much stiffer sentence. However, they will both be better off if both deny the crime than if both confess to it. The problem each faces is that they can’t communicate and strike an enforceable deal. The box diagram below shows an example of the Prisoner’s Dilemma in action.

                       Prisoner 2 Confesses             Prisoner 2 Denies
Prisoner 1 Confesses   2 years each                     Freedom for P1; 8 years for P2
Prisoner 1 Denies      8 years for P1; Freedom for P2   1 year each

 

The Nash Equilibrium is for both to confess, in which case they will both receive 2 years. But this is not the outcome they would have chosen if they could have agreed in advance to a mutually enforceable deal. In that case they would have chosen a scenario where both denied the crime and received 1 year each.
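The same best-response check used earlier works here if we compare sentences instead of winnings, with lower being better. A minimal sketch (variable names are my own):

```python
# Prisoner's Dilemma: entries are years in prison for (P1, P2), so a best
# response *minimises* the sentence rather than maximising a payoff.
years = {
    ("Confess", "Confess"): (2, 2), ("Confess", "Deny"): (0, 8),
    ("Deny", "Confess"): (8, 0),    ("Deny", "Deny"): (1, 1),
}
other = {"Confess": "Deny", "Deny": "Confess"}

def is_nash(p1, p2):
    """Neither prisoner can shorten his own sentence by switching alone."""
    p1_stays = years[(p1, p2)][0] <= years[(other[p1], p2)][0]
    p2_stays = years[(p1, p2)][1] <= years[(p1, other[p2])][1]
    return p1_stays and p2_stays

equilibria = [cell for cell in years if is_nash(*cell)]
print(equilibria)  # [('Confess', 'Confess')] -- 2 years each
```

Note that mutual denial (1 year each) is better for both, yet it is not an equilibrium: each prisoner can cut his sentence to zero by confessing.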

So, to summarise, a Nash equilibrium is a stable state among interacting participants in which none can gain by a unilateral change of strategy so long as the other participants’ strategies remain unchanged. It is not necessarily the best outcome for the parties involved, but it is the outcome we would predict in a non-cooperative game of rational, self-interested actors.

 

The Number Puzzle That Can Make You a Fortune

The Question: Choose an integer between 0 and 100. You win a prize if your number is equal to, or closest to, two-thirds of the average of the numbers chosen by all participants. What number should you choose?

If you think that the other participants will choose a random number within the range, the average will be 50. Hence you choose 33.

But hang on. Just as you chose 33, so presumably will other participants, at least on average, based on your same line of reasoning. So if the average number chosen by all participants is 33, then the smart thing to do is to choose 22.

But do you really think you are smarter than the others? Just as you figured out that 22 is the smart choice, so will others, at least on average. So the super smart thing to do is to choose 15.

But … We are heading towards 0 (you get there after 12 iterations). Zero is the only rational choice to make if you don’t think you are smarter than the other participants.

You start to get the strong feeling that if you choose 0 you are not going to win the prize. This is because, although you don’t think you are smarter than most, it is reasonable to assume that at least some of the players are not as smart or rational as you.

For example, if 10 per cent of players are totally naïve and choose a random number – 50 on average – then the overall average will be 5 and the right answer will be 3.

However, if the rest of the players share your thoughts and assumptions, they will also choose 3, thereby increasing the average to 8 and the right answer to 5.  Then you answer 5, but so will the rest, thus increasing the right answer to 6.

The process converges to 8.

Well, 8 is the right answer if 90 per cent of players are as smart as you are and 10 per cent are totally naïve.

If 20 per cent are naïve, the process converges to 14; with 30 per cent it converges to 18, and so on.

But then it may also be the case that the less rational players are not totally naïve (Level 0 rationality) but, for example, exhibit Level 1 rationality, where the average answer is 33.

In this case, with 10 per cent Level 1 players the process converges to 5; with 20 per cent to 9; with 30 per cent to 12, and so on. Of course, there are plenty more combinations, with varying proportions of players at Level 0, Level 1, Level 2 and so on.
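The convergence argument above can be sketched numerically. This is a sketch under the stated assumptions: a fraction of players are fixed at some anchor answer (50 for Level 0, 33 for Level 1), and everyone else answers two-thirds of the resulting average, iterated to a fixed point; the function name is my own:

```python
# Iterate guess -> (2/3) * average, where the average mixes the fixed
# "anchor" players with the strategic players' current guess.
def fixed_point(naive, anchor=50, rounds=1000):
    guess = 0.0
    for _ in range(rounds):
        guess = (2 / 3) * (naive * anchor + (1 - naive) * guess)
    return guess

print(int(fixed_point(0.10)))             # 8  (10% Level 0 players)
print(int(fixed_point(0.20)))             # 14 (20% Level 0 players)
print(int(fixed_point(0.30)))             # 18 (30% Level 0 players)
print(int(fixed_point(0.10, anchor=33)))  # 5  (10% Level 1 players)
```

These reproduce the figures quoted in the text for the various mixes of player types.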

The higher the winning number, the larger is the percentage of less rational players in the game.

In an experiment conducted with 1,476 Financial Times readers by the economist Richard Thaler, the winning number was in fact 13.

This is roughly consistent with:

1. All players exhibit Level 3 rationality

OR 2. 80% are fully rational and 20% are totally naïve.

OR 3. 70% are fully rational and 30% exhibit Level 1 rationality.

Etc.

John Maynard Keynes, in Chapter 12 of his ‘General Theory of Employment, Interest and Money’, frames the paradox in terms of the money markets, in a more prosaic way:

“Professional investment may be likened to those newspaper competitions in which the competitors have to pick out the six prettiest faces from a hundred photographs, the prize being awarded to the competitor whose choice most nearly corresponds to the average preferences of the competitors as a whole; so that each competitor has to pick, not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view. It is not a case of choosing those which, to the best of one’s judgment, are really the prettiest, not even those which average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practise the fourth, fifth and higher degrees.”

In other words, it is those who are able to best out-guess the best guesses of the rest of the crowd, who stand to win the prize.

Or put another way, the ten pound note you spot lying on the floor might well be real after all. Nobody has picked it up yet because they have all assumed that someone else would have picked it up if it were real. You realise that everyone else is thinking like this, and you win yourself a tenner. Let’s call that super-rationality. Ultimately, it’s the kind of rationality that can make you a fortune.

Further Reading and Links

https://selectnetworks.net

A. Bosch-Domènech, J. G. Montalvo, R. Nagel and A. Satorra (2002), ‘One, Two, (Three), Infinity, …: Newspaper and Lab Beauty-Contest Experiments’, American Economic Review, 92(5), 1687–1701, December.

http://blog.massimofuggetta.com/2012/10/16/beauty-contests/

The Butch v. The Brain. The Story of the Greatest Dice Game in History.

This is a true story about New York gambling-house operator, Fat the Butch, who made his fortune booking dice games. In 1952 he was famously challenged by a bigtime gambler known as The Brain to a simple wager. The bet was an even-money proposition that the Butch could throw a double-six in 21 rolls of the dice. On the face of it, the edge seems to be with Butch. After all, there are 36 possible combinations that could come up when throwing two dice, from 1-1, 1-2, 1-3, to 6-4, 6-5, 6-6. Intuition would suggest, therefore, that 18 throws should give you a 50-50 chance of throwing any one of these combinations, including a double-six. In 21 throws, the chance of a double-six should, therefore, be more than 50-50. On this basis, the Butch accepted the even bet at $1,000 a roll. After twelve hours of rolling, the Brain was $49,000 up, at which point the Butch called it a day, sensing that something was wrong with his strategy.

The Brain had in fact profited from a classic probability puzzle known as the Chevalier’s Dice Problem, which can be traced to the 17th-century French gambler and bon vivant Antoine Gombaud, better known as the Chevalier de Méré.

The Chevalier would agree even money odds that in four rolls of a single die he would get at least one six. His logic seemed impeccable. The Chevalier reasoned that since the chance that a 6 will come up in any one roll of the die is 1 in 6, then the chance of getting a 6 in four rolls is 4/6, or 2/3, which is a good bet at even money.

If the probability was a half, he would break even at even money. For example, in 300 games, at 1 French franc a game, he would stake 300 francs and expect to win 150 times, returning him 150 francs for each win with his stake returned on each occasion (total of 300 francs). With a probability of 2/3, he would expect to win 200 times, yielding a good profit.

It is easy to show intuitively that this reasoning is faulty, for if it were correct, the same logic would put the chance of a 6 in five rolls of the die at 5/6, in six rolls at 6/6 = 100 per cent, and in seven rolls at 7/6, which is impossible. Something is therefore clearly wrong here.

Still, even though his reasoning was faulty, he continued to make a profit by playing the game at even money. To see why, we need to calculate the true probability of getting a 6 in four rolls of the die. The key idea here is that the number that comes up on each roll is independent of any other rolls, i.e. dice have no memory. Since each event is independent, we can (according to the laws of probability) multiply the probabilities.

So the probability of a 6 followed by a 6, followed by a 6, followed by a 6, is: 1/6 x 1/6 x 1/6 x 1/6 = 1/1296.

So what is the chance of getting at least one six in four rolls of the die?

Since the probability of getting a 6 in any one roll of the die = 1/6, the probability of NOT getting a 6 in any one roll of the die = 5/6.

So the chance of NOT getting a 6 in four rolls of the die is:

5/6 x 5/6 x 5/6 x 5/6 = 625/1296

So the chance of getting at least one 6 is 1 minus this, i.e. 1 − 625/1296 = 671/1296 = 0.5177, which is greater than 0.5.

So, the odds are still in favour of the Chevalier, since he is agreeing even money odds on an event with a probability of 51.77%.

This was all very well as long as it lasted, but eventually the Chevalier decided to branch out and invent a new, slightly modified game. In the new game, he asked for even money odds that a pair of dice, when rolled 24 times, will come up with a double-6 at least once. His reasoning was the same as before, and quite similar to the reasoning employed by the Butch.

If the chance of a 6 on one roll of the die is 1/6, then the chance of a double-6 when two dice are thrown = 1/6 x 1/6 (as they are independent events) = 1/36.

So, reasoned the Chevalier, the chance of at least one double-6 in 24 throws is: 24/36 = 2/3.

So this is a very profitable game for the Chevalier. Or is it?

No it isn’t, and this time Monsieur Gombaud paid for his faulty reasoning. He started losing. In desperation, he consulted the mathematician and philosopher Blaise Pascal.

Pascal derived the correct probabilities as follows:

The probability of a double-6 in one throw of a pair of dice = 1/6 x 1/6 = 1/36.

So the probability of NO double-6 in one throw of a pair of dice = 35/36.

So, the probability of no double-6 in 24 throws of a pair of dice = 35/36 multiplied by itself 24 times, i.e. (35/36)^24 = 0.5086.

So the probability of at least one double-6 is 1 minus this, i.e. 1 − 0.5086 = 0.4914, i.e. less than 0.5.

Under the terms of the new game, the Chevalier was betting at even money on a game which he lost more often than he won.

It was an error that the Butch was to repeat almost 300 years later!
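All three wagers can be checked exactly with a few lines of Python. This is a sketch using the standard library’s fractions module; the function name is my own:

```python
from fractions import Fraction

def p_at_least_one(p_single, rolls):
    """Probability of at least one success in `rolls` independent trials."""
    return 1 - (1 - p_single) ** rolls

six_in_four = p_at_least_one(Fraction(1, 6), 4)         # Chevalier's first game
double_six_in_24 = p_at_least_one(Fraction(1, 36), 24)  # his modified game
double_six_in_21 = p_at_least_one(Fraction(1, 36), 21)  # the Butch's bet

print(float(six_in_four))       # 0.5177... -> a winning even-money bet
print(float(double_six_in_24))  # 0.4914... -> a losing one
print(float(double_six_in_21))  # 0.4465... -> the Butch's bet is worse still
```

The first game gives 671/1296, just over a half; the 24-throw and 21-throw double-six games both fall short of a half, which is why both the Chevalier and, centuries later, the Butch lost money at even odds.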

Meantime, the Chevalier de Méré’s question to Blaise Pascal was to lead to a historic correspondence between Pascal and Pierre de Fermat (of Fermat’s Last Theorem fame), which laid the groundwork of modern probability theory. All from a dice game!

 

Further Reading and Links

https://selectnetworks.net

An Exercise in Elementary Logic

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

You are presented with four cards, with the face-up side on display, showing either a letter or a number. You are promised that each has a letter on one side and a number on the other.

Red Card displays the letter D

Orange Card displays the letter N

Blue Card displays the number 21

Yellow Card displays the number 16

You are now presented with the following statement: Every card with D on one side has 21 on the other side.

The Question is: What is the minimum number of cards needed to determine whether this statement is true? What are the colours of the cards you need to turn over to determine this?

Think about it: Do you need to turn over the Red Card? Do you need to turn over the Orange Card? Do you need to turn over the Blue Card? Do you need to turn over the Yellow Card?

Spoiler Alert (Solution).

When given this puzzle to solve, the great majority get it wrong.

You must turn over the Red Card to see whether it has 21 on the other side. If it does not, the statement is false.

You must turn over the Yellow Card to see whether it has D on the other side. If it does, then a card with D on one side does not have 21 on the other, and the statement is false.

Turning over the Orange Card does not help you verify or falsify the statement: the statement says nothing about cards with N on them.

Turning over the Blue Card does not help either: whatever letter is behind the 21, the statement is not contradicted. (This is the trap most people fall into, choosing the Blue Card rather than the Yellow Card.)

So the minimum number of cards needed to determine whether the statement is true is two, and they are the Red Card and the Yellow Card.
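The logic can be verified by brute force. This sketch checks, for each card, whether any possible hidden back could change the verdict on the statement; only such cards are worth turning over:

```python
# Each card is (letter_side, number_side); None marks the hidden back.
cards = {"Red": ("D", None), "Orange": ("N", None),
         "Blue": (None, "21"), "Yellow": (None, "16")}

def statement_holds(card):
    letter, number = card
    # "Every card with D on one side has 21 on the other side."
    return letter != "D" or number == "21"

def worth_turning(name):
    """True iff some hidden back would make the statement false."""
    letter, number = cards[name]
    hidden_options = ["D", "N"] if letter is None else ["21", "16"]
    verdicts = {statement_holds((letter, h) if letter else (h, number))
                for h in hidden_options}
    return len(verdicts) > 1  # turning the card can change the verdict

needed = [name for name in cards if worth_turning(name)]
print(needed)  # ['Red', 'Yellow']
```

For the Orange and Blue cards, every possible hidden back leaves the statement intact, so turning them over tells you nothing.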

Bonus Question (The Tyre Problem)

Two employees turn up late to an important meeting. They claim that one of the tyres on their car had a puncture, but it is a lie.

Their suspicious boss sends them to separate rooms and asks each of them to write down which tyre was punctured.

The Question: Assuming they have not colluded beforehand, and have no particular reason to think that one tyre is more likely to have been punctured, what is the likelihood that they will randomly name the same tyre?

Is it (1/4)^2 = 1 in 16?

Think about it: There are four tyres. Each employee is choosing a tyre randomly and independently of one another. Maybe it is easier to think of a two-wheeled vehicle. In the same scenario, what is the likelihood they will randomly name the same tyre if they arrived on a motor bike? Is it (1/2) x (1/2) = 1 in 4?

Spoiler Alert (Solution)

Once the first employee randomly chooses a tyre on the car, there is a 1 in 4 chance that the other employee will choose the same one, e.g. if employee 1 chooses front left tyre, employee 2 has a 1 in 4 chance of randomly selecting the same one. Similarly, if the first employee randomly chose the back tyre on the motor bike, the chance that the second employee would come up with the same tyre is 1 in 2.
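The answer can also be confirmed by exact enumeration over all sixteen pairs of choices. A small sketch (the tyre labels are my own):

```python
from fractions import Fraction
from itertools import product

# Each employee independently and randomly names one of the four tyres.
tyres = ["front-left", "front-right", "back-left", "back-right"]
pairs = list(product(tyres, repeat=2))  # all 16 equally likely combinations

p_match = Fraction(sum(a == b for a, b in pairs), len(pairs))
print(p_match)  # 1/4
```

Four of the sixteen equally likely pairs agree, giving 1 in 4 rather than the tempting 1 in 16.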

Further Reading and Links

https://selectnetworks.net

The Salem Witches Problem – Can they use game theory to survive?

Two suspected witches of Salem are subjected to a test by the Witchfinder General.

To ascertain whether they have magical powers of telepathy (They haven’t, by the way) they will be separated and seated at a table in the blue room (Suspect 1) and the yellow room (Suspect 2). They will be unable to see each other or communicate in any way.

Before being separated they are allowed a few private moments together.

After being separated, they are given a deck of cards each and asked to extract one card from the deck.

They are allowed to look at their chosen card if they wish, but what they must actually do is to name the colour of the card that the other suspect has drawn.

It is a standard deck of cards, so there is a 1 in 2 chance the chosen card is black, and the same that it is red.

The game will be repeated ten times, to reduce the chance that they will survive by simple good fortune.

If in any round they both correctly identify the colour of the other person’s card, then they will both die.

If in every round at least one of them is wrong, then both are free to go.

There are two questions:

1. What is the probability they will survive by chance?

2. Is there a co-operative strategy they could agree on before being separated to guarantee they both survive?

Think about it: In any round, what is the chance that each suspect will correctly name the colour of the other suspect’s card? A half? A quarter? What about over ten successive rounds?

To survive, they must avoid this in every one of the ten rounds. Is there a way they can take chance out of it, and make sure that at least one of them names the wrong colour for the other suspect’s card, ten rounds in a row?

If so, that is the door to freedom. Remember that they can secretly hatch a joint strategy and they either both survive or both die, so they can trust each other to stick to the plan, if there is one.

 

Spoiler Alert (The Solution)

In the first round, the chance that the suspect in the blue room will correctly name the colour of the other suspect’s card is ½. Similarly for the suspect in the yellow room.

These are independent events, so the probability of being condemned after first hands are dealt (i.e. both name the colour of the other suspect’s card correctly) = ½ x ½ = ¼.

So probability of surviving first hand = ¾

Probability of surviving 10 hands = (3/4)^10 = 0.0563, i.e. 5.63%

But there is a strategy to ensure survival, if they can agree on it before.

 

Can you work it out?

The solution is for Suspect 1 to guess the same colour as his own card, and for Suspect 2 to guess the opposite colour to his own card. This way they will always survive.

Thus, writing the two cards drawn (Suspect 1 then Suspect 2), followed by the two guesses:

Red Red gives guesses Red Black – they survive.

Black Black gives guesses Black Red – they survive.

Black Red gives guesses Black Black – they survive.

Red Black gives guesses Red Red – they survive.

In every case exactly one guess is right and one is wrong, so they are never both correct.

To better conceal the strategy, they could also decide to alternate roles.

This is the optimal outcome in a game where the two players are able to co-ordinate a strategy in advance, and where trust is guaranteed because they both stand to gain by sticking to the strategy.
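The strategy can be checked exhaustively over all four card combinations. A short sketch (variable names are my own):

```python
from itertools import product

# Suspect 1 names his own card's colour; Suspect 2 names the opposite
# of his own card's colour.
opposite = {"Red": "Black", "Black": "Red"}

for card1, card2 in product(["Red", "Black"], repeat=2):
    guess_by_1 = card1            # Suspect 1's guess at Suspect 2's card
    guess_by_2 = opposite[card2]  # Suspect 2's guess at Suspect 1's card
    both_right = (guess_by_1 == card2) and (guess_by_2 == card1)
    print(card1, card2, "-> condemned" if both_right else "-> survive")

# Relying on luck instead, the chance of surviving all ten rounds is:
print(round((3 / 4) ** 10, 4))  # 0.0563
```

Every combination prints "survive": Suspect 1 is right exactly when the cards match, Suspect 2 exactly when they differ, so they can never both be right in the same round.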

There are other scenarios in which the superior strategy, from the point of view of one or both players, is to defect from the strategy they would adopt if they were free to strike an enforceable deal. One such scenario is the Prisoner’s Dilemma problem. In that problem, the optimal strategy for each player, when a deal cannot be enforced, leads both to an outcome worse than the one they could have reached co-operatively.

 

Further Reading and Links

https://selectnetworks.net

 

 

 

 

The First Digit Law and how it can help beat fraud.

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

Benford’s Law is one of those laws of statistics that defies common intuition. It states that in many real-life data sets the leading digits are far from uniformly distributed. The probability that the first digit of a randomly selected number is a ‘1’ is about 30 per cent, rather than the roughly 11 per cent we would expect if all digits from 1 to 9 were equally likely.

In particular, Benford’s Law applies to the distribution of leading digits in naturally occurring phenomena, such as the populations of different countries or the heights of mountains. Take a newspaper with a lot of numbers in it and circle those that occur naturally – stock prices, say, or the lengths of rivers and lakes, but not artificial numbers like telephone numbers. About 30 per cent of these numbers will start with a 1, and it doesn’t matter what units they are in: the lengths of rivers could be denominated in kilometres, miles, feet or centimetres without it making a difference to the frequency distribution of the leading digits.

Empirical support for this distribution can be traced to the man after whom the Law is named, physicist Frank Benford, in a paper he published in 1938 called ‘The Law of Anomalous Numbers’. In that paper he examined 20,229 sets of numbers, as diverse as baseball statistics, the areas of rivers and numbers in magazine articles, confirming the 30 per cent rule for the digit 1. For comparison, the chance of throwing up a ‘2’ as first digit is 17.6 per cent, and of a ‘9’ just 4.6 per cent.

This has clear implications for fraud detection. In particular, if declared returns or receipts deviate significantly from the Benford distribution, we have an automatic red flag which those tackling fraud are, or should be, aware of.

To explain the basis of Benford’s Law, take £1 as a base. Assume this now grows at 10 per cent per day.

£1.10, £1.21, £1.33, £1.46, £1.61, £1.77, £1.95, £2.14, £2.36, £2.59, £2.85, £3.14, £3.45, £3.80, £4.18, £4.59, £5.05, £5.56, £6.12, £6.73, £7.40, £8.14, £8.95, £9.85, £10.83, £11.92, £13.11, £14.42, £15.86, £17.45, £19.19, £21.11, £23.23, £25.55, £28.10, £30.91, £34.00, £37.40, £41.14, £45.26, £49.79, £54.76, £60.24, £66.26, £72.89, £80.18, £88.20, £97.02 …

So we see that the leading digit stays a long time in the 1s, less long in the 2s, and so on down through the 9s, and this pattern continues into three digits and beyond. Benford noticed that the probability that a number’s leading digit is n equals log (n+1) – log (n), with logs to base 10, so that:

NB log10 1 = 0; log10 2 = 0.301; log10 3 = 0.4771 … log10 10 = 1.

Leading digit    Probability
      1             30.1%
      2             17.6%
      3             12.5%
      4              9.7%
      5              7.9%
      6              6.7%
      7              5.8%
      8              5.1%
      9              4.6%
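The probabilities above can be generated directly from the formula. A short sketch using only Python’s standard library:

```python
import math

# Benford's first-digit law: P(d) = log10(d + 1) - log10(d) for d = 1..9.
benford = {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}

for d, p in benford.items():
    print(d, f"{p:.1%}")  # 1 30.1%, 2 17.6%, ..., 9 4.6%
```

The sum telescopes to log10(10) − log10(1) = 1, so the nine probabilities form a complete distribution, which is the basis of the fraud-detection test: compare observed leading-digit frequencies against these values.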

Further Reading and Links:

https://selectnetworks.net

http://www.rexswain.com/benford.html

http://www.jstor.org/pss/984802

https://phys.org/news/2007-05-law-digits-scientists.html

Mathematical Magic: The Bus Problem

One of the classic problems of Mathemagistics, or Mathematical Magic, is the Bus Problem. It goes like this:

Question:

Every day, Fred gets the solitary 8 am bus to work. There is no other bus that will get him to his destination.

10 per cent of the time the bus is early and leaves before he arrives at 8 am.

10 per cent of the time the bus is late and leaves after 8.10 am.

The rest of the time the bus departs between 8 am and 8.10 am.

One morning Fred arrives at the bus stop at 8 am, sees no bus, and waits for 10 minutes without the bus arriving.

Now, what is the probability that Fred’s bus will still arrive?

Think about it:

Fred’s bus could yet arrive, or he might have missed it, so there are two possibilities. Is it correct to assume that, in the absence of further evidence, the chance of each must be equal, so that the probability at 8.10 am that his bus will still arrive is 50 per cent?

But if that is the answer at 8.10am, was it also the correct answer at 8 am?

Or was 50 per cent the correct answer at 8am but not at 8.10am?

Or is it the wrong answer at both times, but was correct at 8.05am?

The solution is posted below.

Spoiler Alert (Solution):

Solution

When Fred arrives at 8 am, there is a 10 per cent chance that his bus has already left. After waiting for 10 minutes, he can eliminate the 80 per cent chance of the bus arriving in the period between 8 am and 8.10 am. So only two possibilities remain.

Either the bus has arrived ahead of schedule or it will arrive more than ten minutes late.

Both outcomes are unusual, but they are mutually exclusive, equally likely (a 10 per cent prior chance of each), and together exhaust the remaining possibilities. Once the 80 per cent middle option is eliminated, the remaining 20 per cent splits equally between the bus still turning up and Fred having missed it. So we should update the probability that the bus will still arrive from its prior of 10 per cent (when Fred woke up) to 50 per cent. There is a 1 in 2 chance that he will still catch his bus if he has the patience to wait further, and a 1 in 2 chance that he will wait in vain. The follow-up question is how long he should wait. That’s for another day.

Puzzle Extra:

Every day, Fred gets the solitary 8 am bus to work. There is no other bus that will get him to his destination.

10 per cent of the time the bus is early and leaves before he arrives at 8 am.

30 per cent of the time the bus is late and leaves after 8.10 am.

The rest of the time the bus departs between 8 am and 8.10 am.

One morning Fred arrives at the bus stop at 8 am, sees no bus, and waits for 10 minutes without the bus arriving.

Now, what is the probability that Fred’s bus will still arrive?

Spoiler Alert (Solution):

Solution

When Fred arrives at 8 am, there is a 10 per cent chance that his bus has already left. After waiting for 10 minutes, he can eliminate the 60 per cent chance of the bus arriving in the period between 8 am and 8.10 am. So only two possibilities remain: the bus has already left early, or it will still arrive, more than ten minutes late.

Both outcomes are less likely than an arrival between 8 am and 8.10 am, but they are mutually exclusive (prior chances of 10 per cent and 30 per cent respectively) and together exhaust the remaining possibilities. Once the 60 per cent middle option is eliminated, its probability should be redistributed (using Bayesian principles) in the ratio of the prior probabilities of the surviving options, i.e. 3 to 1 in favour of a late arrival. So 45 of the 60 percentage points are added to the 30 per cent prior that the bus is still to arrive, and 15 of the 60 are added to the 10 per cent prior that it left before 8 am. At 8.10 am, then, there is a 75 per cent chance that the bus will still arrive and a 25 per cent chance that it has already been and gone.
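Both versions of the puzzle come down to the same mechanical step: strike out the eliminated outcome and renormalise the surviving priors. A minimal Python sketch (the outcome labels are purely illustrative):

```python
def renormalise(surviving_priors: dict) -> dict:
    """Rescale the surviving outcomes so their probabilities sum to 1,
    preserving the ratio of their prior probabilities."""
    total = sum(surviving_priors.values())
    return {outcome: p / total for outcome, p in surviving_priors.items()}

# Original puzzle: the 80% "on time" window is eliminated at 8.10 am.
posterior = renormalise({"bus left early": 0.10, "bus still coming": 0.10})
print({k: round(p, 2) for k, p in posterior.items()})
# {'bus left early': 0.5, 'bus still coming': 0.5}

# Puzzle Extra: the 60% "on time" window is eliminated instead.
posterior = renormalise({"bus left early": 0.10, "bus still coming": 0.30})
print({k: round(p, 2) for k, p in posterior.items()})
# {'bus left early': 0.25, 'bus still coming': 0.75}
```

Equal priors renormalise to 50/50; a 3-to-1 ratio of priors renormalises to 75/25, as in the solutions above.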

 


How much should we stake when we have the edge?

How much should we bet when we believe the odds are in our favour? The answer to this question was first formalised in 1956 by daredevil pilot, recreational gunslinger and physicist John L. Kelly, Jr. at Bell Labs. The so-called Kelly Criterion is a formula employed to determine the optimal size of a series of bets when we have the advantage, in other words when the odds favour us. It takes account of the size of our edge over the market as well as the adverse impact of volatility: even when we have the edge, we can still go bankrupt along the way if we stake too much on any individual wager or series of wagers.

Essentially, the Kelly strategy is to wager a proportion of our capital which is equivalent to our advantage at the available odds. So if we are being offered even money, and we back heads, and we are certain that the coin will come down heads, we have a 100% advantage. So the recommended wager is the total of our capital. If there is a 60% chance of heads, and a 40% chance of tails, our advantage is now 20%, and we are advised to stake accordingly. This is a simplified representation of the literature on Kelly, Half-Kelly, and other derivatives of same, but the bottom line is clear. It is just as important to know how much to stake as it is to gauge when we have the advantage. But it’s not easy unless we can accurately identify that advantage.

Put more technically, the Kelly criterion is the fraction of capital to wager so as to maximise the compounded growth of capital. The problem it seeks to address is that even when there is an edge, beyond some threshold larger bets will result in lower compounded return because of the adverse impact of volatility. The Kelly criterion defines that threshold, and indicates the fraction F that should be wagered to maximise compounded return over the long run:

F = Pw – (Pl/W)

where

F = Kelly criterion fraction of capital to bet

W = Amount won per amount wagered (i.e. win size divided by loss size)

Pw = Probability of winning

Pl = Probability of losing

When win size and loss size are equal, W = 1, and the formula reduces to:

F = Pw – Pl

For example, if a trader loses £1,000 on losing trades and gains £1,000 on winning trades, and 60 per cent of all trades are winning trades, the Kelly criterion indicates an optimal trade size equal to 20 per cent of capital (0.60 – 0.40 = 0.20). As another example, if a trader wins £2,000 on winning trades and loses £1,000 on losing trades, and the probabilities of winning and losing are both equal to 50 per cent, the Kelly criterion indicates an optimal trade size equal to 25 per cent of capital: 0.50 – (0.50/2) = 0.25.
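The two worked examples follow directly from the formula. A quick sketch in Python (the function name is just illustrative):

```python
def kelly_fraction(p_win: float, w: float) -> float:
    """Kelly fraction F = Pw - Pl / W, where W is the amount
    won per amount lost and Pl = 1 - Pw."""
    return p_win - (1.0 - p_win) / w

# Even-money trader: 60 per cent winners, equal win and loss sizes.
print(round(kelly_fraction(0.60, 1.0), 2))   # 0.2

# Asymmetric trader: 50 per cent winners, wins twice the size of losses.
print(round(kelly_fraction(0.50, 2.0), 2))   # 0.25
```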

In other words, Kelly argues that, in the long run, we should wager a percentage of our bankroll equal to the expected profit divided by the amount we would receive if we win.

Proportional over-betting is more harmful than proportional under-betting. For example, betting half the Kelly fraction reduces compounded return by about 25 per cent, while betting double the Kelly fraction eliminates the gain entirely. Betting more than double the Kelly fraction results in an expected negative compounded return, regardless of the edge on any individual bet. The Kelly criterion also implicitly assumes that there is no minimum bet size, an assumption which rules out the possibility of total loss. If there is a minimum trade size, as is the case in most practical investment and trading situations, then ruin is possible if capital falls below the minimum possible bet size.
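These over- and under-betting claims can be illustrated numerically. For the even-money 60/40 example, the expected compounded (log) growth per bet when staking a fraction f of capital is Pw·ln(1 + f·W) + Pl·ln(1 − f); a sketch, with the half/full/double labels chosen for illustration:

```python
import math

def log_growth(f: float, p_win: float = 0.60, w: float = 1.0) -> float:
    """Expected log-growth of capital per bet when staking fraction f."""
    return p_win * math.log(1.0 + f * w) + (1.0 - p_win) * math.log(1.0 - f)

# Full Kelly for a 60% even-money bet is f = 0.20.
for label, f in [("half Kelly", 0.10), ("full Kelly", 0.20), ("double Kelly", 0.40)]:
    print(f"{label}: {log_growth(f):+.4f}")
```

Running this shows half Kelly achieving about three-quarters of the full-Kelly growth rate (the 25 per cent reduction mentioned above), while double Kelly’s expected growth is roughly zero, in fact slightly negative.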

So should we bet the full amount recommended by the Kelly Criterion? Not so, according to sports betting legend Bill Benter. Betting the full amount recommended by the Kelly formula, he says, is unwise for a number of reasons. Notably, he warns that accurate estimation of the advantage of the bets is critical: if we overestimate the advantage by more than a factor of two, Kelly betting will cause a negative rate of capital growth, and this, he says, is easily done. As he puts it, “… full Kelly betting is a rough ride.” According to Benter, and I for one will defer to his advice in these matters, a fractional Kelly betting strategy is advisable, that is, a strategy wherein one bets some fraction of the recommended Kelly bet (e.g. one half or one third). Ironically, John Kelly himself died in 1965, never having used his own criterion to make money.

So that’s the Kelly criterion. In a nutshell, the advice is only to bet when you believe you have the edge, and to do so using a stake size related to the size of the edge. Mathematically, it means betting a fraction of your capital equal to the size of your advantage: if you have a 20% edge at the odds, bet 20% of your capital. In the real world, however, we need to allow for errors that can creep in, like uncertainty as to the true edge, if any, that we have at the odds. So, unless we’re happy to risk a very bumpy ride, and we have total confidence in our judgment, a preferred strategy will be to stake a defined fraction of that amount, known as a fractional Kelly strategy. Purists will hate us for it, but it’s not their capital at risk. So if we are going to bet, the advice is to use Kelly, but with due caution, not least in the assessment of our advantage. And when the fun of betting stops, the best advice of all may of course be to just stop. Good luck!

 


 

Occam’s Razor, Leprechauns and The Search for Truth

William of Occam (also spelled William of Ockham) was a 14th-century English philosopher. At the heart of Occam’s philosophy is the principle of simplicity, and Occam’s Razor has come to embody the method of eliminating unnecessary hypotheses. Essentially, Occam’s Razor holds that the theory which explains all (or the most) while assuming the least is the most likely to be correct. This is the principle of parsimony: explain more, assume less. Put more elegantly, it is the principle of ‘pluralitas non est ponenda sine necessitate’ (plurality must never be posited beyond necessity).

Empirical support for the Razor can be drawn from the phenomenon of ‘overfitting.’ In statistics, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship, generally because the model is excessively complex, such as having too many parameters relative to the number of observations. Critically, a model that has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. For example, a complex polynomial function can always be made, after the fact, to pass through every data point, including those generated by noise, but a linear function might be a better fit to the signal in the data. By this we mean that the linear function would predict new and unseen data points better than the polynomial, even though the polynomial, devised to capture signal and noise alike, describes the existing data more closely.
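A toy illustration of this, with made-up data: the true relationship below is linear, a degree-9 polynomial (built by Lagrange interpolation) is forced through every noisy training point, and the two fits are then compared on fresh, unseen points:

```python
import random

random.seed(42)

def signal(x):  # the underlying linear relationship
    return 2.0 * x + 1.0

xs = [i / 9 for i in range(10)]                      # 10 training inputs
ys = [signal(x) + random.gauss(0, 0.3) for x in xs]  # noisy observations

def interpolant(x):
    """Degree-9 Lagrange polynomial through all 10 noisy points:
    it fits the training data perfectly, noise included."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Ordinary least-squares straight line through the same points.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def line(x):
    return slope * x + intercept

# Mean squared error on fresh, unseen data from the same source.
x_test = [0.05 + 0.9 * i / 49 for i in range(50)]
y_test = [signal(x) + random.gauss(0, 0.3) for x in x_test]

def mse(model):
    return sum((model(x) - y) ** 2 for x, y in zip(x_test, y_test)) / len(x_test)

print(f"line: {mse(line):.3f}  degree-9: {mse(interpolant):.3f}")
```

The straight line, which assumes less, predicts the unseen points with the smaller error, while the polynomial’s perfect fit to the training noise costs it dearly out of sample.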

Turning now to ‘ad hoc’ hypotheses and the Razor. In science and philosophy, an ‘ad hoc hypothesis’ is a hypothesis added to a theory in order to save it from being falsified; ad hoc hypothesising compensates for anomalies not anticipated by the theory in its unmodified form. For example, you say that there is a leprechaun in your garden shed. A visitor to the shed sees no leprechaun. That is because he is invisible, you say. The visitor spreads flour on the ground to see his footprints. He floats, you declare. The visitor asks the leprechaun to speak. He has no voice, you say. More generally, for each accepted explanation of a phenomenon, there is generally an infinite number of possible, more complex alternatives. Each true explanation may therefore have had many alternatives that were simpler and false, but also an effectively infinite number of alternatives that are more complex and false.

This leads us to the idea of what I term ‘Occam’s Leprechaun.’ Any new and more complex theory can always still possibly be true. For example, if an individual claims that leprechauns were responsible for breaking a vase that he is suspected of breaking, the simpler explanation is that he is not telling the truth, but ongoing ad hoc explanations (e.g. “That’s not me on the CCTV, it’s a leprechaun disguised as me”) prevent outright falsification. An endless supply of elaborate competing explanations, called ‘saving hypotheses’, prevents ultimate falsification of the leprechaun hypothesis, but appeal to Occam’s Razor helps steer us toward the probable truth. Another way of looking at this is that simpler theories are more easily falsifiable, and hence possess more empirical content.

All assumptions introduce possibilities for error; if an assumption does not improve the accuracy of a theory, its only effect is to increase the probability that the overall theory is wrong.

It can also be looked at this way. The prior probability that a theory based on n+1 assumptions is true must be less than a theory based on n assumptions, unless the additional assumption is a consequence of the previous assumptions. For example, the prior probability that Jack is a train driver must be less than the prior probability that Jack is a train driver AND that he owns a Mini Cooper, unless all train drivers own Mini Coopers, in which case the prior probabilities are identical.

Again, the prior probability that Jack is a train driver and a Mini Cooper owner and a ballet dancer is less than the prior probability that he is just the first two, unless all train drivers are not only Mini Cooper owners but also ballet dancers. In the latter case, the prior probabilities of the n and n+1 assumptions are the same.
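The train-driver example is just the chain rule for probabilities, with hypothetical numbers plugged in (the figures below are made up purely for illustration):

```python
# Hypothetical figures for illustration only.
p_driver = 0.3             # prior: Jack is a train driver
p_mini_given_driver = 0.6  # P(owns a Mini Cooper | train driver)

# Chain rule: P(A and B) = P(A) * P(B | A), which can never exceed P(A).
p_both = p_driver * p_mini_given_driver
print(round(p_both, 2))    # 0.18 -- lower than the 0.3 prior for A alone

# Only if every train driver owns a Mini Cooper (conditional probability 1)
# do the two priors coincide.
print(p_driver * 1.0)      # 0.3
```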

From Bayes’ Theorem, we know that reducing the prior probability will reduce the posterior probability, i.e. the probability that a proposition is true after new evidence arises.

Science prefers the simplest explanation that is consistent with the data available at a given time, but even so the simplest explanation may be ruled out as new data become available. This does not invalidate the Razor, which does not state that simpler theories are necessarily more true than more complex theories, but that when more than one theory explains the same data, the simpler should be accorded more probabilistic weight.

The theory which explains all (or the most) and assumes the least is most likely. So Occam’s Razor advises us to keep explanations simple. But it is also consistent with multiplying entities necessary to explain a phenomenon. A simpler explanation which fails to explain as much as another more complex explanation is not necessarily the better one. So if leprechauns don’t explain anything they cannot be used as proxies for something else which can explain something. This is the classic riposte to the materialist who holds that there is nothing beyond what we observe in the natural or material world. If a non-materialist explanation better explains the origin of the universe, for example, that explanation may be true and consistent with Occam’s Razor. I explore this issue separately in my blog – ‘Why is there Something Rather than Nothing? A Solution’.

More generally, we can now unify Epicurus and Occam. From Epicurus’ Principle we need to keep open all hypotheses consistent with the known evidence which are true with a probability of more than zero. From Occam’s Razor we prefer from among all hypotheses that are consistent with the known evidence, the simplest. In terms of a prior distribution over hypotheses, this is the same as giving simpler hypotheses higher a priori probability, and more complex ones lower probability.
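One hedged way to picture such a prior distribution: give every hypothesis consistent with the evidence a non-zero weight (Epicurus), but let the weight fall with complexity (Occam). The halving-per-assumption scheme below is one illustrative choice, not a canonical one, and the hypotheses are made up:

```python
# Number of independent assumptions each candidate hypothesis makes.
complexity = {"H1": 1, "H2": 2, "H3": 5}

# Simpler hypotheses get exponentially more prior weight...
weights = {h: 2.0 ** -k for h, k in complexity.items()}

# ...but every hypothesis keeps a non-zero prior, per Epicurus.
total = sum(weights.values())
priors = {h: w / total for h, w in weights.items()}
print({h: round(p, 2) for h, p in priors.items()})
# {'H1': 0.64, 'H2': 0.32, 'H3': 0.04}
```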

From here we can move to the wider problem of induction: reasoning about the unknown by extrapolating a pattern from the known. Specifically, the problem of induction is how we can justify inductive inference. According to Hume’s ‘Enquiry Concerning Human Understanding’ (1748), if we justify induction on the basis that it has worked in the past, then we have to use induction to justify why it will continue to work in the future, which is circular reasoning and so no justification at all. “Induction is just a mental habit, and necessity is something in the mind and not in the events.” Yet in practice we cannot help but rely on induction; we work from the idea that it works in practice, if not in theory – so far. Induction is thus related to an assumption about the uniformity of nature. Of course, induction can be turned into deduction by adding principles about the world (such as ‘the future resembles the past’, or ‘space-time is homogeneous’). We can also assign to inductive generalisations probabilities that increase as the generalisations are supported by more and more independent events. This is the Bayesian approach, and it is a response to the perspective pioneered by Karl Popper. From the Popperian perspective, a single observational event may prove a hypothesis wrong, but no finite sequence of events can verify it correct. Induction is from this perspective theoretically unjustifiable, and becomes in practice the choice of the simplest generalisation that resists falsification; the simpler a hypothesis, the easier it is to falsify. Induction plus falsifiability are, from this viewpoint, as good as it gets in science. Take an inductive inference problem where there is some observed data and a set of hypotheses, one of which may be the true hypothesis generating the data. The task is then to decide which hypothesis, or hypotheses, are the most likely to be responsible for the observations.

A better way of looking at this is to abandon certainties and think probabilistically. Entropy is the tendency of isolated systems to move toward disorder, and a quantification of that disorder. Assembling a deck of cards in a defined order requires introducing some energy to the system; if you drop the deck, the cards become disorganised and won’t re-organise themselves automatically. This tendency of all systems to disorder is the Second Law of Thermodynamics, which implies that time is asymmetrical with respect to the amount of order: as a system advances through time, it will statistically become more disordered. By ‘order’ and ‘disorder’ we mean how compressed the information is that describes the system. If all your papers are in one neat pile, the description is ‘All papers in one neat pile.’ If you drop them, the description becomes ‘One paper to the right, another to the left, one above, one below…’ and so on. The longer the description, the higher the entropy. According to Occam’s Razor, we want a theory with low entropy: low disorder, high simplicity. The lower the entropy, the more likely it is that the theory is the true explanation of the data, and hence that theory should be assigned a higher probability.

More generally, whatever theory we develop, say to explain the origin of the universe, or consciousness, or non-material morality, must itself be based on some theory, which is based on some other theory, and so on. At some point we need to rely on a statement which we cannot prove and must simply take to be true, though for all we know it may be false. We can never solve the ultimate problem of induction, but Occam’s Razor combined with Epicurus, Bayes and Popper is as good as it gets if we accept that. So Epicurus, Occam, Bayes and Popper help us pose the right questions, and help us to establish a good framework for thinking about the answers.

At least that applies to the realm of established scientific enquiry and the pursuit of scientific truth. How far it can properly be extended beyond that is a subject of intense and continuing debate.

Further Reading and Links


Bayes’ Theorem: The Most Powerful Equation in the World. https://leightonvw.com/2017/03/12/bayes-theorem-the-most-powerful-equation-in-the-world/

Why is there Something Rather than Nothing https://wordpress.com/post/leightonvw.com/639