# Introducing Bayes

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, *Probability, Choice and Reason.*

Bayes’ theorem concerns how we should update our beliefs about the world when we encounter new evidence. The original presentation of Rev. Thomas Bayes’ work, ‘An Essay towards Solving a Problem in the Doctrine of Chances’, was given to the Royal Society in 1763, after Bayes’ death, by Richard Price. In framing Bayes’ work, Price gave the example of a person who emerges into the world and sees the sun rise for the first time. At first, he does not know whether this is typical or unusual, or even a one-off event. However, each day that he sees the sun rise again, his confidence increases that it is a permanent feature of nature. Gradually, through this purely statistical form of inference, the probability he assigns to his prediction that the sun will rise again tomorrow approaches (although never exactly reaches) 100 per cent. The Bayesian viewpoint is that we learn about the universe and everything in it through approximation, getting closer to the truth as we gather more evidence, and thus rationality is regarded as a probabilistic matter. As such, Bayes applies and formalises the laws of probability to the science of reason, to the issue of cause and effect.

In its most basic form, Bayes’ Theorem is simply an algebraic expression with three known variables and one unknown. It is true by construction, but this simple formula can lead to important predictive insights. Bayes’ Theorem is concerned with conditional probability: it tells us how probable a theory or hypothesis is once some new information comes to light, by updating the probability we attached to it before the information was known.

Presented most simply, it looks like this:

Probability that a hypothesis is true given some new evidence (‘Posterior Probability’) =

**ab/[ab+c(1-a)]**, where:

a is the prior probability of the hypothesis being true (the probability we attach before the new evidence arises)

b is the probability that the new evidence would arise if the hypothesis is true

c is the probability that the new evidence would arise if the hypothesis is not true

1-a is the prior probability that the hypothesis is not true
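Expressed as code, the formula is a one-line function (a minimal Python sketch; the function name `bayes_posterior` is my own, and the illustrative values in the check are arbitrary):

```python
def bayes_posterior(a: float, b: float, c: float) -> float:
    """Posterior probability via Bayes' Theorem: ab / [ab + c(1-a)].

    a: prior probability the hypothesis is true
    b: probability of the evidence if the hypothesis is true
    c: probability of the evidence if the hypothesis is not true
    """
    return (a * b) / (a * b + c * (1 - a))

# Illustrative check: prior 1/4, evidence certain under H, 1/3 likely otherwise
print(bayes_posterior(1/4, 1.0, 1/3))  # 0.5
```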

Using more traditional notation,

P(H) = probability the hypothesis is true (x)

P(E) = probability of the evidence

P(H’) = probability the hypothesis is not true

P(E|H) = probability of the evidence given that the hypothesis is true (y)

P(E|H’) = probability of the evidence given that the hypothesis is not true (z)

P(H|E) = posterior (updated) probability (PP)

The equation is easily derived.

The entry point is the following equation, both sides of which are equal to the joint probability of the hypothesis and the evidence occurring together.

P(H|E).P(E) = P(H).P(E|H)

To derive Bayes’ Theorem, divide through by P(E):

P(H|E) = P(H).P(E|H) / P(E) … Bayes’ Theorem

By the law of total probability, the denominator expands as:

P(E) = P(H).P(E|H) + P(E|H’).P(H’)

So P(H|E) = P(H).P(E|H) / [P(H).P(E|H) + P(E|H’).P(H’)]
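The short and long forms can be checked against each other numerically (a Python sketch; the probability values are arbitrary illustrations, not taken from the text):

```python
p_h = 0.3              # P(H): prior probability the hypothesis is true
p_e_given_h = 0.8      # P(E|H): probability of the evidence if H is true
p_e_given_not_h = 0.2  # P(E|H'): probability of the evidence if H is false

# Law of total probability: P(E) = P(H).P(E|H) + P(E|H').P(H')
p_e = p_h * p_e_given_h + p_e_given_not_h * (1 - p_h)

short_form = p_h * p_e_given_h / p_e
long_form = p_h * p_e_given_h / (p_h * p_e_given_h + p_e_given_not_h * (1 - p_h))

print(short_form, long_form)  # equal, since the long form merely expands P(E)
```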

*Take a simple card example.*

There are 52 cards in the deck. 26 are black, 26 are red. One of the cards is the Ace of Spades.

Hypothesis: A selected card is the Ace of Spades.

Evidence: The card is revealed to be black.

So P(H|E) = 1/26 (there are 26 black cards, of which one is the Ace of Spades).

P(E) = 1/2 (half of the cards are black); P(H) = 1/52 (there are 52 cards, of which one is the Ace of Spades).

P(E|H) = 1 (if it’s the Ace of Spades, it must be black).

So P(H|E).P(E) = 1/26 × 1/2 = 1/52

And P(H).P(E|H) = 1/52 × 1 = 1/52

As expected, and as always, P(H|E).P(E) = P(E|H).P(H). This is the fundamental expression from which Bayes’ Theorem is easily derived, as above, by dividing both sides by P(E).

Thus, Bayes’ Theorem states that: P(H|E) = P(E|H).P(H) / P(E)

As before, P(E) = P(H).P(E|H) + P(E|H’).P(H’), i.e. the probability of the evidence = the probability of the evidence if the hypothesis is true times the probability the hypothesis is true PLUS the probability of the evidence if the hypothesis is not true times the probability the hypothesis is not true.

So, P(H|E) = P(H).P(E|H) / [P(H).P(E|H) + P(E|H’).P(H’)] – longer expression of Bayes’ Theorem

OR, P(H|E) = xy / [xy + z(1-x)] – Bayes’ Theorem.
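The card example can be reproduced with the longer form (a minimal Python sketch; the value z = 25/51 is my own working: of the 51 cards that are not the Ace of Spades, 25 are black):

```python
x = 1/52   # P(H): prior probability the card is the Ace of Spades
y = 1.0    # P(E|H): the Ace of Spades is certainly black
z = 25/51  # P(E|H'): 25 of the other 51 cards are black

p_e = x * y + z * (1 - x)        # P(E) = 1/2, since half the cards are black
pp = x * y / (x * y + z * (1 - x))

print(p_e, pp)  # 0.5 and 1/26 ≈ 0.0385
```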

*Does P(H|E) = P(E|H)?*

Is the probability that a selected card is the Ace of Spades, given the evidence that it is a black card, equal to the probability that it is a black card given that it is the Ace of Spades?

In this example, P(H|E) = 1/26: there is one Ace of Spades among the 26 black cards.

P(E|H) = 1, since the card is certainly black if it is the Ace of Spades.

So P(H|E) is not equal to P(E|H).

Claiming that they are equal is more generally known as the Prosecutor’s Fallacy (also known as the Inverse Fallacy).

*What is the chance that a card in a four-card deck is the Ace of Spades?*

There are four cards in the deck: the Ace of Spades, Ace of Clubs, Ace of Diamonds and Ace of Hearts.

Hypothesis: The selected card is the Ace of Spades.

Prior probability of Ace of Spades (AS) = ¼

What is the posterior probability that it is the Ace of Spades, given the evidence that the card is black?

P(H|E) = P(H).P(E|H)/P(E) = (1/4 × 1) / (1/2) = 1/2

PP = xy/[xy+z(1-x)] = (1/4 × 1) / [1/4 + 1/3 × 3/4] = (1/4) / (1/2) = 1/2

NB: z = P(E|H’). This is the probability of a black card if the card is not the Ace of Spades. There are three other cards, only one of which (the Ace of Clubs) is black, so z = 1/3.

So either formula generates the same correct answer: the posterior probability that the hypothesis is true (that the card is the Ace of Spades, given that it is black) is 1/2.
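The four-card arithmetic can be checked directly (a minimal Python sketch of the calculation above):

```python
x = 1/4  # P(H): prior probability of the Ace of Spades in a four-ace deck
y = 1.0  # P(E|H): the Ace of Spades is certainly black
z = 1/3  # P(E|H'): of the other three aces, only the Ace of Clubs is black

pp = x * y / (x * y + z * (1 - x))
print(pp)  # 0.5
```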

*Dice Example*

Two dice are thrown. The hypothesis is that two sixes will be thrown. The new evidence is that a six is thrown on the first die.

P(H) = x = 1/36

P(E|H) = y = 1 (for a double six, a six must be thrown on the first die).

P(E) = 1/6 (there is a 1 in 6 chance of throwing a six on the first die).

P(H|E) = posterior probability = P(E|H).P(H) / P(E) = (1/36) / (1/6) = 1/6 (there is a 1 in 6 chance of a double six if the first die lands on a six).

Note: P(H).P(E|H) = P(E).P(H|E) = 1/36

Note also: P(E) = P(H).P(E|H) + P(H’).P(E|H’) = 1/36 + 35/36 × 5/35 = 1/36 + 5/36 = 1/6

Similarly, PP = xy/[xy + z(1-x)] = 1/6

Note: z = P(E|H’) = 5/35 because if the outcome is not a double six (H’), 35 equally likely outcomes remain, and a six on the first die occurs in 5 of them, i.e. 6,1; 6,2; 6,3; 6,4; 6,5.
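The dice figures check out the same way (a minimal Python sketch of the arithmetic above):

```python
x = 1/36  # P(H): prior probability of a double six
y = 1.0   # P(E|H): a double six guarantees a six on the first die
z = 5/35  # P(E|H'): 5 of the 35 non-double-six outcomes start with a six

p_e = x * y + z * (1 - x)           # P(E) = 1/6
pp = x * y / (x * y + z * (1 - x))  # posterior = 1/6

print(p_e, pp)
```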

*Does P(H|E) = P(E|H)?*

Is the probability of obtaining a six on both dice, given that the first comes up six, the same as the probability of the first coming up six given that both come up six?

In this example, P(H|E) = 1/6, which is the chance that the second die will come up six if the first does.

P(E|H) = 1, since the first die must come up six if both dice are to come up six.

So P(H|E) is not equal to P(E|H), highlighting again the classic Prosecutor’s Fallacy.

The key contributions of Bayesian analysis to our understanding of the world are fivefold:

- It clearly shows that P(H|E) is not the same thing as P(E|H). The conflation of these two expressions is known as the Prosecutor’s Fallacy and has been sufficient in itself to cause countless miscarriages of justice, and to produce erroneous conclusions more generally about the likelihood that a hypothesis is true in the context of observed evidence.
- So what is P(H|E) equal to?
- P(H|E) = P(E|H).P(H) / P(E), where P(E) = P(H).P(E|H) + P(H’).P(E|H’).
- Bayes’ Theorem makes clear the importance not just of the new evidence but also of the prior probability that the hypothesis was true before the new evidence was observed. Common intuition about probability gives this prior far too little weight compared with the new evidence; Bayes’ Theorem makes it explicit.
- Bayes’ Theorem gives us a way to calculate the updated probability as accurately as our assessments allow: of the prior probability of the hypothesis being true, and of the probability of the evidence arising if the hypothesis is true and if it is false.

In all these ways, Bayes’ Theorem replaces often faulty intuition and logic with a rigorous application of conditional probability theory, giving us as accurate a representation as possible of how probable a hypothesis is, given the available evidence at any given point in time.