# Bayes’ Theorem: The Most Powerful Equation in the World

How should we change our beliefs about the world when we encounter new data or information? This is one of the most important questions we can ask. A theorem bearing the name of Thomas Bayes, an eighteenth century clergyman, is central to the way we should answer this question.

The Reverend Thomas Bayes’ work, ‘An Essay towards Solving a Problem in the Doctrine of Chances’, was first presented to the Royal Society in 1763, after Bayes’ death, by his friend and confidant, Richard Price.

In framing Bayes’ work, Price gave the example of a person who emerges into the world and sees the sun rise for the first time. As he has had no opportunity to observe this before (perhaps he has spent his life to that point entombed in a dark cave), he is not able to decide whether this is a typical or unusual occurrence. It might even be a unique event. Every day that he sees the same thing happen, however, the degree of confidence he assigns to this being a permanent aspect of nature increases. His estimate of the probability that the sun will rise again tomorrow as it did yesterday and the day before, and so on, gradually approaches, although never quite reaches, 1.

The Bayesian viewpoint is just like that: the idea that we learn about the universe and everything in it by gradually updating our beliefs, edging ever closer to the truth as we obtain more data, more information, more evidence.

As such, the perspective of Rev. Bayes on cause and effect is essentially different to that of philosopher David Hume, the logic of whose argument on this issue is contained in ‘An Enquiry Concerning Human Understanding’. According to Hume, we cannot justify our assumptions about the future based on past experience unless there is a law that the future will always resemble the past. No such law exists. Therefore, we have no fundamentally rational support for believing in causation. For Hume, therefore, predicting that the sun will rise again after seeing it rise a hundred times in a row is no more rational than predicting that it will not. Bayes instead sees reason as a practical matter, in which we can apply the laws of probability to the issue of cause and effect.

To Bayes, then, rationality is a matter of probability, by which you update your predictions based on new evidence, thereby edging closer and closer to the truth. This is called Bayesian reasoning. According to this approach, probability can be seen as a bridge between ignorance and knowledge. The particularly wonderful thing about the world of Bayesian reasoning is that the mathematics of operationalising it is so simple.

Essentially, Bayes’ Theorem is just an algebraic expression with three known variables and one unknown. Yet this simple formula is the foundation stone of that bridge I referred to between ignorance and knowledge.

Bayes’ Theorem is in this way concerned with conditional probability. That is, it tells us the probability, or updates the probability, that a theory or hypothesis is true given that some event has taken place.

To help explain how it works, let us invent a little crime story in which you are a follower of Bayes and you have a friend in a spot of trouble. In this story, you receive a telephone call from your local police station. You are told that your best friend of many years is helping the police with their investigation into a case of vandalism of a shop window in a street adjoining the one where you know she lives. It took place at noon that day, which you know is her day off work.

She next comes to the telephone and tells you she has been charged with smashing the shop window, based on the evidence of a police officer who positively identified her as the culprit. She claims mistaken identity.

You must evaluate the probability that she did commit the offence before deciding how to advise her.

So the condition is that she has been charged with criminal damage; the hypothesis you are interested in evaluating is that she did it.

Bayes’ Theorem helps you answer this type of question.

There are three things you need to estimate.

- A Bayesian’s first task is to estimate the probability that the new evidence would have arisen if the hypothesis was true. In this case, you need to estimate the probability of the police officer identifying your friend if your friend actually did break the window.
- A Bayesian’s second task is to estimate the probability that the new evidence would have arisen if the hypothesis was false. In this case, you need to estimate the probability of the police officer identifying your friend if your friend did NOT break the window.
- You need what Bayesians call a *prior probability*.

This is the probability you would have assigned to her smashing the shop window before she told you that she had been charged on the basis of the witness evidence. This is not always easy, since the new information might colour the way you assess the prior information, but ideally you should estimate this probability as it would have been before you received the new information.

A practical definition of a Bayesian prior is the odds at which you would be willing to place or offer a bet before the new information is disclosed.
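To make the betting definition concrete, here is a minimal sketch of how quoted odds translate into a probability. The function name `odds_against_to_probability` and the 19-to-1 figure are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction


def odds_against_to_probability(against: int, in_favour: int) -> Fraction:
    """Convert odds of `against`-to-`in_favour` AGAINST an event into a probability.

    Hypothetical helper for illustration: odds of a-to-b against correspond
    to a probability of b / (a + b).
    """
    if against < 0 or in_favour < 0 or against + in_favour == 0:
        raise ValueError("odds must be non-negative and not both zero")
    return Fraction(in_favour, against + in_favour)


# Illustrative assumption: you would accept a bet at 19-to-1 against.
prior = odds_against_to_probability(19, 1)
print(prior, float(prior))
```

Odds of 19-to-1 against correspond to a prior of 1 in 20. Longer odds imply a smaller prior.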

Based on these three probability estimates, Bayes’ Theorem offers you the way to calculate accurately the revised probability you should assign to your friend’s guilt. The wonderful part about it is that the equation is true as a matter of logic. So the result it produces will be as accurate as the values inputted into the equation.

The formula is also so straightforward it can be jotted on the back of your hand. Actually, that’s not such a bad idea for such a powerful tool. Indeed, if you are attracted to tattoos, this is as good an idea for one as any. And it’s as simple as x, y, z.

The formula has xy on the top of the equation and xy+z(1-x) on the bottom.

And that’s it!

Bayes’ rule is:

Probability of hypothesis being true after obtaining new evidence = **xy/[xy+z(1-x)]**

This is known as the Posterior Probability.

So we have three variables.

x is the prior probability, i.e. the probability you assign to the hypothesis being true before you obtain the new evidence.

y is the probability that the new evidence would have arisen if the hypothesis was true.

z is the probability that the new evidence would have arisen if the hypothesis was false.
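For readers who know the conventional notation, the x, y, z form can be related to the usual statement of Bayes’ Theorem. The symbols H (the hypothesis) and E (the evidence) are introduced here for illustration and do not appear in the original. The sketch below relies on the law of total probability for the denominator.

```latex
\begin{align*}
P(H \mid E) &= \frac{P(E \mid H)\,P(H)}{P(E)} \\
P(E) &= P(E \mid H)\,P(H) + P(E \mid \neg H)\,\bigl(1 - P(H)\bigr) \\
\text{With } x = P(H),\; y = P(E \mid H),\; z = P(E \mid \neg H): \quad
P(H \mid E) &= \frac{xy}{xy + z(1 - x)}
\end{align*}
```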

So let’s apply Bayes’ Rule to the case of the shattered shop window.

Let’s start with y. This is the probability that the new evidence would have arisen if the hypothesis was true. What is the hypothesis? That your friend broke the window. What is the new evidence? That the police officer has identified your friend as the person who smashed the window. So y is an estimate of the probability that the police officer would have identified your friend if she was indeed guilty.

If she threw the brick, it’s easy to imagine how she came to be identified by the police officer. Still, he wasn’t close enough to catch the culprit at the time, which should be borne in mind. Let’s say that the probability he would identify her, given that she is guilty, is 80% (0.8).

Let’s move on to z. This is the probability that the new evidence would have arisen if the hypothesis was false. What is the hypothesis again? That your friend broke the window. What is the new evidence again? That the police officer has identified your friend as the person who did it. So z is an estimate of the probability that the police officer would have identified her if she was not the guilty party, i.e. a false identification.

If your friend didn’t shatter the window, how likely is the police officer to have wrongly identified her when he saw her in the street later that day? It is possible that he would see someone of similar age and appearance, wearing similar clothes, and jump to the wrong conclusion, or he may just want to identify someone to advance his career. Let us give him credit and say the probability is just 15% (0.15).

Finally, what is x? This is the probability you assign to the hypothesis being true before you obtain the new evidence. In this case, it means the probability you would assign to your friend breaking the shop window before you got the new information from her on the telephone about the evidence of the police officer. Well, you have known her for years, and it is totally out of character, although she does live just a stone’s throw from the shop, and is not at work that day, so she could have done it. Let’s say 5% (0.05). That’s just before you learn from her on the telephone about the witness evidence and the charge. Assigning the prior probability is fraught with problems, however, as awareness of the new information might easily colour the way you assess the prior information. You need to make every effort to estimate this probability as it would have been before you received the new information. You also have to be precise as to the point in the chain of evidence at which you establish the prior probability.

Once we’ve assigned these values, Bayes’ theorem can be applied to establish a posterior probability. This is the number that we’re interested in. It is the measure of how likely it is that your friend broke the window, given that she’s been identified as the culprit by the police officer.

The calculation uses the simple algebraic expression that we have identified:

xy/[xy+z(1-x)]

where x is the prior probability of the hypothesis (she’s guilty) being true.

where y is the probability the police officer identifies her conditional on the hypothesis being true, i.e. she’s guilty.

where z is the probability the police officer identifies her conditional on the hypothesis not being true, i.e. she’s not guilty.

In our example, x = 0.05, y = 0.8, z = 0.15

The rest is simple arithmetic.

xy = 0.05 × 0.8 = 0.04

z(1-x) = 0.15 × 0.95 = 0.1425

xy/[xy+z(1-x)] = 0.04/(0.04 + 0.1425) = 0.04/0.1825

Posterior probability = 0.219 = 21.9%
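The arithmetic above can be checked with a few lines of code. This is a minimal sketch rather than a definitive implementation. The function name `posterior` is a hypothetical choice, while x, y and z carry the meanings and values given in the text.

```python
def posterior(x: float, y: float, z: float) -> float:
    """Return xy / [xy + z(1 - x)].

    x: prior probability that the hypothesis is true
    y: probability of the evidence if the hypothesis is true
    z: probability of the evidence if the hypothesis is false
    """
    for name, value in (("x", x), ("y", y), ("z", z)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    numerator = x * y
    denominator = numerator + z * (1.0 - x)
    if denominator == 0.0:
        raise ValueError("the evidence has zero probability in both cases")
    return numerator / denominator


# Values from the shop-window example in the text.
result = posterior(x=0.05, y=0.8, z=0.15)
print(f"Posterior probability = {result:.3f}")  # prints 0.219
```

The range checks are a design choice made for this sketch: the formula only makes sense when all three inputs are probabilities, and it is undefined if the evidence could not have arisen under either case.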

The most interesting takeaway is the relatively low probability you should assign to your friend’s guilt, even though you estimated an 80% chance that the police officer would identify her if she was guilty and only a 15% chance that he would falsely identify her. The clue to the intuitive discrepancy is in the prior probability (or ‘prior’) you would have attached to her guilt before you learned of the police officer’s evidence: with a prior of just 5%, a false identification of an innocent friend (0.1425) remains more likely overall than a correct identification of a guilty one (0.04). If a new piece of evidence now emerges (say a second witness), you should again apply Bayes’ Theorem, using the current posterior as the new prior, to update to a new posterior probability, gradually converging, as more and more evidence accumulates, ever nearer to the truth.
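Both points, the weight of the prior and the effect of further evidence, can be illustrated with a short sketch. The alternative priors below are hypothetical. So is the assumption that a second witness is exactly as reliable as the police officer (y = 0.8, z = 0.15) and independent of him. Neither comes from the original story.

```python
def posterior(x: float, y: float, z: float) -> float:
    """Return xy / [xy + z(1 - x)] for prior x and evidence probabilities y, z."""
    return (x * y) / (x * y + z * (1.0 - x))


Y, Z = 0.8, 0.15  # identification probabilities from the example

# How much the answer depends on the prior (hypothetical alternative priors).
for prior in (0.01, 0.05, 0.25, 0.5):
    print(f"prior {prior:.2f} -> posterior {posterior(prior, Y, Z):.3f}")

# Updating twice: the first posterior becomes the prior for the second update.
# ASSUMPTION: the second witness has the same y and z as the police officer
# and is independent of him.
after_first = posterior(0.05, Y, Z)
after_second = posterior(after_first, Y, Z)
print(f"after one identification:  {after_first:.3f}")
print(f"after two identifications: {after_second:.3f}")
```

Under these assumptions, a second identification lifts the probability from roughly 22% to roughly 60%, which shows how each new piece of evidence moves the estimate along.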

It is, of course, all too easy to dismiss the implications of this hypothetical case on the grounds that it was just too difficult to assign reasonable probabilities to the variables. But assigning probabilities is what we do implicitly even when we don’t assign numbers. Bayes’ rule is not at fault for this in any case. It will always correctly update the probability of a hypothesis being true whenever new evidence is identified, based on the estimated probabilities. In some cases, such as the crime case illustrated here, estimating those probabilities is not easy, though making the estimates explicit and updating them with Bayes’ rule will still serve you better than relying on unaided intuition to steer a path to the truth.

In many other cases, we do know with precision what the key probabilities are, and in those cases we can use Bayes’ Rule to identify with precision the revised probability based on the new evidence, often with startlingly counter-intuitive results. In seeking to steer the path from ignorance to knowledge, the application of Bayes’ Theorem is always the correct method.

Thanks to Bayes, the path to the truth really is as easy as x,y,z. What remains is the wit and will to apply it.
