# Bayes and the False Positives Problem – in a nutshell.

Let’s say a patient goes to see the doctor. The doctor tests all his patients for a flu virus, estimating that only 1 per cent of the people who visit his surgery have the virus. The test he gives them, however, is 99 per cent reliable – that is, 99 per cent of people who are sick test positive and 99 per cent of healthy people test negative. Now the question is: if the patient tests positive, what chance should the doctor give to the patient having the flu virus?

The intuitive answer is 99 per cent. But is that right?

The information we are given is ‘the probability of testing positive given that you have the virus’. What we want to know, however, is ‘the probability of having the virus given that you tested positive’. Common intuition conflates these two probabilities, but they are in fact very different. That the test is 99 per cent accurate means that 99 per cent of sick people test positive. But this is NOT the same thing as saying that 99 per cent of people who test positive are sick. Confusing the two is known as the ‘Inverse Fallacy’ or ‘Prosecutor’s Fallacy’, as explored earlier. It is the fallacy, to which jurors are very susceptible, of believing that the probability of a defendant being guilty of a crime in light of some piece of evidence is the same as the probability of observing that piece of evidence if the defendant were guilty. These are in fact very different things, and the two probabilities can diverge markedly.

So what is the probability of having the virus if you test positive, given that the test is 99% reliable (i.e. 99% of people who have the virus test positive and 99% of people who do not have the virus test negative)?

To answer this we can use Bayes’ Theorem.

The (posterior) probability that a hypothesis is true after obtaining new evidence, according to the a,b,c formula of Bayes’ Theorem, is equal to:

ab / [ab + c(1 – a)]

a is the prior probability, i.e. the probability that a hypothesis is true before you see the new evidence. Before the new evidence (the test), this chance is estimated at 1 in 100 (0.01), as we are told that 1 per cent of the people who visit his surgery have the virus. So, a = 0.01

b is the probability of the new evidence if the hypothesis is true. The probability of the new evidence (the positive result on the test) if the hypothesis is true (the patient is sick) is 99%, since the test is 99% accurate. So, b = 0.99

c is the probability of the new evidence if the hypothesis is false. The probability of the new evidence (the positive result on the test) if the hypothesis is false (the patient is not sick) is just 1% (because the test is 99% accurate, and we can only expect a false positive 1 time in 100). So, c = 0.01

Using Bayes’ Theorem, the updated (posterior) probability = ab / [ab + c(1 – a)] = 1/2
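
The arithmetic can be sketched in a few lines of code. Here is a minimal Python version of the a, b, c formula (the function name is mine, chosen for illustration):

```python
def posterior(a, b, c):
    """Posterior probability from Bayes' Theorem in the a, b, c form.

    a: prior probability that the hypothesis is true
    b: probability of the evidence if the hypothesis is true
    c: probability of the evidence if the hypothesis is false
    """
    return (a * b) / (a * b + c * (1 - a))

# Flu test example: 1% prior, 99% reliable test.
print(posterior(a=0.01, b=0.99, c=0.01))  # approximately 0.5
```

Plugging in any of the later examples in this piece gives the same answers as the hand calculations.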

So there is actually a 50% chance that the test, which is 99% accurate and has tested positive, has misdiagnosed you and you are actually flu-free.

Basically, it is a competition between how rare the disease is and how rarely the test is wrong. In this case, there is a 1 in 100 chance that you have the flu before undertaking the test, and the test is wrong 1 time in 100. These two probabilities are equal, so the chance that you actually have the flu when testing positive is actually 1 in 2, despite the test being 99% accurate.

But what if the patient is showing symptoms of the disease before being tested?

In this case, the prior probability should be updated to something higher than the prevalence rate of the disease in the entire tested population, and the chance you are actually sick when you test positive rises accordingly. To the extent that a doctor only tests for something that there is corroborating support for, the likelihood that the test result is correct grows. For this reason, any positive test result should be taken very seriously, statistics aside.
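
To illustrate how a stronger prior lifts the posterior, here is a short sketch. The priors 0.1 and 0.3 are invented stand-ins for a patient with mild or strong symptoms; they are not taken from the text:

```python
# Posterior for the 99%-reliable flu test at several priors.
# 0.01 is the base case; 0.1 and 0.3 are illustrative values for a
# patient who already shows symptoms before being tested.
for prior in (0.01, 0.1, 0.3):
    post = (prior * 0.99) / (prior * 0.99 + 0.01 * (1 - prior))
    print(f"prior {prior:.2f} -> posterior {post:.3f}")
```

Even a modest symptom-based prior of 0.1 pushes the posterior above 90 per cent.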

More generally, the ‘False Positive’ problem can easily lead to false convictions based on forensic evidence. Let’s say that we have a theft based on access to a secure storage facility, and we test everyone who could potentially have had access, which is 100 people. Without any other evidence, we can now assign a prior probability that the suspect currently being questioned is guilty of the crime at 1 in 100 or 0.01.

Forensic evidence now comes in the way of a partial fingerprint inside the office safe. The forensic comparison is 95% accurate: there is a 95% chance (0.95) of a declared match if the suspect left the print, and a 5% chance of a false match for any suspect who did not. Applying Bayes’ Theorem, we find that when the 95% accurate forensic test provides a match, the actual probability that the suspect is guilty is just 16%. This makes sense when we consider that testing all 100 suspects would (given that the test has a false positive rate of 5%) provide an estimated five false matches. With larger trawls of forensic testing, the likelihood of a false match becomes commensurately higher.
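
The effect of the trawl size can be sketched directly. The pool sizes here are illustrative, not from the text:

```python
# Posterior probability of guilt after a 95%-accurate forensic match,
# as the pool of possible suspects grows (illustrative pool sizes).
for pool in (10, 100, 1000):
    prior = 1 / pool
    post = (prior * 0.95) / (prior * 0.95 + 0.05 * (1 - prior))
    print(f"{pool:>4} suspects -> P(guilty | match) = {post:.3f}")
```

With 1,000 people tested, a match on its own leaves the suspect's probability of guilt below 2 per cent.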

More generally, to differentiate truth from scare we really do need to understand and employ Bayes’ Theorem. Whether at the doctor’s surgery or in the jury room, understanding it really could save a life.

**Appendix**

In the original setting with the test results showing positive for a flu virus, a = 0.01, b = 0.99, c = 0.01. Substituting into Bayes’ equation, ab/[ab+c(1-a)], gives:

Posterior probability = 0.01 × 0.99 / [0.01 × 0.99 + 0.01 × (1 – 0.01)] = 0.01 × 0.99 / [0.01 × 0.99 + 0.01 × 0.99] = 1/2

Another way of visualising this problem is by constructing a simple box diagram for a population of 10,000 patients. Of these, 1%, or 100, have the flu virus and 9900 do not. These are inserted into the Total column. There is a 1% error rate, so 1% of the 9900 who do not have the flu virus test positive. Hence the remaining 9801 test negative. Of the 100 who actually have the flu virus, one tests negative (because of the error rate) and the remaining 99 correctly test positive. See below.

|               | Test positive | Test negative | Total |
| ------------- | ------------- | ------------- | ----- |
| Has flu virus | 99            | 1             | 100   |
| No flu virus  | 99            | 9801          | 9900  |
| Total         | 198           | 9802          | 10000 |

It is now easy to see that of the 198 who test positive, exactly half (99) actually have the flu virus. The other half are false positives.
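
The box diagram translates directly into code; a sketch for the same 10,000 patients:

```python
# Rebuild the 10,000-patient box diagram with expected counts.
population = 10_000
sick = round(0.01 * population)           # 100 have the flu virus
healthy = population - sick               # 9,900 do not

true_positives = round(0.99 * sick)       # 99 correctly test positive
false_positives = round(0.01 * healthy)   # 99 healthy patients test positive

tested_positive = true_positives + false_positives   # 198
print(true_positives / tested_positive)              # 99 / 198 = 0.5
```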

Let’s take another example.

The probability of a true positive (test comes back positive for virus and the patient has the virus) is 90%. The chance that it gives a false negative (test comes back negative yet the patient has the virus) is 10%. The chance of a false positive (test comes back positive yet the patient does not have the virus) is 7%. The chance of a true negative (test comes back negative and the patient does not have the virus) is 93%.

The probability that a random patient has the virus based on the prevalence of the virus in the tested population is 0.8%.

Here, a = 0.8% (0.008) – this is the prior probability

b = 90% (0.9) – probability of a true positive

c = 7% (0.07) – probability of a false positive

So, updated probability that the patient has the virus given the positive test result =

ab / [ab + c(1 – a)] = 0.008 × 0.9 / [0.008 × 0.9 + 0.07 × (1 – 0.008)]

= 0.0072 / [0.0072 + 0.06944] = 0.0072 / 0.07664 = 0.0939 = 9.39%

This can be shown using the raw figures to produce the same result. We can choose any number for total tested, and the result is the same. Let’s choose 1 million, say, as the number tested.

So total tested = 1,000,000

Total with virus = 0.008 x 1,000,000 = 8000

True positive = 0.9 x 8000 = 7200

False positive = 0.07 x 992,000 = 69,440 (the 992,000 being the 1,000,000 – 8000 tested who do not have the virus)

Tested positive = 69,440 + 7200 = 76,640

Updated (posterior) probability that the patient who tests positive has the virus = True positives / Total positives = 7200 / 76640 = 0.0939 = 9.39%
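
Both routes to the 9.39% figure can be checked side by side in code:

```python
# Second example: the a, b, c formula and the raw 1,000,000-patient
# figures should give the same posterior.
a, b, c = 0.008, 0.9, 0.07     # prior, true-positive rate, false-positive rate
by_formula = (a * b) / (a * b + c * (1 - a))

tested = 1_000_000
with_virus = round(a * tested)                # 8,000
true_pos = round(b * with_virus)              # 7,200
false_pos = round(c * (tested - with_virus))  # 0.07 x 992,000 = 69,440
by_counts = true_pos / (true_pos + false_pos)

print(f"{by_formula:.4f}  {by_counts:.4f}")   # both 0.0939
```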

In the forensic match example, we can construct a box table. Out of a population of 100 suspects, one is guilty and 99 are not guilty. These are inserted into the Total column. There is a 5% error rate in the forensic match, so there is a 0.95 chance of a match if the suspect is guilty (top left). Each of the 99 innocent suspects has a 5% chance of providing a match, for an expected 0.05 x 99 = 4.95 false matches, leaving 94.05 as the number for the Not guilty/No match cell.

|            | Match | No match | Total |
| ---------- | ----- | -------- | ----- |
| Guilty     | 0.95  | 0.05     | 1     |
| Not guilty | 4.95  | 94.05    | 99    |
| Total      | 5.9   | 94.1     | 100   |

So the chance that the suspect provides a match and is actually guilty is the proportion of those guilty and matching out of all those matching (0.95/5.9 = 0.16).

So the 95% accurate forensic match provides a hit when matched to the suspect but his actual probability of guilt on these figures is just 16%.

Using Bayes’ Theorem, we reach the same conclusion:

Substituting into Bayes’ equation gives:

P(Guilty | Match) = 0.01 × 0.95 / [0.01 × 0.95 + 0.05 × (1 – 0.01)] = 0.0095 / (0.0095 + 0.0495) = 0.0095 / 0.059 = 0.16

So P(Guilty | Match) = 0.16

P(Not guilty | Match) = 0.84
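
The box table for the forensic example can be verified with expected counts:

```python
# Forensic match: 100 suspects, one guilty, 95%-accurate match test.
suspects = 100
guilty = 1
innocent = suspects - guilty

expected_true_matches = 0.95 * guilty     # 0.95
expected_false_matches = 0.05 * innocent  # 0.05 x 99 = 4.95

p_guilty_given_match = expected_true_matches / (
    expected_true_matches + expected_false_matches
)
print(round(p_guilty_given_match, 2))     # 0.16
```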

**Exercise**

A patient goes to see the doctor. The doctor tests all his patients for a flu virus, estimating that only 1 per cent of the people who visit his surgery have the flu. The test he gives them, however, is 95 per cent reliable – that is, 95 per cent of people who are sick test positive and 95 per cent of healthy people test negative.

If the patient tests positive, what chance should the doctor give to the patient having the flu?