A patient goes to see the doctor. The doctor tests all his patients for a flu virus, estimating that only 1 percent of the people who visit his surgery have the virus. The test he gives them is 99 percent accurate – that is, 99 percent of people who are sick test positive and 99 percent of the healthy people test negative. Now the question is: if the patient tests positive, what chance should the doctor give to the patient having the virus?

The intuitive answer is 99 percent.

But is that right?

The information we are given is ‘the probability of testing positive given that you are sick’. What we want to know, however, is ‘the probability of being sick given that you tested positive’. Common intuition conflates these two probabilities, but they are very different. If a test is 95% accurate, say, this means that 95% of sick people test positive. But this is NOT the same thing as saying that 95% of people who test positive are sick. This confusion is known as the ‘Inverse Fallacy’ or ‘Prosecutor’s Fallacy’. It is the fallacy, to which jurors are very susceptible, of believing that the probability of a defendant being guilty of a crime given some piece of evidence is the same as the probability of observing that piece of evidence if the defendant were guilty. The two probabilities can diverge markedly, enough to send many innocent people to the place of execution or to a life without possibility of parole.

So what is the probability of being sick if you test positive, given that the test is 99% accurate (i.e. 99% of people who are sick test positive and 99% of people who are not sick test negative)?

To answer this we can use Bayes’ Theorem.

The (posterior) probability that a hypothesis is true after obtaining new evidence, according to the x,y,z formula of Bayes’ Theorem, is equal to:

xy/[xy+z(1-x)]

x is the prior probability, i.e. the probability that a hypothesis is true before you see the new evidence.

y is the probability you would see the new evidence if the hypothesis is true.

z is the probability you would see the new evidence if the hypothesis is false.

In the case of the flu test, the hypothesis is that the patient is sick.

Before the new evidence (the test), this chance is estimated at 1 in 100 (0.01).

So x = 0.01

The probability we would see the new evidence (the positive result on the test) if the hypothesis is true (the patient is sick) is 99%, since the test is 99% accurate.

So y = 0.99

The probability we would see the new evidence (the positive result on the test) if the hypothesis is false (the patient is not sick) is just 1% (because the test is 99% accurate, and will only give a false positive 1 time in 100).

So z = 0.01

Substituting into Bayes’ equation gives:

xy/[xy+z(1-x)] = (0.01 × 0.99) / [0.01 × 0.99 + 0.01 × (1 – 0.01)] = (0.01 × 0.99) / [0.01 × 0.99 + 0.01 × 0.99] = 1/2

So there is actually a 50% chance that the test, which is 99% reliable and has tested positive, has misdiagnosed you and you are actually flu-free.
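The substitution above can be checked numerically. The following is a minimal Python sketch of the x,y,z formula; the function name `posterior` and its input checks are illustrative choices, not part of the formula itself.

```python
def posterior(x: float, y: float, z: float) -> float:
    """Posterior probability xy / [xy + z(1 - x)].

    x: prior probability that the hypothesis is true
    y: probability of seeing the evidence if the hypothesis is true
    z: probability of seeing the evidence if the hypothesis is false
    """
    for name, value in (("x", x), ("y", y), ("z", z)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability between 0 and 1")
    evidence = x * y + z * (1.0 - x)  # overall chance of seeing the evidence
    if evidence == 0.0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    return x * y / evidence


# Flu test: 1% prior, 99% true-positive rate, 1% false-positive rate
print(f"{posterior(0.01, 0.99, 0.01):.2f}")  # prints 0.50
```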

Basically, it is a competition between how rare the disease is and how rarely the test is wrong. In this case, there is a 1 in 100 chance that you have the flu before undertaking the test, and the test is wrong 1 time in 100. These two probabilities are equal, so the chance that you actually have the flu when testing positive is 1 in 2.
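One way to make this ‘competition’ concrete is the odds form of Bayes’ Theorem, in which the posterior odds equal the prior odds multiplied by the likelihood ratio y/z. This is an equivalent restatement of the formula rather than something the text above relies on, and the helper name `posterior_odds` is hypothetical. Exact fractions are used so that the cancellation is visible.

```python
from fractions import Fraction


def posterior_odds(x: Fraction, y: Fraction, z: Fraction) -> Fraction:
    """Odds in favour of the hypothesis after seeing the evidence.

    Assumes x < 1 and z > 0, otherwise the odds are undefined.
    """
    prior_odds = x / (1 - x)    # 1/99 in the flu example: the disease is rare
    likelihood_ratio = y / z    # 99 in the flu example: the test is rarely wrong
    return prior_odds * likelihood_ratio


odds = posterior_odds(Fraction(1, 100), Fraction(99, 100), Fraction(1, 100))
probability = odds / (1 + odds)  # convert odds back to a probability
print(odds, probability)         # prints: 1 1/2
```

The prior odds of 1 to 99 and the likelihood ratio of 99 cancel exactly, leaving even odds.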

But what if the patient is showing symptoms of the disease before being tested?

In this case, the prior probability should be updated to something higher than the prevalence rate of the disease in the entire tested population, and the chance you are actually sick when you test positive rises accordingly. To the extent that a doctor only tests for something that there is corroborating support for, the likelihood that the test result is correct grows. For this reason, any positive test result should be taken very seriously, statistics aside.
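To see how strongly the answer depends on the prior, the sketch below recomputes the posterior for the same 99% accurate test under several priors. Only the 1% figure comes from the example; the other priors are made-up values chosen purely for illustration.

```python
def posterior(x: float, y: float, z: float) -> float:
    """Bayes' Theorem in the x,y,z form: xy / [xy + z(1 - x)]."""
    return x * y / (x * y + z * (1 - x))


Y, Z = 0.99, 0.01  # true-positive and false-positive rates of the flu test

# 0.01 is the prevalence from the example; the rest are hypothetical priors
# standing in for a patient who shows symptoms before being tested.
for prior in (0.01, 0.05, 0.20, 0.50):
    print(f"prior {prior:.0%} -> posterior {posterior(prior, Y, Z):.1%}")
```

On these assumed priors the posterior climbs from 50% to roughly 84% at a 5% prior and roughly 96% at a 20% prior.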

More generally, to differentiate truth from scare we really do need to understand and employ Bayes’ Theorem. Whether at the doctor’s surgery or in the jury room, understanding it really could save a life.

Appendix

In the original setting with the test results showing positive for a flu virus, a = 0.01, b = 0.99, c = 0.01, where a, b and c play the roles of x, y and z above. Substituting into Bayes’ equation, ab/[ab+c(1-a)], gives:

Posterior probability = (0.01 × 0.99) / [0.01 × 0.99 + 0.01 × (1 – 0.01)] = (0.01 × 0.99) / [0.01 × 0.99 + 0.01 × 0.99] = 1/2

Another way of visualising this problem is by constructing a simple box diagram for a population of 10,000 patients. Of these, 1%, or 100, have the flu virus and 9900 do not. These are inserted into the Total column. There is a 1% error rate, so 1% of the 9900 who do not have the flu virus test positive. Hence the remaining 9801 test negative. Of the 100 who actually have the flu virus, one tests negative (because of the error rate) and the remaining 99 correctly test positive. See below.

|               | Test positive | Test negative | Total |
|---------------|---------------|---------------|-------|
| Has flu virus | 99            | 1             | 100   |
| No flu virus  | 99            | 9801          | 9900  |
| Total         | 198           | 9802          | 10000 |

It is now easy to see that of the 198 who test positive, exactly half (99) actually have the flu virus. The other half are false positives.
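The cells of the box diagram can be reproduced with a short script. This is a sketch under two assumptions that match the example: the same error rate applies to sick and healthy patients, and counts are rounded to whole patients. The function name `box_diagram` and the dictionary keys are illustrative.

```python
def box_diagram(population: int, prevalence: float, error_rate: float) -> dict:
    """Return the four cells of the box diagram as whole-patient counts."""
    sick = round(population * prevalence)
    healthy = population - sick
    false_negatives = round(sick * error_rate)     # sick but test negative
    false_positives = round(healthy * error_rate)  # healthy but test positive
    return {
        "sick_positive": sick - false_negatives,
        "sick_negative": false_negatives,
        "healthy_positive": false_positives,
        "healthy_negative": healthy - false_positives,
    }


cells = box_diagram(10_000, 0.01, 0.01)
positives = cells["sick_positive"] + cells["healthy_positive"]
print(cells)
print(cells["sick_positive"] / positives)  # share of positives who are sick
```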

Let’s take another example.

The probability of a true positive (test comes back positive for virus and the patient has the virus) is 90%. The chance that it gives a false negative (test comes back negative yet the patient has the virus) is 10%. The chance of a false positive (test comes back positive yet the patient does not have the virus) is 7%. The chance of a true negative (test comes back negative and the patient does not have the virus) is 93%.

The probability that a random patient has the virus based on the prevalence of the virus in the tested population is 0.8%.

Here, a = 0.8% (0.008) – this is the prior probability

b = 90% (0.9) – probability of a true positive

c = 7% (0.07) – probability of a false positive

So, updated probability that the patient has the virus given the positive test result =

ab / [ab + c (1 – a)] = 0.008 × 0.9 / [0.008 × 0.9 + 0.07 × (1 – 0.008)]

= 0.0072 / [0.0072 + 0.06944] = 0.0072 / 0.07664 = 0.0939 = 9.39%

This can be shown using the raw figures to produce the same result. We can choose any number for total tested, and the result is the same. Let’s choose 1 million, say, as the number tested.

So total tested = 1,000,000

Total with virus = 0.008 x 1,000,000 = 8000

True positive = 0.9 x 8000 = 7200

False positive = 0.07 x 992,000 = 69,440

Tested positive = 69,440 + 7200 = 76,640

Updated (posterior) probability that the patient who tests positive has the virus = True positives / Total positives = 7200 / 76640 = 0.0939 = 9.39%
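The claim that any number tested gives the same result can be checked directly. In this sketch the function name and the population sizes other than 1,000,000 are arbitrary illustrative choices; the rates are those of the example.

```python
def share_of_positives_with_virus(total_tested, prior, true_pos_rate, false_pos_rate):
    """True positives divided by all positives, computed from raw counts."""
    with_virus = prior * total_tested
    without_virus = total_tested - with_virus
    true_positives = true_pos_rate * with_virus
    false_positives = false_pos_rate * without_virus
    return true_positives / (true_positives + false_positives)


# The population size cancels out, so every line reports the same share.
for total in (1_000, 1_000_000, 250_000_000):
    share = share_of_positives_with_virus(total, 0.008, 0.9, 0.07)
    print(f"{total:>11,} tested -> {share:.4f}")
```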

In the forensic match example, we can also construct a box table. Out of a population of 100 suspects, one is guilty and 99 are not guilty. These are inserted into the Total column. There is a 5% error rate in the forensic match, so there is a 0.95 chance of a match if the suspect is guilty (top left) and a 0.05 chance of no match. Each of the 99 who are not guilty also has a 5% chance of providing a match, so the expected number of false matches is 0.05 × 99 = 4.95, leaving 94.05 as the number for the Not guilty/No match cell.

|            | Match | No match | Total |
|------------|-------|----------|-------|
| Guilty     | 0.95  | 0.05     | 1     |
| Not guilty | 4.95  | 94.05    | 99    |
| Total      | 5.9   | 94.1     | 100   |

So the chance that the suspect provides a match and is actually guilty is the proportion of those guilty and matching out of all those matching (0.95/5.9 = 0.16).

So although the 95% accurate forensic test matches the suspect, his actual probability of guilt on these figures is just 16%.

Using Bayes’ Theorem, we reach the same conclusion. Substituting into Bayes’ equation gives:

P(Guilty | Match) = (0.01 × 0.95) / [0.01 × 0.95 + 0.05 × (1 – 0.01)] = (0.01 × 0.95) / [0.01 × 0.95 + 0.05 × 0.99] = 0.0095 / (0.0095 + 0.0495) = 0.0095 / 0.059 = 0.16

So P(Guilty | Match) = 0.16

P(Not guilty | Match) = 0.84
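The forensic figures can also be verified with exact arithmetic. This is a minimal sketch using Python's `fractions` module so that no rounding enters until the final conversion to a decimal; the variable names are illustrative.

```python
from fractions import Fraction

prior = Fraction(1, 100)              # one guilty suspect among 100
match_if_guilty = Fraction(95, 100)   # chance of a match if guilty
match_if_innocent = Fraction(5, 100)  # chance of a match if not guilty

# Total probability of a match, summed over guilty and not-guilty suspects
p_match = prior * match_if_guilty + (1 - prior) * match_if_innocent

p_guilty_given_match = prior * match_if_guilty / p_match
p_not_guilty_given_match = 1 - p_guilty_given_match

print(p_guilty_given_match, f"{float(p_guilty_given_match):.2f}")          # prints: 19/118 0.16
print(p_not_guilty_given_match, f"{float(p_not_guilty_given_match):.2f}")  # prints: 99/118 0.84
```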