
Resolving Simpson’s Paradox

January 22, 2026

Life, Death, and Statistical Literacy

A version of this article appears in my book, Twisted Logic: Puzzles, Paradoxes, and Big Questions (Chapman and Hall/CRC Press).
Understanding complex statistical phenomena can be a daunting task, especially when they seem to defy common sense. One such concept is Simpson’s Paradox, a surprising phenomenon that occurs when a trend observed within several different groups of data disappears or reverses when these groups are combined. Think of it like a recipe. Individual ingredients might have distinct flavours, but when mixed together, the overall taste can be quite different. Similarly, separate sets of data may tell one story, but when combined, they can tell a completely different one.
Let’s look at some examples.
SIMPSON’S PARADOX IN MEDICINE TRIALS
Suppose you’re testing the effectiveness of two different types of medicine: a new drug and an old drug. Your goal is to determine which one is more effective at treating a certain condition. You administer these drugs to different groups of patients and then analyse how well each drug performs.
Let’s look at a two-day medical trial comparing two drugs.
On Day 1, the new drug showed a 70% success rate in a large group, while the old drug showed an 80% success rate in a much smaller group. This makes it seem like the old drug is better.
On Day 2, the new drug, applied to a small group, was less effective than on Day 1, while the old drug, applied to a larger group, was also less effective than on Day 1. Even so, once again the old drug seems to perform better than the new drug.
However, when we combine both days’ data, the new drug comes out ahead. This shift is a classic example of Simpson’s Paradox.
Day 1: Initial Observations
On the first day, you test the new drug on 90 patients, and it works for 63 of them, giving a success rate of 70%. In contrast, you administer the old drug to a smaller group of ten patients, and it works for eight of them, resulting in an 80% success rate. At this point, it seems like the old drug outperforms the new one. But let’s continue.
New drug: 90 patients; 63 successes. Success rate = 70%.
Old drug: 10 patients; 8 successes. Success rate = 80%.
Conclusion: Old drug outperforms new drug.
Day 2: More Data, More Surprises
The following day, the new drug is given to a different group of ten patients. This time, it only works for four of them, resulting in a decreased success rate of 40%. The old drug, on the other hand, is given to a larger group of 90 patients, and it works for 45 of them, indicating a 50% success rate. Once again, the old drug seems to outshine the new one.
New drug: 10 patients; 4 successes. Success rate = 40%.
Old drug: 90 patients; 45 successes. Success rate = 50%.
Conclusion: Old drug outperforms new drug.
COMBINING THE RESULTS: SIMPSON’S PARADOX AT WORK
When you merge the results from both days, however, an interesting thing happens. The new drug, which seemed less effective on each individual day, worked for 67 of the 100 patients who took it (63 on Day 1 plus 4 on Day 2), an overall success rate of 67%. The old drug worked for only 53 of its 100 patients (8 on Day 1 plus 45 on Day 2), an overall success rate of 53%. The ranking seen on each individual day has reversed, which is exactly what Simpson’s Paradox describes.
New drug: 100 patients; 67 successes. Success rate = 67%.
Old drug: 100 patients; 53 successes. Success rate = 53%.
Conclusion: New drug outperforms old drug.
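The figures above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary layout and the helper names `rate` and `pooled_rate` are illustrative choices, while the counts are the ones given in the two-day trial.

```python
from fractions import Fraction

# (successes, patients) for each drug on each day, taken from the trial above
trial = {
    "new": {"day1": (63, 90), "day2": (4, 10)},
    "old": {"day1": (8, 10), "day2": (45, 90)},
}

def rate(successes, patients):
    """Success rate for a single group, kept as an exact fraction."""
    return Fraction(successes, patients)

def pooled_rate(days):
    """Success rate after adding up successes and patients across all days."""
    total_successes = sum(s for s, _ in days.values())
    total_patients = sum(n for _, n in days.values())
    return Fraction(total_successes, total_patients)

# Day-by-day comparison: the old drug has the higher rate on both days
for day in ("day1", "day2"):
    new = rate(*trial["new"][day])
    old = rate(*trial["old"][day])
    print(f"{day}: new {float(new):.0%}, old {float(old):.0%}")

# Pooled comparison: the ranking reverses
new_all = pooled_rate(trial["new"])
old_all = pooled_rate(trial["old"])
print(f"pooled: new {float(new_all):.0%}, old {float(old_all):.0%}")
```

Using `Fraction` rather than floating-point numbers keeps the comparisons exact, which matters when two rates are close.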
EXPLAINING THE PARADOX
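The paradox in our medical trial example is driven by the size of the groups tested each day. Both drugs did better on Day 1 than on Day 2, but the new drug was given to 90 of its 100 patients on Day 1, the favourable day, while the old drug was given to 90 of its 100 patients on Day 2, the unfavourable one.
Each drug’s overall success rate is an average of its daily rates weighted by group size. The new drug’s overall rate therefore sits close to its Day 1 figure of 70%, while the old drug’s sits close to its Day 2 figure of 50%. The success rates are important, but the size of the groups being compared is crucial to understanding why the paradox occurs.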
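One way to see the role of group size is to write each overall rate as a weighted average of the daily rates. The sketch below does this in Python. The function name `weighted_rate` is illustrative, and the equal-weights scenario is a hypothetical comparison, not part of the trial described above.

```python
def weighted_rate(rates, weights):
    """Average the daily success rates, weighting each day by its share of patients."""
    return sum(rates[day] * weights[day] for day in rates)

# Daily success rates from the trial
new_rates = {"day1": 0.70, "day2": 0.40}
old_rates = {"day1": 0.80, "day2": 0.50}

# Actual design: share of each drug's 100 patients treated on each day
new_weights = {"day1": 0.9, "day2": 0.1}
old_weights = {"day1": 0.1, "day2": 0.9}

# Hypothetical design (an assumption for comparison): both drugs split evenly
equal_weights = {"day1": 0.5, "day2": 0.5}

actual_new = weighted_rate(new_rates, new_weights)   # 0.70*0.9 + 0.40*0.1
actual_old = weighted_rate(old_rates, old_weights)   # 0.80*0.1 + 0.50*0.9
even_new = weighted_rate(new_rates, equal_weights)   # 0.70*0.5 + 0.40*0.5
even_old = weighted_rate(old_rates, equal_weights)   # 0.80*0.5 + 0.50*0.5

print(f"actual design: new {actual_new:.2f}, old {actual_old:.2f}")
print(f"even split:    new {even_new:.2f}, old {even_old:.2f}")
```

With the actual group sizes the new drug comes out ahead overall. With the same daily rates and an even split, the old drug would lead overall as well as on each day. The reversal comes from the weights, not from the daily rates.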
EXPLORING SIMPSON’S PARADOX IN DRUG EFFICACY TRIALS
Let’s delve deeper into Simpson’s Paradox using another example. Suppose this time you’re comparing a real drug to a placebo, a sugar pill, to see if the real drug can help patients recover from a specific illness.
You arrange the patients into four distinct age groups: elderly adults (Group A), middle-aged adults (Group B), young adults (Group C), and children (Group D).
The drug’s effectiveness is measured by the proportion of patients in each group who recover from their illness within two days of taking the medication.
THE SUGAR PILL EXPERIMENT
First, let’s take a look at the sugar pill group.
You give the sugar pill to 240 patients, spread unevenly across the four age groups:
Group A has 20 elderly adults, Group B has 40 middle-aged adults, Group C has 120 young adults, and Group D has 60 children.
The sugar pill helps 10% of the elderly (Group A), 20% of the middle-aged adults (Group B), 40% of the young adults (Group C), and 30% of the children (Group D).
To calculate the overall success rate, you add up the number of successful recoveries across all the groups (2 from Group A, 8 from Group B, 48 from Group C, and 18 from Group D) and divide by the total number of patients (240). The result is 76 successful recoveries out of 240 trials, giving an overall success rate of approximately 31.7%.
Group A: 20 elderly adults; 2 successes. Success rate = 10%.
Group B: 40 middle-aged adults; 8 successes. Success rate = 20%.
Group C: 120 young adults; 48 successes. Success rate = 40%.
Group D: 60 children; 18 successes. Success rate = 30%.
Total: 240 trials; 76 successes. Success rate = 31.7%.
THE REAL DRUG EXPERIMENT
Next, let’s look at the group given the real drug. This time, the group sizes are different: Group A has 120 elderly adults, Group B has 60 middle-aged adults, Group C has 20 young adults, and Group D has 40 children.
The real drug helps 15% of the elderly (Group A), 30% of the middle-aged adults (Group B), 90% of the young adults (Group C), and 45% of the children (Group D).
Again, to get the overall success rate, you add up the number of successful recoveries (18 from each group) and divide by the total number of patients (240). This time, the result is 72 successful recoveries out of 240 trials, giving an overall success rate of exactly 30%.
Group A: 120 elderly adults; 18 successes. Success rate = 15%.
Group B: 60 middle-aged adults; 18 successes. Success rate = 30%.
Group C: 20 young adults; 18 successes. Success rate = 90%.
Group D: 40 children; 18 successes. Success rate = 45%.
Total: 240 trials; 72 successes. Success rate = 30%.
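The totals in both tables can be reproduced with a short Python sketch. The variable names and the helper `overall` are illustrative; the counts are copied from the tables above.

```python
# (recoveries, patients) per age group, copied from the two tables above
sugar_pill = {"A": (2, 20), "B": (8, 40), "C": (48, 120), "D": (18, 60)}
real_drug = {"A": (18, 120), "B": (18, 60), "C": (18, 20), "D": (18, 40)}

def overall(arm):
    """Return total recoveries, total patients, and the overall recovery rate."""
    recoveries = sum(r for r, _ in arm.values())
    patients = sum(n for _, n in arm.values())
    return recoveries, patients, recoveries / patients

for name, arm in (("sugar pill", sugar_pill), ("real drug", real_drug)):
    # Per-group recovery rates first, then the overall figure
    for group, (r, n) in arm.items():
        print(f"{name} group {group}: {r}/{n} = {r / n:.0%}")
    recoveries, patients, rate = overall(arm)
    print(f"{name} overall: {recoveries}/{patients} = {rate:.1%}")
```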
A PARADOX EMERGES
At first glance, it seems that the sugar pill outperformed the real drug. After all, the overall success rate was higher for the sugar pill (31.7%) than for the real drug (30%). But if we examine the data more closely, we find that the real drug had a higher success rate within each age group.
So, why does the overall success rate favour the sugar pill, even though the real drug was more effective in every age category? The paradox again arises due to the different group sizes and composition.
For example, half of the patients who took the sugar pill (120 of 240) were young adults (Group C). This demographic typically has higher natural recovery rates, which pulls the overall success rate of the sugar pill upwards. In contrast, half of the patients who took the real drug (120 of 240) were elderly adults (Group A), who typically have lower recovery rates, which pulls the overall success rate of the real drug downwards.
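One common way to remove the effect of unequal group composition is direct standardisation: apply each treatment’s group-specific recovery rates to the same reference population and compare the results. The article itself does not use this technique, so the sketch below is an illustration only. The choice of the two arms pooled together as the reference population is an assumption, as is the function name `standardised_rate`.

```python
from fractions import Fraction

# (recoveries, patients) per age group, from the two experiments above
sugar_pill = {"A": (2, 20), "B": (8, 40), "C": (48, 120), "D": (18, 60)}
real_drug = {"A": (18, 120), "B": (18, 60), "C": (18, 20), "D": (18, 40)}

# Assumed reference population: both arms pooled, so each age group
# carries the same weight for both treatments
reference = {g: sugar_pill[g][1] + real_drug[g][1] for g in sugar_pill}

def standardised_rate(arm, reference):
    """Recovery rate the arm would show if its patients had the reference age mix."""
    total = sum(reference.values())
    expected_recoveries = sum(
        Fraction(arm[g][0], arm[g][1]) * reference[g] for g in reference
    )
    return expected_recoveries / total

# The real drug has the higher recovery rate in every age group
for g in reference:
    assert Fraction(*real_drug[g]) > Fraction(*sugar_pill[g])

pill_std = standardised_rate(sugar_pill, reference)
drug_std = standardised_rate(real_drug, reference)
print(f"standardised: sugar pill {float(pill_std):.2%}, real drug {float(drug_std):.2%}")
```

Once both treatments are evaluated on the same age mix, the real drug comes out ahead, matching the group-by-group comparison.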
A MATTER OF LIFE AND DEATH
A real-world example of Simpson’s Paradox in action can be seen in the context of the COVID-19 pandemic, specifically relating to a report published in November 2021 by the Office for National Statistics (ONS). It was titled ‘Deaths involving COVID-19 by vaccination status, England: deaths occurring between 2 January and 24 September 2021’.
The raw statistics showed death rates in England for people aged 10–59, listing vaccination status separately. Counterintuitively, the statistics showed that the death rates for the vaccinated in this age grouping were greater than those for the unvaccinated. These numbers were heavily promoted and highlighted on social media by anti-vaccine advocates, who used them to argue that vaccination increases the risk of death.
This claim was contrary, though, to efficacy and effectiveness studies showing that COVID-19 vaccines offered strong protection.
A CLOSER INSPECTION
Closer inspection of the ONS report reveals that over the period of the study, from January to September 2021, the age-adjusted risk of death involving COVID-19 was 32 times greater among unvaccinated people compared to fully vaccinated people. So how can we square this with the raw data? This is where Simpson’s Paradox comes in.
The paradox in the ONS statistics arises specifically because death rates increase dramatically with age, so that at the very top end of this age band, for example, mortality rates are about 80 times higher than at the very bottom end. Vaccination rates follow a similar pattern, rising with age. For example, in the 10–59 data set, more than half of those vaccinated are over the age of 40.
Those who are in the upper ranges of the wide 10–59 age band are, therefore, both more likely to have been vaccinated and also more likely to die if infected with COVID-19 or for any other reason, and vice versa. Age is acting, in the terminology of statistics, as a confounding variable, as it is positively related to both vaccination rates and death rates. To put it another way, if you are older, you are more likely to die in a given period, and you are also more likely to be vaccinated. It is age that is driving up death rates, not vaccinations. Without vaccinations, deaths from COVID-19 would have been far higher.
STATISTICAL LITERACY
If we break down the band into narrower age ranges, such as 10–19, 20–29, 30–39, 40–49, and 50–59, the counterintuitive headline finding immediately disappears. In each age band, the death rates of the vaccinated are very much lower than those of the unvaccinated. This also applies in the higher age bands—60–69, 70–79, and 80 plus. The key point is that age is a crucial factor that must be considered when analysing the risk of death and the impact of vaccinations.
In this way, misrepresentation of statistics can have potentially devastating consequences for the lives of millions around the world. Statistical literacy is a real superpower in the global quest to protect and save these lives.
GUIDELINES AND STRATEGIES
Disaggregate the Data
Break Down Data into Subgroups: Disaggregating data by relevant subgroups (e.g. age, gender, region) can reveal underlying trends that the aggregated data might mask.
Question Initial Assumptions
Challenge Averages: Averages can be misleading. Always question what an average is concealing. Is it masking a wide distribution, or is it skewed by outliers?
Seek Out Hidden Variables (Confounding Variables)
Identify Potential Confounders: Simpson’s Paradox often arises due to the presence of hidden variables that influence both the predictor and outcome variables.
Use Visual Data Exploration
Plot Your Data: Visualising your data can help identify patterns, trends, and anomalies. Graphs can help spot where the trend within subgroups differs from the aggregated trend, potentially signalling Simpson’s Paradox.
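As a rough illustration of the first and third guidelines, the sketch below disaggregates a data set by a candidate confounder and checks whether the ranking of two treatments in every subgroup is the opposite of the ranking in the pooled data. The function name `simpson_reversal` and the data layout are hypothetical; the two data sets are the stylised examples from earlier in this article.

```python
from fractions import Fraction

def simpson_reversal(strata, arm_a, arm_b):
    """Return True if one arm has the higher rate in every stratum
    but the lower rate once the strata are pooled."""
    def rate(pair):
        successes, total = pair
        return Fraction(successes, total)

    def pooled(arm):
        successes = sum(s[arm][0] for s in strata.values())
        total = sum(s[arm][1] for s in strata.values())
        return Fraction(successes, total)

    a_wins_everywhere = all(rate(s[arm_a]) > rate(s[arm_b]) for s in strata.values())
    b_wins_everywhere = all(rate(s[arm_b]) > rate(s[arm_a]) for s in strata.values())
    if a_wins_everywhere:
        return pooled(arm_a) < pooled(arm_b)
    if b_wins_everywhere:
        return pooled(arm_b) < pooled(arm_a)
    return False  # no consistent subgroup trend, so nothing to reverse

# Two-day trial, stratified by day: (successes, patients)
by_day = {
    "day1": {"new": (63, 90), "old": (8, 10)},
    "day2": {"new": (4, 10), "old": (45, 90)},
}

# Drug versus sugar pill, stratified by age group: (recoveries, patients)
by_age = {
    "A": {"drug": (18, 120), "pill": (2, 20)},
    "B": {"drug": (18, 60), "pill": (8, 40)},
    "C": {"drug": (18, 20), "pill": (48, 120)},
    "D": {"drug": (18, 40), "pill": (18, 60)},
}

print(simpson_reversal(by_day, "new", "old"))
print(simpson_reversal(by_age, "drug", "pill"))
```

A check like this only flags a reversal for the stratifying variable you supply. Deciding which variable is a plausible confounder still requires knowledge of how the data were generated.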
CONCLUSION: RESOLVING SIMPSON’S PARADOX
Understanding the factors behind Simpson’s Paradox allows us to make much better sense of our data. Whether in stylised examples or in the real world of a global pandemic, the paradox underscores the importance of careful data analysis, particularly when dealing with grouped data. By taking account of the sizes and characteristics of different groups, we can navigate the potential pitfalls of Simpson’s Paradox and learn how to draw more informed conclusions. In a very real sense, millions of lives could depend on an understanding of this statistical reality.
