Simpson’s Paradox: The puzzle that could save your life.

April 10, 2017

Was the University of California, Berkeley, guilty of gross discrimination in its admissions standards? This was a cause célèbre of the early 1970s and the basis of legal action for alleged bias against women applicants.

To see what was behind the discontent, consider the admission figures for the Fall term of 1973. They show that male applicants to the University were considerably more likely to be accepted than female applicants.

            Applicants    Admitted
Men         8442          44%
Women       4321          35%

That looks pretty damning. When the admission figures were broken down by department, however, a paradox emerged. The figures for the six largest departments are shown here.

         Men                         Women
Dept.    Applicants    Admitted      Applicants    Admitted
A        825           62%           108           82%
B        560           63%           25            68%
C        325           37%           593           34%
D        417           33%           375           35%
E        191           28%           393           24%
F        373           6%            341           7%

In other words, women were admitted at a higher rate than men in four of the six departments (A, B, D and F).
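The department comparison can be checked with a few lines of Python. This is only a sketch: it uses the rounded percentages quoted in the table, so the pooled rates it prints are approximations, and they cover these six departments rather than the whole University. The names `pooled_rate` and `women_ahead` are chosen here for illustration.

```python
# Figures from the department table above: (applicants, admission rate).
# The rates are the rounded percentages quoted in the text, so the pooled
# rates computed here are approximate and cover these six departments only.
men = {"A": (825, 0.62), "B": (560, 0.63), "C": (325, 0.37),
       "D": (417, 0.33), "E": (191, 0.28), "F": (373, 0.06)}
women = {"A": (108, 0.82), "B": (25, 0.68), "C": (593, 0.34),
         "D": (375, 0.35), "E": (393, 0.24), "F": (341, 0.07)}


def pooled_rate(group):
    """Applicant-weighted admission rate across all listed departments."""
    applicants = sum(n for n, _ in group.values())
    admitted = sum(n * rate for n, rate in group.values())
    return admitted / applicants


# Departments in which women were admitted at the higher rate.
women_ahead = [dept for dept in men if women[dept][1] > men[dept][1]]

print("Women ahead in departments:", women_ahead)
print(f"Pooled rate, men:   {pooled_rate(men):.1%}")
print(f"Pooled rate, women: {pooled_rate(women):.1%}")
```

Women come out ahead in four departments, yet the pooled rate still favours men. The reason is visible in the applicant counts: just over half of the men in the table applied to departments A and B, where most applicants were admitted, while only 133 of the 1,835 women did.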

So what was going on? Those with statistical training soon realised that this was a simple example of Simpson’s Paradox.

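Simpson’s Paradox arises when frequency data from different groups are combined: a pattern that holds in every group taken separately can weaken, vanish or even reverse in the pooled figures. It happens when the groups differ both in size and in their underlying rates, so that the overall rate is a weighted average dominated by whichever groups contribute the most cases.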

Take another example, this time from baseball. Over the 1995 and 1996 seasons, fans were divided between those who claimed Derek Jeter as the better-performing player and those who claimed that title for David Justice. It is easy to see why. The figures below show hits per at-bat, with the batting average in brackets.

                 1995              1996              Combined
Derek Jeter      12/48 (.250)      183/582 (.314)    195/630 (.310)
David Justice    104/411 (.253)    45/140 (.321)     149/551 (.270)

Here we see that Jeter has the better overall batting average but Justice records a better average in each of the two years making up that overall average. To anyone conversant with Simpson’s Paradox this is nothing weird. It is certainly possible in theory for one player to record a better batting average than another in each of two successive years, yet record a worse batting average overall. Here it happens because almost all of Jeter’s at-bats came in 1996, when both players hit well, while most of Justice’s came in 1995, when both hit poorly. The case of Jeter and Justice is an example where the theory clearly shows up in practice.
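The reversal can be verified directly from the hits and at-bats in the table. The sketch below uses exact fractions to avoid any rounding doubts; the helper names `average` and `combined` are illustrative only.

```python
from fractions import Fraction

# Hits and at-bats as given in the table above: (hits, at_bats).
jeter = {1995: (12, 48), 1996: (183, 582)}
justice = {1995: (104, 411), 1996: (45, 140)}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def combined(seasons):
    """Pool hits and at-bats across seasons before dividing."""
    hits = sum(h for h, _ in seasons.values())
    at_bats = sum(ab for _, ab in seasons.values())
    return Fraction(hits, at_bats)


for year in (1995, 1996):
    print(year,
          f"Jeter {float(average(*jeter[year])):.3f}",
          f"Justice {float(average(*justice[year])):.3f}")
print("Combined",
      f"Jeter {float(combined(jeter)):.3f}",
      f"Justice {float(combined(justice)):.3f}")
```

Note that the combined figure is not the mean of the two yearly averages. It pools the raw counts, so 582 of Jeter’s 630 at-bats carry his stronger 1996 average into the total, while 411 of Justice’s 551 carry his weaker 1995 average.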

For those more familiar with cricket than baseball, take this hypothetical example of a match consisting of two innings.

First Innings:

Harold Larwood takes 3 wickets and concedes 60 runs

Bill Voce takes 2 wickets and concedes 68 runs

Second Innings:

Harold Larwood takes 1 wicket and concedes 8 runs

Bill Voce takes 6 wickets and concedes 60 runs

Here, Larwood has the superior performance in both innings: he concedes 20 runs per wicket to Voce’s 34 in the first innings, and 8 runs per wicket to Voce’s 10 in the second. Over the whole match, however, Larwood took 4 wickets for 68 runs (17 runs per wicket) while Voce did slightly better, taking 8 wickets for 128 runs (16 runs per wicket).
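The bowling figures can be checked in the same way. In this sketch the measure is runs conceded per wicket, so a lower number is better; the function names are made up for the illustration.

```python
# Hypothetical figures from the example above: (wickets, runs conceded),
# first innings followed by second innings.
larwood = [(3, 60), (1, 8)]
voce = [(2, 68), (6, 60)]


def bowling_average(wickets, runs):
    """Runs conceded per wicket taken; lower is better."""
    return runs / wickets


def match_average(innings):
    """Pool wickets and runs over the whole match before dividing."""
    wickets = sum(w for w, _ in innings)
    runs = sum(r for _, r in innings)
    return bowling_average(wickets, runs)


for number, (lar, voc) in enumerate(zip(larwood, voce), start=1):
    print(f"Innings {number}: Larwood {bowling_average(*lar):.1f}, "
          f"Voce {bowling_average(*voc):.1f}")
print(f"Match: Larwood {match_average(larwood):.1f}, "
      f"Voce {match_average(voce):.1f}")
```

Voce’s match figure is pulled down by the second innings, where he took six of his eight wickets cheaply, whereas three of Larwood’s four wickets came in the more expensive first innings.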

Another example of Simpson’s Paradox.

Ignorance of Simpson’s Paradox is more worrying when it arises in medical trials. Consider the following results, shown as successful treatments out of patients treated.

          Drug A            Drug B
Day 1     63/90 = 70%       8/10 = 80%
Day 2     4/10 = 40%        45/90 = 50%

Overall, Drug A = 67% success rate; Drug B = 53% success rate.

But Drug B performs better on both days.
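A short sketch makes the mechanism explicit: each drug’s overall success rate is a weighted average of its daily rates, with weights equal to the share of patients treated on each day. The function names below are illustrative, and the figures are those in the table above.

```python
# Trial figures from the table above: (successes, patients) for Day 1, Day 2.
drug_a = [(63, 90), (4, 10)]
drug_b = [(8, 10), (45, 90)]


def daily_rates(trial):
    """Success rate on each day."""
    return [successes / patients for successes, patients in trial]


def overall_rate(trial):
    """Pool successes and patients across days before dividing."""
    successes = sum(s for s, _ in trial)
    patients = sum(n for _, n in trial)
    return successes / patients


def weighted_average(trial):
    """Daily rates weighted by each day's share of the patients."""
    total = sum(n for _, n in trial)
    return sum((n / total) * (s / n) for s, n in trial)


print("Daily rates A:", daily_rates(drug_a))
print("Daily rates B:", daily_rates(drug_b))
print(f"Overall A: {overall_rate(drug_a):.0%}, B: {overall_rate(drug_b):.0%}")
print(f"Weighted A: {weighted_average(drug_a):.0%}, "
      f"B: {weighted_average(drug_b):.0%}")
```

The pooled and weighted figures agree, as they must. Ninety of Drug A’s 100 patients were treated on Day 1, when both drugs did well, while ninety of Drug B’s were treated on Day 2, when both did worse, and that imbalance is what produces the reversal.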

So which is the better drug? Who is the better baseball player? Who is the better bowler? Was the University of California, Berkeley, discriminating on the basis of gender? All of these questions are raised by Simpson’s Paradox.

In the case of Berkeley, a study by Bickel, Hammel and O’Connell, published in ‘Science’ in 1975, concluded that women tended to apply to the more competitive departments with low rates of admission, such as the English Department, while men tended to apply to less competitive departments with high rates of admission, such as engineering and chemistry. As such, the University was not actively discriminating against women; if anything, the converse.

In the case of Jeter and Justice, I would lean towards Derek Jeter, but a case can be made for David Justice. Similarly, the case (albeit hypothetical) for Voce is slightly stronger than for Larwood on these figures, I would argue, but others could make a different case. In the medical trials, I would certainly choose to be treated with Drug A. Others might differ, but I doubt they would persuade any reasonable judge of the outcome of the trials.

And that’s the biggest lesson we can learn today: be aware of Simpson’s Paradox and its implications for the way you look at statistics. It’s an awareness that one day might just save your life!

Further reading and links.

P.J. Bickel, E.A. Hammel and J.W. O’Connell (1975), Sex Bias in Graduate Admissions: Data from Berkeley, Science, 187, 398-404.
