Sunday, 9 September 2018

Simpson's Paradox: When Things Seem Like Unfair Bias But Are Actually Not.

I remember once reading about a case of supposed bias at the University of California, Berkeley, which was sued for bias against women who had applied for admission to its graduate schools. The admission figures showed that men who applied were more likely than women to be admitted, and the difference was thought to be large enough to infer unfair discrimination:

          Applicants    Admitted
Men          8442         44%
Women        4321         35%

But the strange part about it was that when examining the individual departments, it transpired that no department was significantly biased against women - in fact, most departments had a small but statistically significant bias in favour of women. Here is the data from the six largest departments:

Department           Men                        Women
              Applicants  Admitted       Applicants  Admitted
A                825        62%             108        82%
B                560        63%              25        68%
C                325        37%             593        34%
D                417        33%             375        35%
E                191        28%             393        24%
F                373         6%             341         7%

Given the foregoing statistics, how can it be the case that women tended to do better than men within individual departments but worse overall? What was discovered was that women tended to apply to competitive departments with low rates of admission even among qualified applicants, whereas men tended to apply to less competitive departments with high rates of admission among qualified applicants. Because the overall rate is an average weighted by where people applied, this can skew the overall picture to look like discrimination when, in fact, it is nothing of the sort.
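To make the weighting concrete, here is a minimal Python sketch that pools the six departments from the table above. The names DEPARTMENTS and pooled_rate are my own, chosen for illustration, and the admitted counts are only approximations reconstructed from the rounded percentages, so the pooled figures will not match the university-wide 44% and 35%, which cover every department.

```python
# Figures copied from the table of the six largest departments:
# (department, men applicants, men admit rate, women applicants, women admit rate)
DEPARTMENTS = [
    ("A", 825, 0.62, 108, 0.82),
    ("B", 560, 0.63, 25, 0.68),
    ("C", 325, 0.37, 593, 0.34),
    ("D", 417, 0.33, 375, 0.35),
    ("E", 191, 0.28, 393, 0.24),
    ("F", 373, 0.06, 341, 0.07),
]


def pooled_rate(groups):
    """Admission rate across groups, weighted by number of applicants.

    `groups` is a list of (applicants, rate) pairs. Admitted counts are
    approximated as applicants * rate because only rounded rates are given.
    """
    total_applicants = sum(n for n, _ in groups)
    total_admitted = sum(n * rate for n, rate in groups)
    return total_admitted / total_applicants


men_rate = pooled_rate([(m, mr) for _, m, mr, _, _ in DEPARTMENTS])
women_rate = pooled_rate([(w, wr) for _, _, _, w, wr in DEPARTMENTS])

# Departments in which women were admitted at a higher rate than men.
women_ahead = [d for d, _, mr, _, wr in DEPARTMENTS if wr > mr]

print(f"Pooled rate, men:   {men_rate:.1%}")
print(f"Pooled rate, women: {women_rate:.1%}")
print("Women ahead in departments:", ", ".join(women_ahead))
```

Women come out ahead in four of the six departments, yet the pooled rate is roughly 45% for men against roughly 30% for women, because most of the men's applications went to departments A and B, where about two thirds of applicants got in.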

This is what is referred to in statistics as Simpson’s Paradox (which isn’t really a paradox, as I’ll show), after the statistician Edward H. Simpson. What it’s actually to do with is misleading impressions based on percentages and ratios, which can confound expectations. Suppose Jack and Jill are applying to colleges over a two-week period. In the first week Jill gets accepted into 0 of 3 colleges and Jack gets accepted into 1 of 7. In the second week Jill gets accepted into 5 of 7 colleges and Jack gets accepted into 3 of 3. Here are their results:

         Week 1    Week 2    Total
Jill      0/3       5/7      5/10
Jack      1/7       3/3      4/10

In both weeks Jack achieved a higher percentage of college acceptances than Jill, but the number of colleges each applied to was not the same each week. Over the two weeks together both applied to 10 colleges, and from that equal sample size Jill’s ratio is higher and, therefore, so is her overall percentage. It only appears like a paradox when the percentages are provided in isolation from the counts behind them: Jack’s 100% rests on just 3 applications and his 14.3% on 7, whereas Jill’s 71% rests on 7 applications and her 0% on just 3. Based only on percentages, Jack’s is higher than Jill’s in both weeks (14.3% and 100% compared with Jill’s 0% and 71%) even though over the 2 weeks Jill’s proportion of college successes is higher (50% against 40%). The fact that Jack can be better in each week but worse over 2 weeks illustrates a principle that lies behind many of the bogus claims of unfair discrimination we see - especially when important causal relations are omitted.
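The Jack and Jill arithmetic can be checked with a short Python sketch. It uses exact fractions so that no rounding creeps in; the names RESULTS, weekly_rates and overall_rate are simply ones I have picked for the illustration.

```python
from fractions import Fraction

# (acceptances, applications) for week 1 and week 2, taken from the table.
RESULTS = {
    "Jill": [(0, 3), (5, 7)],
    "Jack": [(1, 7), (3, 3)],
}


def weekly_rates(weeks):
    """Acceptance rate for each week taken on its own."""
    return [Fraction(accepted, applied) for accepted, applied in weeks]


def overall_rate(weeks):
    """Acceptance rate over all weeks: total acceptances / total applications."""
    accepted = sum(a for a, _ in weeks)
    applied = sum(n for _, n in weeks)
    return Fraction(accepted, applied)


for name, weeks in RESULTS.items():
    per_week = ", ".join(f"{float(r):.1%}" for r in weekly_rates(weeks))
    print(f"{name}: weekly {per_week}; overall {float(overall_rate(weeks)):.1%}")

# Jack is ahead in every single week...
jack_wins_each_week = all(
    jack > jill
    for jack, jill in zip(weekly_rates(RESULTS["Jack"]), weekly_rates(RESULTS["Jill"]))
)
# ...yet Jill is ahead once the weeks are combined.
jill_wins_overall = overall_rate(RESULTS["Jill"]) > overall_rate(RESULTS["Jack"])
print(jack_wins_each_week, jill_wins_overall)
```

Both flags come out True. The overall rate is not the plain average of the weekly percentages; it is the average weighted by how many applications were made in each week, which is why the ordering can flip.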