Simpson's paradox (or the Yule-Simpson effect) is a statistical paradox described by E. H. Simpson in 1951 and G. U. Yule in 1903, in which a comparison that holds within each of several groups is reversed when the groups are combined. This seemingly impossible result is encountered surprisingly often in social science and medical statistics.

As an example, suppose two people, Ann and Bob, are let loose on Wikipedia. In the first test, Ann improves 60 percent of the articles she edits while Bob improves 90 percent of the articles he edits. In the second test, Ann improves just 10 percent of the articles she edits, while Bob improves 30 percent.

Both times, Bob improved a much higher percentage of articles than Ann, yet when the two tests are combined, Ann has improved a much higher percentage than Bob!

The result comes about this way: In the first test, Ann edits 100 articles, improving 60 of them, while Bob edits just 10 articles, improving 9 of them. In the second test, Ann edits only 10 articles, improving 1 of them, while Bob edits 100 articles, improving 30 of them. When the two tests are added together, both edited 110 articles, yet Ann improved 61 of them (55 percent) while Bob improved only 39 of them (35 percent)!

|     | Test 1   | Test 2   | Total    |
|-----|----------|----------|----------|
| Ann | 60 / 100 | 1 / 10   | 61 / 110 |
| Bob | 9 / 10   | 30 / 100 | 39 / 110 |
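The counts given above can be checked with a short Python sketch. The dictionary layout and the helper names `rate` and `combined` are illustrative choices made here, not part of the original example.

```python
# (improved, edited) counts for each person and test, as given in the text.
counts = {
    "Ann": {"test1": (60, 100), "test2": (1, 10)},
    "Bob": {"test1": (9, 10), "test2": (30, 100)},
}


def rate(improved, edited):
    """Fraction of edited articles that were improved."""
    return improved / edited


def combined(person):
    """Pool both tests: total improved and total edited for one person."""
    improved = sum(i for i, _ in counts[person].values())
    edited = sum(e for _, e in counts[person].values())
    return improved, edited


for person, tests in counts.items():
    per_test = {name: round(rate(*c), 3) for name, c in tests.items()}
    total = combined(person)
    print(person, per_test, total, round(rate(*total), 3))
```

Bob has the higher rate in each test, while the pooled counts favour Ann.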

It appears that the two sets of data separately support a certain hypothesis, but, considered together, support the opposite hypothesis.

To recap, introducing some notation that will be useful later:

• In the first test, Ann improved 60% of the articles she edited (SA(1) = 60%), while Bob's success rate was 90% (SB(1) = 90%). Success is associated with Bob.
• In the second test Ann managed 10% (SA(2)) while Bob achieved 30% (SB(2)). On both occasions Bob's edits were more successful than Ann's. Success is again associated with Bob.
• But if we combine the two tests, we see that Ann and Bob both edited 110 articles, and that Ann improved 61 (SA = 61/110) while Bob improved only 39 (SB = 39/110).
• SB < SA. Success is now associated with Ann. Bob is better on every test but worse overall!

The arithmetical basis of the paradox is uncontroversial. If SB(1) > SA(1) and SB(2) > SA(2), we feel that SB must be greater than SA. However, if different weights are used to form the overall score for each person, this expectation can fail. Here the first test is weighted 100/110 for Ann and 10/110 for Bob, while the weights are reversed on the second test.

SA = (100/110) × SA(1) + (10/110) × SA(2).

SB = (10/110) × SB(1) + (100/110) × SB(2).

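With more extreme weights, A's overall score can be pushed up towards 60% and B's down towards 30%; these are the limits approached as A's edits concentrate in the first test and B's in the second.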
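The weighted-average view can be sketched in Python as follows. The function name `overall` and the particular "extreme" weight of 999/1000 are assumptions chosen for illustration; exact fractions are used so that the results match the counts in the example.

```python
from fractions import Fraction


def overall(rate1, rate2, weight1):
    """Weighted average of two per-test rates.

    weight1 is the share of a person's edits that fall in the first test;
    the second test gets the remaining share.
    """
    return weight1 * rate1 + (1 - weight1) * rate2


# Weights from the example: 100/110 of Ann's edits and 10/110 of Bob's
# edits are in the first test.
S_A = overall(Fraction(60, 100), Fraction(10, 100), Fraction(100, 110))
S_B = overall(Fraction(90, 100), Fraction(30, 100), Fraction(10, 110))
print(S_A, S_B)

# Hypothetical, more extreme weights: the overall scores move close to
# 60% for Ann and 30% for Bob.
extreme_A = overall(Fraction(60, 100), Fraction(10, 100), Fraction(999, 1000))
extreme_B = overall(Fraction(90, 100), Fraction(30, 100), Fraction(1, 1000))
print(float(extreme_A), float(extreme_B))
```

Each overall score always lies between that person's two per-test rates, so Ann's can never exceed 60% and Bob's can never fall below 30%.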

The arithmetic allows us to see through the paradox, but there is still the conflict between the individual performances and the overall performance: who is better, A or B? Ann and Bob's creator thought Ann was better, since her overall success rate is higher. But it is possible to retell the story so that it appears obvious that B is better. A and B are now hospitals and the two tests have become two types of patient: mild and severe. The numerical data is as before: B is better at curing both types of patient, but its overall success rate is worse because almost all (100/110) of its patients are severe cases while almost all (100/110) of A's are mild. The association of success with A is misleading, even spurious.
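One way to make the hospital reading concrete is to ask what each hospital's overall cure rate would be if both treated the same mix of mild and severe patients. This comparison is an illustration added here, not something the retelling itself carries out, and the names `cure_rate` and `rate_under_mix` are hypothetical.

```python
from fractions import Fraction

# Per-type cure rates, using the same numbers as the two tests:
# test 1 becomes "mild" patients, test 2 becomes "severe" patients.
cure_rate = {
    "A": {"mild": Fraction(60, 100), "severe": Fraction(1, 10)},
    "B": {"mild": Fraction(9, 10), "severe": Fraction(30, 100)},
}


def rate_under_mix(hospital, mild_share):
    """Overall cure rate if a fraction mild_share of patients were mild."""
    r = cure_rate[hospital]
    return mild_share * r["mild"] + (1 - mild_share) * r["severe"]


# Try a range of common patient mixes, from all severe to all mild.
shares = [Fraction(k, 10) for k in range(11)]
b_always_better = all(
    rate_under_mix("B", s) > rate_under_mix("A", s) for s in shares
)
print(b_always_better)
```

Under any shared mix of patients, B's overall rate exceeds A's; A comes out ahead only when each hospital is weighted by its own, very different, mix.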

In this retelling, has something been added, or has a tacit assumption of the Ann and Bob story been changed? These issues are discussed in the modern literature on Simpson's paradox. Although statisticians have known about the phenomenon for over a century, there has lately been a revival of interest in it, and philosophers, computer scientists, epidemiologists, economists and others have discussed it too.


Perhaps the example of the paradox most commonly presented in popular literature in America involves batting averages in baseball. It is possible, and on rare occasions it has actually happened, for one player to hit for a higher average than another player during the first half of the year and to do so again in the second half, but for the second player to have a higher overall batting average for the entire year.

This can only happen if the players have very different numbers of at-bats in each half of the season, as shown in this example:

```
            First Half     Second Half      Total season
Player A     4/10 (.400)   25/100 (.250)    29/110 (.264)
Player B   35/100 (.350)    2/10  (.200)    37/110 (.336)
```
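The pattern in the batting table can be tested mechanically. The following Python sketch checks whether one player leads in every part of the season yet trails on the pooled totals; the function name `shows_reversal` and the list-of-pairs layout are assumptions made for this illustration.

```python
from fractions import Fraction


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def pooled(parts):
    """Average over all parts combined: total hits over total at-bats."""
    return average(sum(h for h, _ in parts), sum(n for _, n in parts))


def shows_reversal(first, second):
    """True if `first` leads in every part but trails on the pooled totals."""
    leads_each_part = all(
        average(*f) > average(*s) for f, s in zip(first, second)
    )
    return leads_each_part and pooled(first) < pooled(second)


# (hits, at-bats) for the first and second half, from the table above.
player_a = [(4, 10), (25, 100)]
player_b = [(35, 100), (2, 10)]
print(shows_reversal(player_a, player_b))
```

The same check applies to the Ann and Bob counts, with Bob in the role of the player who leads in each part.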

## Literature

• Simpson, E. H. (1951), "The Interpretation of Interaction in Contingency Tables," Journal of the Royal Statistical Society, Ser. B, 13, 238-241

For a brief history of the origins of the paradox see the entries on Simpson's Paradox and Spurious Correlation in

For a recent technical discussion with many references see
