Simpson's paradox

Simpson's paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined. This result is often encountered in social-science and medical-science statistics, and is particularly problematic when frequency data are given unwarranted causal interpretations. The paradox can be resolved when confounding variables and causal relations are appropriately addressed in the statistical modeling (e.g., through cluster analysis).
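The reversal can be reproduced with a few lines of arithmetic. The following Python sketch uses hypothetical success counts, invented purely for illustration and not taken from any study; the names counts, rate, and pooled_rate are likewise illustrative. Option A has the higher success rate within each group, yet option B has the higher rate once the groups are pooled, because the two options are spread very unevenly across the groups.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts, invented for illustration only.
counts = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (40, 100), "B": (3, 10)},
}


def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def pooled_rate(option):
    """Success rate for one option after combining all groups."""
    successes = sum(counts[group][option][0] for group in counts)
    trials = sum(counts[group][option][1] for group in counts)
    return Fraction(successes, trials)


# Within each group, A has the higher success rate.
for group, by_option in counts.items():
    a = rate(*by_option["A"])
    b = rate(*by_option["B"])
    print(f"{group}: A = {float(a):.2f}, B = {float(b):.2f}")

# After pooling, the comparison reverses and B has the higher rate.
print(f"pooled: A = {float(pooled_rate('A')):.2f}, "
      f"B = {float(pooled_rate('B')):.2f}")
```

In this made-up data, most of A's trials fall in the group with low success rates and most of B's fall in the group with high success rates, so group membership acts as a confounding variable: the pooled rates (48/110 for A, 73/110 for B) reflect the mix of groups rather than the options themselves.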

Simpson's paradox has been used to illustrate the kind of misleading results that the misuse of statistics can generate.

Edward H. Simpson described this phenomenon in a technical paper in 1951, although the statisticians Karl Pearson (in 1899) and Udny Yule (in 1903) had mentioned similar effects earlier. The name Simpson's paradox was introduced by Colin R. Blyth in 1972. It is also referred to as Simpson's reversal, the Yule–Simpson effect, the amalgamation paradox, or the reversal paradox.