Thursday, 23 October 2025

Normal Distributions Normally Aren't Normal

 

I saw an interesting complaint about distributions in GCSE results, and then a second article, posing the loaded question “Why do some subjects not have bell curves?” (see here and here). The big issue with their line of enquiry is an apparent confusion between what they call a ‘normative process’ and a normal distribution. You see, most bell-shaped data you encounter in the real world isn’t actually normal - at least not in the strict mathematical sense. The normal distribution - also known as the Gaussian distribution - follows a precise mathematical form:

f(x) = 1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))

It’s perfectly symmetric around its mean mu, fully determined by that mean and its variance sigma^2, and has tidy, well-known proportions: about 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three. That’s the pure Gaussian world - it’s elegant and compact, but rarer than you think, so it’s not really ‘normal’ at all. If a distribution is not perfectly symmetric and unbounded on both sides (that is, free of hard limits, skew, excess outliers and mixtures), then it is not technically a normal distribution.
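For anyone who wants to check those proportions rather than take them on trust, here is a minimal Python sketch using only the standard library. The function names normal_pdf and within_k_sigma are my own, chosen for illustration; the second relies on the standard identity that the probability of falling within k standard deviations is erf(k / sqrt(2)).

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution, matching the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def within_k_sigma(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.4f}")
```

The three printed values round to the familiar 68%, 95% and 99.7%, and they hold for any mu and sigma, which is exactly what makes the true Gaussian so tidy.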

So, it’s certainly the case that many real-world data sets produce histograms that look like bell curves, yet deviate in subtle but crucial ways. To name three examples: fat-tailed distributions (as in many financial data sets) have more extreme outliers than a Gaussian predicts; asymmetric processes (like biological weight categories) produce skewed curves; and combinations of multiple subpopulations (like male and female height) can produce an overall bell shape that’s not truly normal. All that is to say, the bell curve is a shape, but it’s not a guarantee of normality. Anyone who assumes normality when it isn’t really there opens themselves up to errors, such as underestimating the probability of rare or extreme events, misapplying statistical tests that rely on Gaussian assumptions, and drawing misleading inferences from what seems on the surface like ‘tidy’ data.
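To put a rough number on the ‘rare events’ problem, here is a small Python sketch. The mixture in it is entirely hypothetical (90% of values drawn from a narrow bell, 10% from one three times as wide, both centred on zero) and is not taken from any real data set. Its histogram would still look like a symmetric bell, yet its tails are much heavier than a single Gaussian with the same standard deviation would suggest.

```python
import math

def two_sided_tail(z):
    """P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

# Hypothetical mixture: 90% from N(0, 1) and 10% from N(0, 3^2).
weights = (0.9, 0.1)
sigmas = (1.0, 3.0)

# Overall standard deviation of the mixture (both components have mean 0).
mix_sd = math.sqrt(sum(w * s ** 2 for w, s in zip(weights, sigmas)))

# A "four sigma" event, measured in the mixture's own standard deviations.
threshold = 4 * mix_sd

# Actual probability of such an event under the mixture.
p_mixture = sum(w * two_sided_tail(threshold / s) for w, s in zip(weights, sigmas))

# What a single Gaussian with the same standard deviation would predict.
p_gaussian = two_sided_tail(4)

print(f"mixture:  {p_mixture:.2e}")
print(f"gaussian: {p_gaussian:.2e}")
print(f"ratio:    {p_mixture / p_gaussian:.0f}x")
```

Under these made-up numbers, the Gaussian assumption understates the chance of a four-sigma event by a factor on the order of a hundred, even though the two curves would be hard to tell apart by eye.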

If I may, here is a better way to think about ‘normal’. The normal distribution is an idealisation - a mathematical lens that sometimes fits reality well, but often only approximately. Real data lives in a messier world. Consequently, while the Schools Week article rightly draws attention to the potential harms of rigidly applying a bell curve to educational outcomes, it implicitly assumes that the bell curve is a natural or unavoidable benchmark for student ability, which is a hasty assumption. Student performance, like many social phenomena, is influenced by a mix of factors - personality profiles, teaching quality, socioeconomic background, learning differences, levels of freedom, etc. - which do not necessarily conform neatly to a symmetric Gaussian model.

More broadly, this reflects a deeper issue that has pervaded education for too long: the push toward homogenisation - an obsession with fitting diverse learners into uniform models of “average” performance. This not only misrepresents the true distribution of abilities (especially at the upper end), but also unfairly neglects or distorts those who fall outside the assumed curve.
