I saw an interesting complaint about distributions in GCSE results, and then a second article posing the loaded question “Why do some subjects not have bell curves?” (see here and here). The big issue with their line of enquiry is an apparent confusion between what they call a ‘normative process’ and a normal distribution. You see, most bell-shaped data you encounter in the real world isn’t actually normal - at least not in the strict mathematical sense. The normal distribution - also known as the Gaussian distribution - follows a precise mathematical form:
f(x) = 1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
It’s perfectly
symmetric around its mean mu, defined only by its mean and variance sigma^2,
and has tidy, well-known proportions, where about 68% of values fall within one
standard deviation of the mean, about 95% within two, and about 99.7% within
three. That’s the pure Gaussian world - it’s elegant and compact, but rarer than
you think, so it’s not really ‘normal’ at all. If a distribution isn’t perfectly
symmetric and unbounded on both sides - that is, if it has hard limits, skew,
heavy outliers or mixed subpopulations - then it is not technically a normal
distribution.
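As a quick illustration, those tidy proportions can be checked empirically by sampling from a true standard normal. This is a stdlib-only Python sketch; the seed and sample size are arbitrary choices of mine:

```python
# Empirically check the 68-95-99.7 rule against samples drawn from a
# genuine standard normal distribution (mean 0, standard deviation 1).
import random
import statistics

random.seed(42)  # arbitrary seed, for reproducibility
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

mu = statistics.fmean(samples)
sigma = statistics.stdev(samples)

# Fraction of samples within k standard deviations of the mean.
coverage = {}
for k in (1, 2, 3):
    coverage[k] = sum(abs(x - mu) <= k * sigma for x in samples) / len(samples)
    print(f"within {k} sd: {coverage[k]:.3f}")
```

With a sample this large, the printed fractions land very close to 0.683, 0.954 and 0.997 - the behaviour only a genuinely Gaussian process is guaranteed to show.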
So, it’s certainly the
case that many real-world data sets produce histograms that look like
bell curves, yet deviate in subtle but crucial ways. Three examples: fat-tailed
distributions (common in financial data sets), which produce more extreme
outliers than a Gaussian predicts; asymmetric processes (like biological
weights), which are skewed rather than symmetric; and mixtures of multiple
subpopulations (like male and female height), which can produce an overall
bell shape that’s not truly normal. All that is to say, the bell curve is a shape,
but it’s not a guarantee of normality. Anyone who assumes normality when it
isn’t really there opens themselves up to errors: underestimating the
probability of rare or extreme events, misapplying statistical tests that rely
on Gaussian assumptions, and drawing misleading inferences from what seems on
the surface like ‘tidy’ data.
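The mixture case is worth making concrete. In this stdlib-only Python sketch (the two height parameters are illustrative round numbers I’ve chosen, not real survey values), a 50/50 blend of two Gaussians produces a single bell-shaped hump, yet its excess kurtosis is not the 0 a true normal distribution would give:

```python
# A 50/50 mixture of two Gaussians - illustrative 'female'/'male' height
# parameters, NOT real survey data. The histogram looks like one bell,
# but a true normal has excess kurtosis 0, and this mixture does not.
import random
import statistics

random.seed(0)  # arbitrary seed, for reproducibility
heights = [
    random.gauss(165, 7) if random.random() < 0.5 else random.gauss(178, 7)
    for _ in range(100_000)
]

mu = statistics.fmean(heights)
sigma = statistics.pstdev(heights)

# Excess kurtosis: mean of standardised fourth powers, minus 3.
excess_kurtosis = statistics.fmean(((x - mu) / sigma) ** 4 for x in heights) - 3
print(f"excess kurtosis: {excess_kurtosis:.2f}")
```

The value comes out clearly negative (around -0.4 for these parameters): the blended curve is flatter-peaked and lighter-tailed than any single Gaussian, even though a casual glance at its histogram says ‘bell curve’.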
Let me suggest a better way to think about ‘normal’ here. The normal
distribution is an idealisation - a mathematical lens that sometimes fits
reality well, but often
only approximately. Real data lives in a messier world. Consequently, while the Schools Week article rightly draws attention to the potential harms
of rigidly applying a bell curve to educational outcomes, it implicitly assumes
that the bell curve is a natural or unavoidable benchmark for student ability,
which is a hasty assumption. Student performance, like many social phenomena,
is influenced by a mix of factors - personality profiles, teaching quality,
socioeconomic background, learning differences, levels of freedom, etc - which
do not necessarily conform neatly to a symmetric Gaussian model.
More broadly, this reflects a deeper issue that has
pervaded education for too long: the push toward homogenisation - an obsession
with fitting diverse learners into uniform models of “average” performance.
This not only misrepresents the true distribution of abilities (especially at
the upper end), but also unfairly neglects or distorts those who fall outside
the assumed curve.