This Year 1 statistics topic covers statistical diagrams — histograms, frequency polygons, box plots and cumulative frequency diagrams — scatter diagrams and regression, measures of location and spread, and identifying and cleaning outliers. A Year 2 addition, linearising y = axⁿ or y = kbˣ with logarithms, sits in the same spec section. It's assessed on Paper 3 and typically accounts for around 12 of the 300 marks — an estimate, since the split with sampling and probability shifts year to year.
It's also the statistics topic most tightly bound to the pre-released Large Data Set — Met Office weather records comparing 1987 and 2015. The data itself isn't in the exam, but questions assume familiarity with its units and conventions — oktas for cloud cover, knots for windspeed, 'tr' for trace rainfall — and examiners have repeatedly reported that this costs accessible marks.
The booklet gives Sxx and the standard-deviation formula (either divisor accepted), but the mean formulae aren't included. Calculating a regression line's own coefficients is off the syllabus — you only use a given line, within the range of data it was fitted to.
This topic covers the specification statements listed below. AS = Year 1 content, also assessed in the standalone AS course (8MA0); A2 = full A level only. Typical share of a 300-mark series: about 12 marks, which is our estimate from the 2018–2025 papers rather than an official weighting.
| Ref | Spec statement | Level |
|---|---|---|
| 2.1 | Statistical diagrams | AS |
| 2.2 | Scatter diagrams and regression | AS |
| 2.2 | Linearising with logarithms | A2 |
| 2.3 | Location and spread | AS |
| 2.4 | Outliers and cleaning data | AS |
Examiners repeatedly report LDS-flavoured parts answered worst by students who haven't looked at the data — in one series, over half scored zero on a part needing only that cloud cover is measured in whole oktas from 0 to 8. The fix is knowing the variables and units, not memorising figures.
The single most repeated comment in statistics reports is that a conclusion needs writing in context, not left as a bare number. Describing a diagram, explaining an outlier, or justifying a choice of average all need a sentence tied to what the data represents.
Examiners repeatedly report predictions made well outside the data's range, and correlation reworded as causation. A regression line only earns credit inside the sampled range, and correlation describes association, not cause.
When a question asks you to criticise a presentation, or justify median and IQR over mean and SD, many answers stop at a number. The mark is for the reasoning — usually that outliers are present, or that a diagram hides the feature being tested.
In a histogram, frequency is represented by area, not height. Reading frequency straight from the height of a bar works only when every class has the same width. Skipping frequency density gives wrong frequencies wherever widths differ, so work from the area of each bar rather than reading the y-axis directly.
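A minimal Python sketch of the frequency-density calculation, using made-up class boundaries and frequencies. It assumes the histogram is scaled so that area equals frequency exactly; exam histograms may use a different constant of proportionality, in which case one bar of known frequency is needed to find the scale first.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 8), (10, 15, 12), (15, 20, 15), (20, 40, 10)]


def frequency_density(lower, upper, frequency):
    """Bar height for one class: frequency divided by class width."""
    return frequency / (upper - lower)


def frequency_from_bar(lower, upper, density):
    """Frequency represented by one bar: area = class width x height."""
    return (upper - lower) * density


# Heights to draw for each class.
densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

# Summing the bar areas recovers the total frequency.
total_area = sum(
    frequency_from_bar(lo, hi, d) for (lo, hi, _), d in zip(classes, densities)
)

for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo}-{hi}: frequency {f}, frequency density {d}")
print("total area:", total_area)
```

In this invented data the widest class (20 to 40) has the lowest bar even though it does not have the lowest frequency, which is the mistake that reading heights alone produces.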
A box plot displays median, quartiles and range — not the mean or standard deviation. Comparisons citing 'higher mean' from a box plot alone score zero. Compare location by the median line and spread by the box and whiskers.
After fitting a line to log y against x (or log x), the gradient or intercept is usually log of the constant you want, not the constant itself. Write out what Y, X, gradient and intercept mean before substituting numbers.
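Written out, the two linearisations look like this. It is a sketch assuming base-10 logarithms throughout; any base works as long as the same one is used to undo the log at the end.

```latex
\begin{align*}
y = ax^{n} &\;\Longrightarrow\; \log y = \log a + n\log x, \\
y = kb^{x} &\;\Longrightarrow\; \log y = \log k + x\log b.
\end{align*}
```

For the first model, plotting Y = log y against X = log x gives gradient n and intercept log a, so a = 10^(intercept). For the second, plotting Y = log y against X = x gives gradient log b and intercept log k, so b = 10^(gradient) and k = 10^(intercept).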
The rule (1.5 × IQR beyond the quartiles, or 3 standard deviations from the mean) is always given in the question; reaching for the other one by habit gives the wrong fences. Choosing median and IQR over mean and SD also needs the outliers cited as the reason.
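The two rules can be sketched in Python as follows. The summary statistics are invented for illustration, and the quartiles are taken as given rather than computed, since quartile conventions differ between textbooks and software.

```python
def iqr_fences(q1, q3, k=1.5):
    """Outlier fences at k x IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def sd_fences(mean, sd, k=3):
    """Outlier fences at k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd


def is_outlier(value, fences):
    """True if the value lies strictly outside the (low, high) fences."""
    low, high = fences
    return value < low or value > high


# Hypothetical summary statistics for one data set.
q1, q3 = 12.0, 20.0
mean, sd = 16.5, 7.0
value = 35.0

print("IQR rule:", iqr_fences(q1, q3), is_outlier(value, iqr_fences(q1, q3)))
print("SD rule: ", sd_fences(mean, sd), is_outlier(value, sd_fences(mean, sd)))
```

With these numbers the value 35 is an outlier under the 1.5 × IQR rule (upper fence 32) but not under the 3 standard deviation rule (upper fence 37.5), which is why the rule stated in the question matters.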
Predicting well beyond the sampled x-range is treated as unreliable even with correct arithmetic, and using a y-on-x line to predict x is a separate error. A strong correlation shows association only — causation needs its own justification.
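One way to picture the interpolation-only rule is a prediction function that refuses to extrapolate. This is an illustrative sketch: the function name, the line y = 2.5 + 0.4x and the range 10 to 30 are all hypothetical, and it only predicts y from x, matching a y-on-x line.

```python
def predict_y(x, intercept, gradient, x_min, x_max):
    """Predict y from a given y-on-x regression line.

    Raises ValueError if x lies outside the range of x-values the line
    was fitted to, because that prediction would be extrapolation.
    """
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the sampled range [{x_min}, {x_max}]; "
            "the prediction would be unreliable"
        )
    return intercept + gradient * x


# Hypothetical line y = 2.5 + 0.4x, fitted to data with 10 <= x <= 30.
print(predict_y(20, 2.5, 0.4, 10, 30))  # inside the range, so allowed

try:
    predict_y(45, 2.5, 0.4, 10, 30)     # outside the range
except ValueError as error:
    print("Refused:", error)
```

In an exam answer the equivalent of the error message is a sentence: the value is outside the range of the data, so the estimate is unreliable.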
Know what actually needs memorising: mean formulae aren't in the booklet, so make them automatic, while Sxx and the standard-deviation formula are given. Learn both outlier rules well enough to apply whichever a question specifies without hesitating, and practise linear interpolation for a median or percentile from grouped data until it's routine.
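Linear interpolation from a grouped frequency table can be sketched like this. The table is invented, and the sketch assumes continuous classes with no gaps and the convention of locating the median at n/2 (and the p-th percentile at p/100 × n); check which convention your course materials use.

```python
def interpolate_percentile(classes, p):
    """Estimate the p-th percentile of grouped data by linear interpolation.

    classes: list of (lower, upper, frequency) in ascending order.
    p: percentile between 0 and 100 (50 gives the median).
    """
    total = sum(freq for _, _, freq in classes)
    target = p / 100 * total          # position of the required value
    cumulative = 0                    # frequency below the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # Fraction of the way through this class, scaled by its width.
            fraction = (target - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("percentile could not be located in the table")


# Hypothetical grouped data, total frequency 50.
table = [(0, 10, 5), (10, 20, 15), (20, 30, 20), (30, 50, 10)]

median = interpolate_percentile(table, 50)
lower_quartile = interpolate_percentile(table, 25)
print("median:", median)
print("lower quartile:", lower_quartile)
```

For the median here, the target position is 25, which falls in the 20 to 30 class with 20 values below it, so the estimate is 20 + (5/20) × 10 = 22.5.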
This is a topic where bounded extra preparation pays off directly: work through the actual pre-released Large Data Set once, note its units and codings, then practise LDS-based questions from a few different series so its style of interpretation question stops feeling unfamiliar.
Make sure your calculator can produce summary statistics directly from entered data. This is an expected feature, not a shortcut. Still write down what you're calculating, since setting up the calculation often carries its own mark. Mix diagram-reading, regression and outlier questions when practising, since real questions often chain several of these skills together.
Original questions written for the Pearson Edexcel A Level Mathematics (9MA0) specification. Not affiliated with or endorsed by Pearson.