Comparing Data Distributions
Data distributions can be compared to understand how two or more sets of data are similar or different. This is usually done using one measure of central tendency, one measure of spread, or both. Comparisons must be clear and based on the calculated values.
Using a Measure of Central Tendency
A measure of central tendency describes a typical value in a data set. Common measures include the mean, median and mode.
When comparing distributions using one measure of central tendency:
• calculate the same measure for each data set
• compare the values directly
• state which data set has the higher or lower typical value
For example, comparing medians can show which group generally has higher values. This is especially useful when the data contains anomalies, because the median is barely affected by them.
The same measure must be used for all data sets being compared.
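The steps above can be sketched in a few lines of Python. This is only an illustration: the two classes and their test scores are made up, and the only function used is `median` from Python's standard `statistics` module.

```python
from statistics import median

# Hypothetical test scores for two classes (made-up data for illustration).
class_a = [52, 58, 61, 64, 70, 73, 95]
class_b = [48, 50, 55, 57, 60, 62, 66]

# Step 1: calculate the SAME measure (the median) for each data set.
median_a = median(class_a)
median_b = median(class_b)

# Step 2: compare the values directly.
if median_a > median_b:
    higher = "Class A"
elif median_b > median_a:
    higher = "Class B"
else:
    higher = "Neither class"

# Step 3: state which data set has the higher typical value.
print(f"Median for Class A: {median_a}")
print(f"Median for Class B: {median_b}")
print(f"{higher} has the higher median score.")
```

Note that the score of 95 in Class A is far above the rest, but the median is still the middle value of the ordered list, so the comparison is not distorted by it.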
Choosing the Appropriate Measure
The choice of measure depends on the data.
The mean is useful when:
• data is numerical
• values are fairly evenly spread, with no skew
• there are no extreme values
The median is more suitable when:
• data is skewed
• anomalies are present
The mode is suitable when:
• data is categorical
• you want to compare the most common outcome
Using an inappropriate measure can lead to misleading comparisons.
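To see why the choice matters, the sketch below compares the mean and median of a small data set containing one anomaly, and finds the mode of some categorical data. The journey times and travel methods are invented for illustration; `mean`, `median` and `mode` come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical journey times in minutes; 95 is an anomaly (made-up data).
journey_times = [21, 22, 23, 24, 25, 95]

# The mean is pulled upwards by the single extreme value.
mean_time = mean(journey_times)

# The median is the middle of the ordered data, so the anomaly barely matters.
median_time = median(journey_times)

print(f"Mean journey time:   {mean_time} minutes")
print(f"Median journey time: {median_time} minutes")

# Hypothetical categorical data: how students travel to school.
travel_methods = ["bus", "car", "bus", "walk", "bus", "car"]

# The mean and median cannot be calculated for categories,
# but the mode (most common outcome) can.
most_common = mode(travel_methods)
print(f"Most common way to travel: {most_common}")
```

Here the mean suggests a typical journey of 35 minutes even though five of the six journeys took 25 minutes or less, so the median of 23.5 minutes is the more representative value.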
Using a Measure of Spread
A measure of spread shows how spread out the data is.
The most common measure of spread at GCSE level is the range.
To compare distributions using the range:
• find the range for each data set
• compare the sizes of the ranges
• identify which data set is more spread out
A larger range indicates more variation in the data.
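The range depends only on the largest and smallest values, so a single anomaly can make a data set look far more spread out than it really is.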
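A short Python sketch of comparing ranges is given below. The helper `data_range` and the two sets of journey times are hypothetical, chosen only to illustrate the calculation (range = largest value minus smallest value).

```python
def data_range(values):
    """Return the range: the largest value minus the smallest value."""
    return max(values) - min(values)

# Hypothetical journey times in minutes for two routes (made-up data).
route_x = [18, 20, 21, 22, 24]
route_y = [12, 19, 21, 23, 35]

# Find the range for each data set.
range_x = data_range(route_x)
range_y = data_range(route_y)

# Compare the ranges and identify which data set is more spread out.
more_spread = "Route X" if range_x > range_y else "Route Y"
print(f"Range for Route X: {range_x} minutes")
print(f"Range for Route Y: {range_y} minutes")
print(f"{more_spread} has more varied journey times.")

# Adding one extreme value changes the range a lot.
range_x_with_anomaly = data_range(route_x + [60])
print(f"Range for Route X with one 60-minute journey: {range_x_with_anomaly} minutes")
```

A single 60-minute journey increases the range for Route X from 6 minutes to 42 minutes, even though the other five journeys are unchanged.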
Comparing Using Both Measures
Sometimes a comparison is stronger when using both a measure of central tendency and a measure of spread.
For example:
• one data set may have a higher median
• but also a larger range
This suggests that although typical values are higher, the data is more variable.
Clear comparisons should mention:
• which measure is used
• what the values show
• what this means in context
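As a rough sketch of a comparison that uses both measures, the Python below calculates the median and range for two data sets and writes a statement naming the measure, the values and the meaning in context. The shops and prices (in pence) are made up for illustration, and `summarise` is a hypothetical helper.

```python
from statistics import median

def summarise(values):
    """Return (median, range) for a list of numbers."""
    return median(values), max(values) - min(values)

# Hypothetical prices in pence for the same items at two shops (made-up data).
shop_a = [210, 230, 250, 280, 390]
shop_b = [220, 225, 230, 240, 245]

median_a, range_a = summarise(shop_a)
median_b, range_b = summarise(shop_b)

# Work out which shop has the higher median and which has the larger range.
higher_median = "Shop A" if median_a > median_b else "Shop B"
larger_range = "Shop A" if range_a > range_b else "Shop B"

# Each statement names the measure, quotes the values and gives the meaning.
print(f"{higher_median} has the higher median price "
      f"({median_a}p for Shop A, {median_b}p for Shop B), "
      f"so its prices are typically higher.")
print(f"{larger_range} has the larger range "
      f"({range_a}p for Shop A, {range_b}p for Shop B), "
      f"so its prices are more varied.")
```

The ternary expressions assume the two values are different; if the medians or ranges were equal, the statement should say so instead.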
Context Matters
Comparisons should always be linked to the context of the data.
For example:
• comparing test scores
• comparing journey times
• comparing prices
Statements should explain what the comparison tells us about real situations.
Avoid vague statements such as “better” or “worse” without explanation.
Common Errors to Avoid
Common mistakes include:
• using different measures for different data sets
• comparing means when data is skewed
• ignoring spread and considering only central tendency
• making conclusions without context
Justifying each comparison with the calculated values makes it more accurate and easier to check.
Key Points to Remember
Data distributions can be compared using a measure of central tendency, a measure of spread, or both.
The same measure must be used for all data sets.
The mean, median and mode each describe a typical value.
The range describes spread.
Using context makes comparisons meaningful.
Comparing data distributions carefully helps identify differences and similarities and leads to clearer, more reliable statistical conclusions.