Exam code: 1ST0
Comparing data sets
How do I compare two data sets?
-
You may be given two sets of data that relate to a context
-
To compare data sets, you need to
-
compare an average (measure of central tendency)
-
Mode, median or mean
-
-
AND compare a measure of spread (measure of dispersion)
-
Range, interquartile range (IQR), interpercentile or interdecile range, standard deviation
-
-
-
You need to use the same average and the same measure of spread for both data sets
-
You may need to decide which average should be used
-
See the ‘Using Measures of Central Tendency’ revision note
-
-
Which measure of spread to use depends on which average is used
-
If you compare the modes of the data sets
-
then use the range (quantitative data only)
-
-
If you compare the medians of the data sets
-
then use the range, interquartile range, interpercentile range or interdecile range
-
interquartile range is the most common choice
-
standard deviation should not be used with median
-
-
If you compare the means of the data sets
-
then use the range or standard deviation
-
standard deviation is the most common choice, if it is available
-
interquartile range should not be used with mean
-
-
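The pairings above (median with interquartile range, mean with standard deviation) can be sketched in a few lines of Python. This is only an illustration: the class scores are invented, the function names `median_and_iqr` and `mean_and_sd` are hypothetical, and the quartile method used here is an assumption that may differ slightly from the method taught on your course.

```python
import statistics

def median_and_iqr(data):
    """Return (median, interquartile range) for a list of numbers."""
    # quantiles(n=4) gives the three quartile cut points Q1, Q2 (median), Q3.
    # The "inclusive" method is an assumption; other quartile conventions
    # can give slightly different values for small data sets.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q2, q3 - q1

def mean_and_sd(data):
    """Return (mean, standard deviation), treating data as a whole population."""
    return statistics.mean(data), statistics.pstdev(data)

# Hypothetical test scores, invented purely for illustration.
class_a = [42, 43, 44, 45, 45, 46, 47]
class_b = [26, 28, 30, 32, 34, 36, 38]

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    median, iqr = median_and_iqr(scores)
    mean, sd = mean_and_sd(scores)
    # Pair the median with the IQR, and the mean with the standard deviation.
    print(f"{name}: median {median}, IQR {iqr}")
    print(f"{name}: mean {mean:.1f}, standard deviation {sd:.1f}")
```

Whichever pairing is chosen, the same pair of measures should be reported for both data sets.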
How do I write a conclusion when comparing two data sets?
-
When comparing averages and spreads, you need to
-
compare numbers
-
describe what this means in the context of the question (‘in real life’)
-
-
Copy the exact wording from the question in your answer
-
There should be four parts to your conclusion
-
For example:
-
“The median score of class A (45) is higher than the median score of class B (32).”
-
“This means that, on average, class A performed better than class B in the test.”
-
“The range of class A (5) is lower than the range of class B (12).”
-
“This means the scores in class A were less spread out than scores in class B.”
-
-
Other good phrases for lower ranges include:
-
“scores are closer together”
-
“scores are more consistent”
-
“there is less variation in the scores”
-
-
What restrictions are there when drawing conclusions?
-
The data set may be too small to be truly representative
-
Measuring the heights of only 5 pupils in a whole school is not enough to talk about averages and spreads
-
-
The data set may be biased
-
Measuring the heights of just the older year groups in a school will make the average appear too high
-
-
The conclusions might be influenced by who is presenting them
-
A politician might choose to compare a different type of average if it helps to strengthen their argument!
-
What else could I be asked?
-
You may need to think from the point of view of another person
-
A teacher might not want a large spread of marks
-
It might show that they haven’t taught the topic very well!
-
-
An examiner might want a large spread of marks
-
It makes it clearer when assigning grade boundaries, A, B, C, D, E, …
-
-
-
You may be asked to compare data from a sample with data from the population as a whole
-
For example, to determine how representative the sample is of the population
-
Examiner Tips and Tricks
-
To get full marks when comparing data sets in the exam, you must
-
be sure to use appropriate averages and measures of spread
-
compare the numbers
-
say what the numbers mean in the context of the question
-
Worked Example
Julie collects data showing the distances travelled by snails and slugs during a ten-minute interval. She records a summary of her findings, as shown in the table below.
| | Median | Interquartile Range |
|---|---|---|
| Snails | 7.1 cm | 3.1 cm |
| Slugs | 9.7 cm | 4.5 cm |
Compare the distances travelled by snails and slugs during the ten-minute interval.
Compare the numerical values of the median (an average)
Describe what this means in the context of the question
Slugs have a higher median than snails (9.7 cm > 7.1 cm)
This suggests that, on average, slugs travel further than snails
Compare the numerical values of the interquartile range (the spread)
Describe what this means in the context of the question
Snails have a lower interquartile range than slugs (3.1 cm < 4.5 cm)
This suggests that there is less variation in the distances travelled by snails
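The numerical half of each comparison in this worked example can be generated mechanically, as in the rough sketch below. The helper `describe` is a hypothetical name introduced for illustration, and it only produces the "compare numbers" sentence; the sentence explaining what the numbers mean in context still has to be written by hand.

```python
def describe(measure, name_a, value_a, name_b, value_b, unit):
    """Return one numerical comparison sentence for a single measure."""
    if value_a == value_b:
        return f"{name_a} and {name_b} have the same {measure} ({value_a} {unit})."
    # Order the pair so that the group with the larger value is named first.
    (hi_name, hi), (lo_name, lo) = sorted(
        [(name_a, value_a), (name_b, value_b)],
        key=lambda pair: pair[1],
        reverse=True,
    )
    return f"{hi_name} have a higher {measure} than {lo_name} ({hi} {unit} > {lo} {unit})."

# Summary values taken from the table in the worked example.
snails = {"median": 7.1, "interquartile range": 3.1}
slugs = {"median": 9.7, "interquartile range": 4.5}

for measure in snails:
    print(describe(measure, "Snails", snails[measure], "Slugs", slugs[measure], "cm"))
```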
Standardising Data
What do we mean by standardising data?
-
It is possible to standardise the data collected in two samples
-
This makes it easier to compare data values in the two samples
-
e.g. Michelle scores an 80 on a maths exam and a 72 on an English exam
-
The two exams are quite different
-
So which of those is really the ‘better’ score?
-
-
How do I standardise data?
-
Each data value is converted into a standardised score using the formula
-
$$\text{standardised score} = \frac{\text{raw value} - \text{mean}}{\text{standard deviation}}$$
-
the raw value is the original data value from the data set
-
the mean is the mean of the data set the raw value belongs to
-
the standard deviation is the standard deviation of the data set the raw value belongs to
-
The formula calculates how many standard deviations the raw value is away from the mean
-
-
This can also be written as
-
$\frac{x - \mu}{\sigma}$, where $x$ is the raw data value, $\mu$ is the mean, and $\sigma$ is the standard deviation
-
-
The formula is not on the exam formula sheet
-
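As a rough illustration, the standardised score (raw value minus mean, divided by standard deviation) can be computed for Michelle's two exam scores. The notes do not give the mean or standard deviation for either exam, so the values used below are invented assumptions, and `standardised_score` is a hypothetical function name.

```python
def standardised_score(raw_value, mean, standard_deviation):
    """Return how many standard deviations raw_value lies from the mean."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_value - mean) / standard_deviation

# Michelle's raw scores come from the example in the notes.
# The means and standard deviations are ASSUMED values for illustration only.
maths = standardised_score(80, mean=70, standard_deviation=10)
english = standardised_score(72, mean=60, standard_deviation=8)

print(f"Maths standardised score: {maths}")
print(f"English standardised score: {english}")

# The higher standardised score is the better result relative to that exam.
better = "maths" if maths > english else "English"
print(f"Relative to each exam, the {better} score is better")
```

With these assumed values, the lower raw score (72 in English) turns out to be the better result, because it lies further above its exam's mean when measured in standard deviations.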