Exam code: 1ST0
Data Collection Basics
What are different ways to collect data?
-
You should be familiar with different methods of collecting data
-
You can use direct observation to collect data
-
This means observing the things you are interested in and recording what you observe
-
For example, to study pedestrians’ use of mobile phones, you might observe people walking past a certain spot in a town and tally the numbers who are or aren’t looking at a mobile phone while they walk
-
-
You will need an appropriate data collection sheet for recording your data
-
This will usually be a table or tally chart, with appropriate rows or columns for the data you are collecting
-
For the example above you could use a tally chart, with rows for ‘looking at a mobile phone’ and ‘not looking at a mobile phone’
-
-
An advantage of observation is that it need not affect the natural behaviour of the things you are observing
-
But a possible disadvantage is not having any control over the things you are studying
-
-
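The tally chart described above for the mobile phone example can be sketched in a few lines of Python. This is only an illustration: the list of observations is made up, and the names `build_tally_chart` and `tally_marks` are hypothetical helpers, not part of any specification.

```python
from collections import Counter

# Hypothetical observations (invented for illustration):
# True means the pedestrian was looking at a mobile phone.
observations = [True, False, False, True, False,
                False, False, True, False, False]

# One row of the tally chart for each possible observation.
ROW_LABELS = {
    True: "Looking at a mobile phone",
    False: "Not looking at a mobile phone",
}


def tally_marks(count):
    """Draw a count as strokes in groups of five, e.g. 7 -> '||||| ||'."""
    fives, ones = divmod(count, 5)
    groups = ["|||||"] * fives + (["|" * ones] if ones else [])
    return " ".join(groups)


def build_tally_chart(observations):
    """Return the frequency for each row of the tally chart."""
    counts = Counter(observations)
    return {ROW_LABELS[key]: counts[key] for key in (True, False)}


chart = build_tally_chart(observations)
for label, count in chart.items():
    print(f"{label:<32} {tally_marks(count):<12} {count}")
```

Each row of the printed chart shows the category, the tally strokes and the total frequency, which matches the layout of a paper data collection sheet.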
-
You can also conduct an experiment to collect data
-
This is done to see how changes in one variable (the explanatory or independent variable) affect another variable (the response or dependent variable)
-
It is important to control extraneous variables (see the ‘Extraneous Variable’ spec point)
-
Different types of experiment (laboratory experiments, field experiments, and natural experiments) have different advantages and disadvantages
-
including different levels of control for extraneous variables
-
-
Sometimes a pre-test will be used before starting on a full experiment
-
The intended experiment is run on a small sample
-
This may reveal any problems with the design of the experiment
-
And allow the problems to be fixed before the experiment is run for real
-
-
-
Simulation can be used to model events in the real world
-
Data is collected from the model to predict what would happen in the real world
-
This may be easier or cheaper than collecting real world data
-
Random processes may be involved (including the use of random numbers)
-
For example, say 23% of the UK population possesses a certain genetic marker
-
A two-digit random number generator could serve as a model ‘person’
-
A number from 00 to 22 means the ‘person’ has the genetic marker, and a number from 23 to 99 means they don’t
-
-
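The genetic marker simulation above can be sketched in Python as follows. The 23% figure and the 00 to 22 rule come from the example. The sample size of 1000, the seed, and the function names `has_marker` and `simulate_sample` are assumptions made for illustration.

```python
import random


def has_marker(number):
    """A two-digit number from 00 to 22 means the 'person' has the marker."""
    return 0 <= number <= 22


def simulate_sample(sample_size, seed=None):
    """Simulate `sample_size` people and count how many have the marker."""
    rng = random.Random(seed)  # seed is optional; it makes a run repeatable
    count = 0
    for _ in range(sample_size):
        number = rng.randint(0, 99)  # two-digit random number, 00 to 99 inclusive
        if has_marker(number):
            count += 1
    return count


# Assumed sample size of 1000 simulated 'people'.
count = simulate_sample(1000, seed=1)
print(f"{count} of 1000 simulated people have the genetic marker")
```

Because 23 of the 100 possible numbers count as 'has the marker', the count should usually be close to 23% of the sample size, but it will vary from run to run when no seed is given.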
-
You can gather data from individuals using questionnaires or interviews
-
These need to be used carefully to avoid bias or other possible issues
-
See the ‘Questionnaires & Interviews’ spec point
-
-
-
You can also use reference sources to collect secondary data
-
e.g., government census data, online sources, etc.
-
Remember that the source of secondary data needs to be acknowledged
-
-
See the ‘Types of Data’ revision note
-
What are the advantages and disadvantages of different kinds of experiment?
-
You should know the advantages and disadvantages of different types of experiment for collecting data
-
Laboratory experiments
-
Conducted in a controlled environment (it doesn’t have to happen in an official laboratory!)
-
For example, studying people’s sleep patterns in a special room where lighting, temperature, bedding materials, etc. are all under the researchers’ control
-
-
Advantages include
-
Easy to control extraneous variables
-
Easy to repeat the experiment under exactly the same conditions
-
-
Disadvantages include
-
Test subjects may not behave naturally in the controlled environment
-
-
-
Field experiments
-
Conducted in the subject’s usual environment, but with the researcher controlling the situation and certain variables
-
For example, studying people’s sleep patterns in their own beds at home, but with the researchers providing specific types of pillow and deciding what time subjects should go to bed
-
-
Advantages include
-
More likely than a laboratory experiment to show usual or natural behaviour
-
-
Disadvantages include
-
Can’t control all extraneous variables
-
Harder to repeat the experiment under exactly the same conditions
-
-
-
Natural experiments
-
Conducted in the subject’s usual environment, without the researcher controlling the situation or variables
-
For example, studying people’s sleep patterns in their own beds at home, with the subjects using their own beds and bedding, going to sleep at their usual times, etc.
-
-
Advantages include
-
More likely than a laboratory experiment to show usual or natural behaviour
-
-
Disadvantages include
-
Can’t control any extraneous variables
-
Harder to repeat the experiment under exactly the same conditions
-
-
What are validity and reliability with regard to collected data?
-
We say that data is reliable when repeated measurements give similar results
-
i.e. if you collected the data again under similar circumstances you would get similar results
-
For example, using a scale to weigh some samples
-
It should give the same result if the same sample is weighed again
-
-
The reliability of collected data is the extent to which this is true
-
-
We say that data is valid if it measures what it was intended to measure
-
i.e. the data should be telling you what you think it is telling you
-
For example, using a questionnaire to assess participants’ stress levels
-
To be valid, scores from the questionnaire should agree with other accepted ways of measuring stress
-
-
The validity of collected data is the extent to which this is true
-
-
Reliability and validity are both very important for collected data
-
The more reliable and valid data is, the more we can trust any predictions or conclusions made from it
-
Worked Example
Tomas is a researcher studying obedience in pet dogs. He plans to study 8 different dogs. For each dog, he will first visit the dog at its home, ask it to perform 10 basic commands, and record how many the dog successfully carries out. Two days later, Tomas will visit each dog at home a second time, ask it to do the same 10 commands, and record how many the dog successfully carries out.
(a) Design a data collection sheet that Tomas could use to record the results of his experiment.
Tomas will need to record the data for the 8 different dogs
For each dog he will need to record two different data values (the number of commands successfully carried out on each visit)
The best way to do this will be in a table
| Dog | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1st visit | | | | | | | | |
| 2nd visit | | | | | | | | |
(b) Explain whether Tomas is conducting a laboratory experiment, a field experiment, or a natural experiment.
He is visiting the dogs at their homes, so he is not carrying out a laboratory experiment
He is controlling what the dogs are asked to do on each visit, so it is not a natural experiment
Tomas is visiting the dogs in their home environments, but he is also controlling what they are asked to do on each visit. Therefore it is a field experiment.
(c) Explain what Tomas has done to help assure the reliability of his experimental results.
Tomas is asking the dogs to perform the same 10 commands each time
He is testing them in the same setting (their home) each time
If his experiment is reliable he should get approximately the same results on both visits
He is visiting each dog twice. Both visits are in the dogs’ homes, and they are asked to do the same 10 commands each time. This will test whether he gets similar results for each dog when tested in similar circumstances, and help to show whether his results are reliable.
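As a rough illustration of how the two visits could be compared, the sketch below works out how far apart each dog's two scores are. The scores are invented for illustration only (the question gives no results), and the helper name `visit_differences` is hypothetical. No particular cut-off for "similar enough" is assumed.

```python
# Hypothetical results (invented): commands carried out, out of 10,
# by each of the 8 dogs on each visit.
first_visit = [7, 4, 9, 6, 8, 5, 10, 3]
second_visit = [8, 4, 9, 5, 8, 6, 9, 3]


def visit_differences(first, second):
    """Return the size of the difference between the two scores for each dog."""
    if len(first) != len(second):
        raise ValueError("Each dog needs a score for both visits")
    return [abs(a - b) for a, b in zip(first, second)]


differences = visit_differences(first_visit, second_visit)
mean_difference = sum(differences) / len(differences)

for dog, diff in enumerate(differences, start=1):
    print(f"Dog {dog}: scores differ by {diff}")
print(f"Mean difference: {mean_difference}")
```

Small differences between the two visits would suggest that repeating the test under similar circumstances gives similar results, which is what reliability means here.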
Questionnaires & Interviews
What makes a good questionnaire?
-
A questionnaire contains a set of questions that are used to collect data
-
A person who completes a questionnaire is known as a respondent
-
-
You should know the difference between open and closed questions
-
An open question has no suggested answers, so a respondent can give any answer at all
-
For example, ‘How do you think the current town council is doing?’
-
Every answer can be different
-
so it can be hard to summarise or analyse the data as a whole
-
-
-
A closed question offers the respondent a number of answers to choose from
-
For example, ‘The current town council is doing a great job. Choose one: ☐ Agree ☐ Disagree’
-
It is possible to record how many people choose each response
-
This makes it easier to summarise and analyse the data
-
-
Closed questions will often use an opinion scale
-
For example, offering the options ‘strongly agree’, ‘agree’, ‘disagree’ and ‘strongly disagree’
-
A problem with opinion scales is that most people tend to choose responses ‘in the middle’, so the data collected might be biased towards those middle values
-
-
-
There are a number of things to consider when creating a questionnaire
-
Avoid leading questions
-
These are questions that suggest a particular answer
-
For example, ‘How delighted are you with our awesome new product?’
-
This is ‘leading’ the respondent to give a positive answer
-
The responses collected are likely to be biased
-
-
-
Make sure that options offered cover all possibilities
-
For example, ‘How many times per day do you use our app? ☐ 1 time ☐ 2 times ☐ 3-5 times’
-
This doesn’t offer ‘0’ or ‘more than 5’ as options
-
-
You may need to include options like ‘never’, ‘other’ or ‘I don’t know’
-
-
Make sure any intervals given do not overlap
-
For example, ‘How much do you spend per month on widgets? ☐ £0 to £5 ☐ £5 to £10 ☐ More than £10’
-
‘£5’ is included in the first and second options!
-
-
-
Be sure to be specific about time frames
-
For example, ‘How many text messages do you send per week?’ is better than ‘How many text messages do you send?’
-
-
Keep questions short
-
and use language that is simple and easy to understand
-
-
Be careful about asking sensitive questions
-
i.e. questions about personal matters (age, etc.) or about things people may not want to discuss (‘How many times have you stolen things from shops?’)
-
People may not answer the questions
-
Or they may not answer them honestly
-
-
-
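The two points above about answer options (covering all possibilities and avoiding overlapping intervals) can be checked mechanically. The sketch below is one possible approach, not a standard method: each option is stored as a hypothetical `(label, lowest, highest)` tuple, with amounts of money written in pence, and `matching_options` is an invented helper. A well-designed question should give exactly one matching option for any possible answer.

```python
def matching_options(options, value):
    """Return the labels of every option that the given value falls into.

    Each option is (label, lowest, highest), both ends inclusive.
    A highest value of None means the option has no upper limit.
    """
    return [
        label
        for label, lowest, highest in options
        if lowest <= value and (highest is None or value <= highest)
    ]


# The widget question from the text, in pence: £5 appears in two options.
overlapping = [
    ("£0 to £5", 0, 500),
    ("£5 to £10", 500, 1000),
    ("More than £10", 1001, None),
]

# One possible fix (an assumption): end the first interval at £4.99.
fixed = [
    ("£0 to £4.99", 0, 499),
    ("£5 to £10", 500, 1000),
    ("More than £10", 1001, None),
]

# The app question from the text: no option for 0 or for more than 5.
app_options = [("1 time", 1, 1), ("2 times", 2, 2), ("3-5 times", 3, 5)]

print(matching_options(overlapping, 500))  # two labels: the intervals overlap
print(matching_options(fixed, 500))        # one label: the overlap is gone
print(matching_options(app_options, 0))    # no labels: an option is missing
```

Two matching labels reveal an overlap, and an empty list reveals a gap in the options offered.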
-
Sometimes a pilot survey will be used before giving the questionnaire to all the respondents in the intended survey
-
The questionnaire is first given to a smaller sample of people
-
This may reveal any problems with the design of the questionnaire
-
And allow the problems to be fixed before the questionnaire is used for real
-
What are the advantages and disadvantages of interviews versus anonymous questionnaires?
-
In interviews an interviewer asks the questions to the respondents and records their responses
-
This can be done in person or by phone
-
Advantages of interviews:
-
The response rate is higher
-
i.e. every person interviewed will tend to answer the questions
-
-
The interviewer can explain questions (if necessary)
-
The respondent can explain their answers
-
This avoids unclear or ambiguous answers being recorded
-
-
A good interviewer can help respondents feel more comfortable when answering sensitive questions
-
-
Disadvantages of interviews:
-
Conducting interviews can take a lot of time
-
So a survey that uses interviews can take longer and be more expensive to run
-
-
The sample size will usually be smaller than when using questionnaires
-
This can make the sample less representative
-
-
Respondents may be less likely to be honest or to answer sensitive questions in an in-person interview
-
Or respondents may try to boast or to give the answers they think the interviewer wants to hear
-
-
There may be interviewer bias
-
This is when the opinions or expectations of the interviewer affect the answers given by the respondent
-
For example, the interviewer may ask a question in a way that leads the respondent towards giving a particular answer
-
This can lead to biased results
-
-
-
-
Questionnaires will normally be given to people to fill in anonymously
-
This can be a printed form or a form accessible online
-
Advantages of questionnaires:
-
Respondents can answer questions in their own time
-
This can make the survey quicker and cheaper to run
-
-
Questionnaires can be sent to a large sample
-
This can make the sample more representative
-
-
Respondents may be more likely to be honest and to answer sensitive questions in an anonymous questionnaire
-
There is no interviewer bias
-
-
Disadvantages of questionnaires:
-
The response rate is lower
-
People may not answer all the questions, or may not complete or return the questionnaire at all
-
-
A respondent may not understand the questions
-
A respondent’s answers may be unclear or ambiguous
-
-
What is the random response method for collecting sensitive data?
-
Even in an anonymous questionnaire, people may not be willing to give honest answers to sensitive questions
-
The random response method is a way to get better responses for these sorts of questions
-
It uses some sort of random event (for example a coin flip) to determine how a question will be answered
-
-
For example, say you wanted to collect data on people using handheld phones while driving
-
This is illegal in the UK
-
So people may not be willing to admit that they have done it
-
-
You could ask the question in this form:
-
“Have you ever driven while using a handheld phone?
Flip a coin.
If you get heads, then answer Yes.
If you get tails, then answer honestly.”
-
There is no way to know if a person answering yes really did drive while using a handheld phone, or whether they only answered yes because they flipped the coin and got heads
-
-
To estimate the true proportion of people who would honestly answer yes to a random response question:
-
Estimate the number of people who answered a certain way because of the random event
-
For example, with a coin flip about half the people will get heads and half will get tails
-
-
Remove that many responses from the data set
-
For the example used above, say 1000 people responded to the question
-
We would expect half of them to answer yes because they got heads on the coin
-
So remove 500 yes answers from the data set
-
-
Perform your analysis on the remaining items in the data set
-
See the Worked Example
-
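The steps above can be sketched as a short Python function. It assumes the coin-flip design described in this section (heads means a forced Yes, so about half of all responses are forced). The function name `estimate_honest_yes_proportion` and the 620 Yes / 380 No split are invented for illustration. Only the total of 1000 responses comes from the example above.

```python
def estimate_honest_yes_proportion(yes_count, no_count, forced_yes_probability=0.5):
    """Estimate the proportion of honest respondents who answered Yes.

    forced_yes_probability is the chance that the random event forces a Yes
    (0.5 for a fair coin where heads means 'answer Yes').
    """
    total = yes_count + no_count
    # Step 1: estimate how many people answered Yes because of the coin.
    expected_forced_yes = total * forced_yes_probability
    # Step 2: remove that many Yes responses from the data set.
    honest_respondents = total - expected_forced_yes
    honest_yes = yes_count - expected_forced_yes
    # Step 3: analyse what remains.
    return honest_yes / honest_respondents


# Hypothetical split of the 1000 responses: 620 Yes and 380 No.
estimate = estimate_honest_yes_proportion(yes_count=620, no_count=380)
print(f"Estimated proportion who have done it: {estimate:.0%}")
```

With these made-up numbers, 500 Yes answers are removed, leaving 120 Yes answers out of 500 honest responses. In real data the estimate is only approximate, since the coin will not land heads for exactly half of the respondents.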
Worked Example
A researcher is designing a questionnaire in order to collect data on how often people illegally download music.
One question the researcher is thinking of using is the following:
“A lot of people say that downloading music illegally is really okay, because it doesn’t hurt anyone. How bad do you think it is to download music illegally?”
(a) State with a reason whether that is an open or a closed question.
Remember, in a closed question respondents are given a fixed set of responses to choose from
A person could give any answer at all to that question, so it is an open question
(b) State one thing that is wrong with the way the question is asked.
They start by saying that a lot of people think it’s okay, before asking the actual question
This is leading a respondent towards giving a certain type of answer
It is a leading question, because it starts off by saying that a lot of people think it’s okay to download music illegally
In the final version of the questionnaire, one of the questions is as follows:
“Have you ever downloaded music illegally?
Before answering the question, flip a coin.
If you get heads on the coin, then answer Yes.
If you get tails on the coin, then answer honestly.”
The questionnaire is sent to a large number of people. 1332 people answer Yes to that question, and 1068 answer No.
(c) Estimate the percentage of people in the sample who have downloaded music illegally.
Start by figuring out the total number of people who responded
1332 + 1068 = 2400
Responses