Overview Data Collection
In my opinion there is nothing too challenging in data collection, and common sense will help you to answer most questions. You do have to be familiar with some of the jargon used (see the definitions below). Stratified sampling (see the question and answer at the end of this article) is seen as an A or A* topic, but even this needs only common sense and the ability to use percentages.
This topic is as much about definitions as it is about numbers, so I’m going to start by listing the key definitions.
Hypothesis: A proposal that may be:
- Tested by further investigation
- Readily measured
- Answered with simple yes/no or true/false
Data: Factual Information
Primary Data: Data you collect yourself
Secondary Data: Data that has already been collected by someone else. Sources include the internet, newspapers and books.
Qualitative Data: Can only be described in words (think quality = words).
Quantitative Data: Can be given numerical values (think quantity = numbers).
Discrete Data: Can only take certain values, such as whole numbers or particular fractions. For example, shoe sizes can only be 1, 1½, 2, 2½, 3, etc.
Continuous Data: Can take any value within a range. For example, the length of an adult’s foot could be any value between, say, 10 cm and 50 cm.
Grouped Frequency Tables: There are two types: grouped frequency tables for discrete data and grouped frequency tables for continuous data.
Example of Discrete Data Grouped Frequency Table
The class intervals should be equal; in this case each one covers five values. The groups must not overlap. You need to total the frequency column as a check that every data value has been counted. It may also help to sort the data into ascending order and/or use a tally to make the group totals easier to count.
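The grouping steps above can be sketched in Python. This is a minimal illustration, assuming whole-number data and equal classes five values wide; the values in `marks` are made up (they are not the data from the article's table) and `grouped_frequency` is a hypothetical helper name chosen for the example. The bars printed beside each class stand in for tally marks.

```python
from collections import Counter

def grouped_frequency(values, start, width, num_classes):
    """Count whole-number values into equal, non-overlapping classes.

    Each class covers `width` whole numbers, e.g. 1-5, 6-10, 11-15
    when start=1 and width=5.
    """
    counts = Counter()
    # Sorting is not needed by the computer; it mirrors the manual tip
    for v in sorted(values):
        index = (v - start) // width  # which class the value falls into
        if not 0 <= index < num_classes:
            raise ValueError(f"value {v} is outside every class")
        counts[index] += 1
    table = []
    for i in range(num_classes):
        low = start + i * width
        high = low + width - 1
        table.append((f"{low}-{high}", counts[i]))
    return table

# Hypothetical marks out of 25 (illustrative values, not from the article)
marks = [3, 7, 12, 15, 5, 9, 22, 18, 11, 6, 25, 14, 1, 10, 20]
table = grouped_frequency(marks, start=1, width=5, num_classes=5)
for label, freq in table:
    # Plain bars as a crude tally, then the frequency
    print(f"{label:>6}  {'|' * freq:<6}  {freq}")

# Check total: the frequencies must add up to the number of data values
total = sum(freq for _, freq in table)
print("Total:", total)
assert total == len(marks)
```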
Example of Grouped Frequency Table for Continuous Data
Again, you need a check total. Because you are dealing with continuous data, you must use inequality symbols (such as < and ≤) to define each class interval. Take care with values that lie exactly on the edge of a class interval (for example 170.0 in this set of data): the inequalities decide which class they belong to.
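Boundary values like 170.0 can be handled by writing each class as an inequality. The sketch below assumes classes of the form lower < h ≤ upper (a common convention; the article's own table may use different boundaries), and the heights, boundaries and the helper name `find_class` are hypothetical. Because 170.0 satisfies 160 < h ≤ 170 but not 170 < h ≤ 180, it is counted only once, in the lower class.

```python
def find_class(value, boundaries):
    """Return the (lower, upper) class with lower < value <= upper.

    `boundaries` is a sorted list such as [150, 160, 170, 180], giving
    the classes 150 < h <= 160, 160 < h <= 170 and 170 < h <= 180.
    """
    for lower, upper in zip(boundaries, boundaries[1:]):
        if lower < value <= upper:
            return (lower, upper)
    raise ValueError(f"{value} is outside every class")

# Hypothetical heights in cm (illustrative only)
heights = [152.3, 160.0, 165.8, 170.0, 170.1, 178.4]
boundaries = [150, 160, 170, 180]

frequency = {}
for h in heights:
    cls = find_class(h, boundaries)
    frequency[cls] = frequency.get(cls, 0) + 1

for (lower, upper), freq in sorted(frequency.items()):
    print(f"{lower} < h <= {upper}: {freq}")

# Check total: every height must land in exactly one class
print("Total:", sum(frequency.values()))
assert sum(frequency.values()) == len(heights)
```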
Two-way tables use columns and rows to sub-divide and analyse data. Dividing data in this way can reveal patterns that are hidden in the overall totals. This is best shown with an example. In this example the columns and rows are totalled.
This shows the advantage of sub-dividing the data using a two-way table. If the data were not split by gender, we might simply say that “Soaps” is the most popular type of TV programme, without realising that soaps are far more popular with men than with women. You may find this result surprising: could it be that the 150 people sampled are not representative of the total population?
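A two-way table with row and column totals can be built by counting each (row, column) pair. In this sketch the nine responses are made up for illustration and are not the article's 150-person sample; `Counter` comes from Python's standard `collections` module.

```python
from collections import Counter

# Hypothetical survey responses: (gender, favourite programme type).
# These are made-up values, not the article's 150-person sample.
responses = [
    ("Male", "Soaps"), ("Male", "Sport"), ("Male", "Soaps"),
    ("Female", "News"), ("Female", "Soaps"), ("Female", "Sport"),
    ("Male", "News"), ("Female", "News"), ("Male", "Soaps"),
]

counts = Counter(responses)  # counts each (row, column) pair
rows = sorted({g for g, _ in responses})
cols = sorted({p for _, p in responses})

# Header row, then one row per gender ending with its row total
print(f"{'':8}" + "".join(f"{c:>7}" for c in cols) + f"{'Total':>7}")
for r in rows:
    row_counts = [counts[(r, c)] for c in cols]
    print(f"{r:8}" + "".join(f"{n:>7}" for n in row_counts) + f"{sum(row_counts):>7}")

# Column totals; the grand total must equal the number of responses
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
grand_total = sum(col_totals)
print(f"{'Total':8}" + "".join(f"{n:>7}" for n in col_totals) + f"{grand_total:>7}")
assert grand_total == len(responses)
```

The final check, that the grand total equals the number of responses, plays the same role as the check total in a grouped frequency table.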
You can carry out a survey to gather information. Questionnaires are forms with pre-prepared questions that may be used to carry out a survey in a structured way.
Care must be taken to ensure that the questions are appropriate. Good questionnaires have the following characteristics:
- Specific questions that are easy to answer (for example yes/no answers)
- Use tick boxes
- Ensure possible answers do not overlap
- Avoid personal questions
- Avoid leading questions
- Keep the questionnaire as short as possible
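The rule that possible answers must not overlap can be checked mechanically when the tick-box options are whole-number ranges. This is a hypothetical helper, `check_options`, working on made-up options for a question such as "How many times do you visit the leisure centre each month?"; as an extra check beyond the list above, it also reports gaps, where some answers would have no box to tick.

```python
def check_options(options):
    """Check that whole-number answer ranges neither overlap nor leave gaps.

    `options` is a list of inclusive (low, high) pairs, e.g. (0, 5).
    Returns a list of problems found (empty if the options are fine).
    """
    problems = []
    ordered = sorted(options)
    # Compare each range with the next one up
    for (low1, high1), (low2, high2) in zip(ordered, ordered[1:]):
        if low2 <= high1:
            problems.append(f"{low1}-{high1} overlaps {low2}-{high2}")
        elif low2 > high1 + 1:
            problems.append(f"gap between {high1} and {low2}")
    return problems

# Hypothetical tick-box options (illustrative only)
bad = [(0, 5), (5, 10), (12, 20)]   # 5 appears twice; 11 has no box
good = [(0, 5), (6, 10), (11, 20)]
print(check_options(bad))
print(check_options(good))
```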
Ideally, a survey would gather information from every member of the population. Such a survey is known as a census.
However, unless the population (all the people who could contribute to a survey) is very small, or you have unlimited time and money, you will need to survey a sample of the population.
For a survey to give meaningful results, the sample must be representative. This means that all sub-groups within the population must be fairly represented in the sample.
Stratified sampling is classified as grade A or even grade A*, but to me it seems relatively straightforward. It’s probably best explained by looking at a question.
A leisure centre wants to survey young adults who live within one kilometre of the centre. The total population of such young adults is as follows:
The leisure centre can only afford to survey 100 people. How many males and females of each age should be surveyed?
Knowledge/Method to Answer the Question
So how do we go about this question? Each cell in the table above (e.g. 16-year-old females) represents a stratum (stratum, plural strata, means layer) of the total population. Stratified sampling ensures that each stratum is represented in the survey in proportion to its size.
You can use a formula to calculate the sample size of each stratum (you could learn this, but I think it would be better to understand the logic/common sense behind it).
Anyway, here’s the formula:
(Size of stratum ÷ size of population) × size of total sample = sample size of stratum
Taking the first stratum, 16-year-old females, this gives:
(45 ÷ 585) × 100 = 7.6923 (round to 8)
I find it helpful to put this into words to see the logic behind it. 16-year-old females represent 45/585 = 0.076923, or 7.6923%, of the total population. Our total sample size is 100, so we need 100 × 0.076923 = 7.6923, which rounds to 8, as the sample size for this stratum.
Sample size for 16-year-old females = 45/585 × 100 = 7.6923 ≈ 8 (rounded)
Sample size for 16-year-old males = 56/585 × 100 = 9.5726 ≈ 10 (rounded)
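The formula can be checked with a short Python sketch. The figures 45, 56, 585 and 100 come from the leisure centre question; the function names `round_half_up` and `stratum_sample_size` are hypothetical. The small rounding helper is there because Python's built-in `round()` rounds exact halves to the nearest even number, which differs from the rounding rule usually taught in school.

```python
import math

def round_half_up(x):
    """Round a positive number to the nearest whole number, halves up.

    Python's built-in round() rounds exact halves to the nearest even
    number (round(2.5) == 2), which differs from the usual school rule.
    """
    return math.floor(x + 0.5)

def stratum_sample_size(stratum_size, population_size, total_sample):
    """(size of stratum / size of population) x size of total sample."""
    exact = stratum_size / population_size * total_sample
    return exact, round_half_up(exact)

# Figures from the leisure centre question: population 585, sample of 100
for label, size in [("16-year-old females", 45), ("16-year-old males", 56)]:
    exact, rounded = stratum_sample_size(size, 585, 100)
    print(f"{label}: {exact:.4f} -> {rounded}")
```

When each stratum is rounded separately, the rounded sample sizes may not add up to exactly 100, so it is worth totalling them as a check.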
Using the same method for every stratum gives the following sample sizes:
MindMap Data Collection
Here is my MindMap of Data Collection. I find that making a mind map is the best way to prepare revision notes. The act of trying to cram the whole topic onto one page helps me to fix it in my brain, and the use of colours and shapes makes it memorable. Once I’ve got a mind map, I can just look at it to revise a whole chapter of Maths. This is useful, but it’s not as good as practice questions.