Overview Data Collection

In my opinion there is nothing too challenging in data collection, and common sense will help to answer most questions. You do have to be familiar with some of the jargon used (see the definitions below). Stratified sampling (see the question and answer at the end of this article) is seen as an A or A* topic, but even this comes down to common sense and being able to use percentages.

This topic is as much about definitions as it is about numbers, so I'm going to start by listing the key definitions.

Hypoth­e­sis: A pro­posal that may be:

  • Tested by fur­ther investigation
  • Read­ily measured
  • Answered with sim­ple yes/no or true/false

Data: Fac­tual Information

Pri­mary Data: Data you col­lect yourself

Sec­ondary Data: Data that has already been col­lected. Exam­ples include: inter­net, news­pa­pers and books.

Qual­i­ta­tive Data: Can only be described in words (think qual­ity = words).

Quantitative Data: Can be given numerical values (think quantity = numbers).

Dis­crete Data: Can only have cer­tain val­ues such as whole num­bers or frac­tions. For exam­ple shoe sizes can only be 1, 1 1/2, 2, 2 1/2, 3 etc.

Continuous Data: Can have any value within a range. For example, the length of an adult's foot could be any value between, say, 10 cm and 50 cm.

Grouped Frequency Tables: There are two types: grouped frequency tables for discrete data and grouped frequency tables for continuous data.

Example of Grouped Frequency Table for Discrete Data

The class intervals should be equal; in this case each covers a range of five. The groups must not overlap. You need to total the frequency column to ensure that you account for the whole population. In addition, it may help to sort the data into ascending order and/or use a tally count to make it easier to count group totals.
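The rules above (equal class intervals, no overlaps, a check total) can be sketched in Python. This is only an illustration: the marks are made-up data, and the function name `grouped_frequency` and its parameters are my own choices, not part of any syllabus.

```python
from collections import Counter


def grouped_frequency(values, start, width, n_classes):
    """Count whole-number values into equal, non-overlapping classes."""
    counts = Counter()
    for v in values:
        index = (v - start) // width  # which class the value falls in
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the table")
        counts[index] += 1
    table = []
    for i in range(n_classes):
        low = start + i * width
        high = low + width - 1  # inclusive upper limit for whole numbers
        table.append((f"{low}-{high}", counts[i]))
    return table


# Made-up test marks, purely for illustration
marks = [3, 7, 12, 14, 15, 18, 21, 22, 22, 25, 9, 11]
table = grouped_frequency(marks, start=1, width=5, n_classes=5)
for label, freq in table:
    print(label, freq)

# Check total: must equal the number of data values
total = sum(freq for _, freq in table)
print("Total", total)
```

Because each value is assigned to exactly one class by integer division, the groups cannot overlap, and the check total confirms that nothing has been missed.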

Exam­ple of Grouped Fre­quency Table for Con­tin­u­ous Data

Again you need a check total. You must define each class with inequality symbols (such as ≤ and <) as you are dealing with continuous data, so that every value belongs to exactly one class. You need to take care with values that are right at the edge of a class interval (for example 170.0 in this set of data).
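Here is a minimal Python sketch of how edge values can be handled. It assumes classes of the form lower ≤ h < upper, which is a common convention but not the only one, and the heights and class edges are made-up values for illustration.

```python
def class_index(value, edges):
    """Return i such that edges[i] <= value < edges[i + 1]."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    raise ValueError(f"{value} falls outside the table")


# Made-up class edges and heights in cm, purely for illustration
edges = [150, 160, 170, 180, 190]
heights = [152.3, 160.0, 169.9, 170.0, 174.5, 181.2, 165.4, 170.0]

freq = [0] * (len(edges) - 1)
for h in heights:
    freq[class_index(h, edges)] += 1

for i, f in enumerate(freq):
    print(f"{edges[i]} <= h < {edges[i + 1]}: {f}")

# Check total: must equal the number of data values
print("Total", sum(freq))
```

With this convention a height of exactly 170.0 is counted in the class 170 <= h < 180 and not in 160 <= h < 170, so it is never counted twice.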

Two-Way Tables

Two-way tables make use of columns and rows to sub-divide and analyse data. Dividing the data in this way makes it easier to understand. This is best shown with an example. In this example the columns and rows are totalled.

This shows the advantages of sub-dividing the data by using a two-way table. If the data were not split by gender then we would say, for example, that "Soaps" is the most popular type of TV programme, without understanding that soaps are far more popular with men than they are with women. You may find this result surprising. Could it be that the 150 people sampled are not representative of the total population?
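To show how the row and column totals of a two-way table work, here is a small Python sketch. The counts below are invented for illustration and are not the figures from the example; the "Sport" and "Drama" categories are also my own assumption. Only the total of 150 people and the pattern for soaps follow the description above.

```python
# Invented survey counts, purely for illustration
counts = {
    "Men": {"Soaps": 40, "Sport": 25, "Drama": 10},
    "Women": {"Soaps": 20, "Sport": 15, "Drama": 40},
}
categories = ["Soaps", "Sport", "Drama"]

# Total each row (one row per gender)
row_totals = {group: sum(row.values()) for group, row in counts.items()}

# Total each column (one column per programme type)
column_totals = {c: sum(row[c] for row in counts.values()) for c in categories}

# The row totals and the column totals must add up to the same grand total
grand_total = sum(row_totals.values())
assert grand_total == sum(column_totals.values())

for group, row in counts.items():
    print(group, [row[c] for c in categories], row_totals[group])
print("Total", [column_totals[c] for c in categories], grand_total)
```

In this invented data "Soaps" has the largest column total, but the split by row shows that the result is driven mainly by one group, which is exactly the kind of detail a two-way table reveals.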


You can carry out a sur­vey to gather infor­ma­tion. Ques­tion­naires are forms with pre-prepared ques­tions that may be used to carry out a sur­vey in a struc­tured way.

Care must be taken to ensure that the ques­tions are appro­pri­ate. Good ques­tion­naires have the fol­low­ing characteristics:

  • Spe­cific ques­tions that are easy to answer (for exam­ple yes/no answers)
  • Use tick boxes
  • Ensure pos­si­ble answers do not overlap
  • Avoid per­sonal questions
  • Avoid lead­ing questions
  • Keep the ques­tion­naire as short as possible


Ide­ally when you are com­plet­ing a sur­vey you gather infor­ma­tion from all pos­si­ble sources. Such a sur­vey is known as a cen­sus.

How­ever, unless the pop­u­la­tion (all the peo­ple that could con­tribute to a sur­vey) is very small or you have unlim­ited time and money you will need to use a sam­ple of the population.

In order for a sur­vey to give mean­ing­ful results the sam­ple must be rep­re­sen­ta­tive. This means that all sub-groups within the pop­u­la­tion must be fairly rep­re­sented in the sample.

Strat­i­fied Sampling

Stratified sampling is classified as grade A or even grade A* but, to me, it seems relatively straightforward. It's probably best explained by looking at a question.


A leisure centre wants to survey young adults who live within one kilometre of the centre. The total population of such young adults is as follows:

 The leisure cen­tre can only afford to sur­vey 100 peo­ple. How many males and females of each age should be surveyed?

Knowledge/Method to Answer the Question

So how do you go about this question? Each data point in the above table (e.g. 16-year-old females) represents a stratum (stratum, plural strata, means layer) of the total population. Stratified sampling ensures that each stratum is fairly represented in the survey.

You can use a for­mula to cal­cu­late the sam­ple size of each stra­tum (you could learn this, but I think it would be bet­ter to under­stand the logic/common sense behind it).

Any­way, here’s the formula:

(Size of stratum ÷ size of population) × size of total sample = sample size of stratum

Taking the first stratum, 16-year-old females, this gives:

(45 ÷ 585) × 100 = 7.6923 (round to 8)

I find it helpful to put this into words to see the logic behind it. 16-year-old females represent 45 ÷ 585 = 0.076923 or 7.6923% of the total population. Our total sample size is 100, so we need 100 × 0.076923 = 7.6923, which rounds up to 8 as the sample size for this stratum.


Sample size for 16-year-old females = (45 ÷ 585) × 100 = 7.6923 ≈ 8 (rounded)

Sample size for 16-year-old males = (56 ÷ 585) × 100 = 9.5726 ≈ 10 (rounded)
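The two calculations above can be checked with a short Python sketch. Only the two strata worked out above are included, with the total population of 585 and sample size of 100 taken from the question. The function names are my own, and `round_half_up` is there because Python's built-in `round` sends exact halves to the nearest even number, which is not the rounding taught at school.

```python
import math


def round_half_up(x):
    """Round to the nearest whole number, with halves rounding up."""
    return math.floor(x + 0.5)


def stratum_sample_size(stratum_size, population_size, total_sample):
    """(size of stratum / size of population) x size of total sample."""
    return stratum_size / population_size * total_sample


population_size = 585  # total population from the question
total_sample = 100     # the leisure centre can survey 100 people

# Only the two strata worked through in the text
strata = {"16-year-old females": 45, "16-year-old males": 56}

sample_sizes = {}
for name, size in strata.items():
    exact = stratum_sample_size(size, population_size, total_sample)
    sample_sizes[name] = round_half_up(exact)
    print(f"{name}: {exact:.4f} -> {sample_sizes[name]}")
```

One thing to watch when you do this for every stratum: because each answer is rounded separately, the rounded sample sizes may not add up to exactly 100, so it is worth checking the total at the end.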

Using the same method for every stra­tum gives the fol­low­ing sam­ple sizes:

MindMap Data Collection

mindmap 1


Here is my MindMap of Data Collection. I find that preparing a mindmap is the best way to prepare revision notes. The act of trying to cram the whole topic onto one page helps me to fix it in my brain. In addition, the use of colours and shapes makes it memorable. Once I've got a mindmap, I can just look at it to revise a whole chapter of Maths. This is useful, but it's not as good as practice questions.