Proper sampling techniques are important for making a survey representative. Even so, representative results cannot always be achieved. With random sampling in particular, chance alone can produce a sample whose make-up differs from that of the population. To compensate for this kind of sampling error, survey results can be weighted: the data are adjusted so that the structure of the sample matches the structure of the population, with the aim of producing estimates that are unbiased and representative of the population one wants to draw inferences about. This article shows how weighting works, using some fictitious figures.
Suppose that you think gender has a large impact on your survey data, but three-quarters of the respondents were female rather than half (the actual split in the UK is 49% male and 51% female, but the weighting is easier to follow if 50/50 is used). The raw data will not provide a representative sample of the population, because too many of the responses come from women. If, however, you do not think that gender will affect the findings, there is very little to gain by weighting the results by gender. Figure 1 shows how the weighting would be calculated: each group's weight is its share of the population divided by its share of the sample. Men make up 50% of the population but only 25% of the sample, giving a weight of 50 ÷ 25 = 2; women make up 50% of the population but 75% of the sample, giving a weight of 50 ÷ 75 ≈ 0.67.
A weight of 2 is then assigned to all the surveys completed by men. This means that each survey completed by a man is ‘worth’ two surveys. Conversely, each survey completed by a woman is assigned a weight of 0.67, meaning that one survey completed by a woman is ‘worth’ two-thirds of a survey. An ‘ideal’ weight of 1 would mean that the proportions of men and women in the sample already matched those in the population as a whole, so no adjustment would be needed.
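As a minimal sketch of the calculation, the Python function below divides each group's population share by its sample share. The function name `cell_weights` is hypothetical, and the sample size of 100 (25 men, 75 women) is an assumption chosen to match the three-quarters female split described above.

```python
def cell_weights(sample_counts, population_shares):
    """Return a weight per group: population share divided by sample share."""
    n = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / n
        weights[group] = population_shares[group] / sample_share
    return weights


# Fictitious sample of 100 respondents, three-quarters female (assumed size).
sample = {"male": 25, "female": 75}
# 50/50 split used for simplicity, as in the text.
population = {"male": 0.50, "female": 0.50}

weights = cell_weights(sample, population)
for group, w in weights.items():
    print(f"{group}: {w:.2f}")
```

Multiplying each group's count by its weight gives 25 × 2 = 50 men and 75 × 0.67 ≈ 50 women, so the weighted sample still totals 100 respondents.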
Figure 2 and Figure 3 show how this works in practice.
As can be seen in Figure 3, the bases for men and women are now both 50, while the percentages of men and women saying ‘yes’ are the same in both figures. What has changed is the number of individuals making up each percentage. For example, in Figure 2, 10 men said ‘yes’, but because men have been given a weight of 2, Figure 3 shows 20 men saying ‘yes’. This makes the results more representative of the population as a whole and changes the overall response to Q1: the weighted figure of 60% saying ‘yes’ is a more accurate estimate than the unweighted figure of 70%, because women's answers are no longer over-represented.
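The arithmetic behind Figures 2 and 3 can be reproduced with a short sketch. The counts are not stated directly in the text but are inferred from it: 25 men and 75 women (so that the weighted bases come to 50 each), 10 men saying ‘yes’, and 60 women saying ‘yes’ (which makes the unweighted total 70 out of 100). The `tabulate` helper is hypothetical.

```python
# Counts inferred from the example (assumed sample of 100):
# 25 men, 10 of whom said 'yes'; 75 women, 60 of whom said 'yes'.
groups = {
    "male": {"base": 25, "yes": 10, "weight": 2.0},
    "female": {"base": 75, "yes": 60, "weight": 2 / 3},
}


def tabulate(groups, weighted):
    """Return (base, yes count, % yes) per group and overall,
    applying each group's weight only when `weighted` is True."""
    rows = {}
    total_base = total_yes = 0.0
    for name, g in groups.items():
        w = g["weight"] if weighted else 1.0
        base, yes = g["base"] * w, g["yes"] * w
        rows[name] = (base, yes, 100 * yes / base)
        total_base += base
        total_yes += yes
    rows["total"] = (total_base, total_yes, 100 * total_yes / total_base)
    return rows


for label, weighted in (("Unweighted", False), ("Weighted", True)):
    print(label)
    for name, (base, yes, pct) in tabulate(groups, weighted).items():
        print(f"  {name:<6} base={base:5.1f} yes={yes:5.1f} ({pct:.0f}%)")
```

The per-group percentages stay at 40% for men and 80% for women, while the overall figure falls from 70% unweighted to 60% weighted. Using the exact weight 2/3 rather than the rounded 0.67 keeps the weighted female base at exactly 50.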
The weighting technique described above is known as ‘cell weighting’; here it has been applied to just one variable (gender). It is also possible to weight data by several variables at once (such as gender and age).
When weighting by multiple variables, there are two options. The method described above can be used where population data are available that take all of the variables into account together (for example, age broken down by gender). This is a relatively straightforward procedure which generally produces good results; however, it requires population data for each cell (e.g. men aged 16-24). If one dataset gives the population's age distribution and another gives its gender distribution, cell weighting cannot be used, because there is no information on the age breakdown of men and women separately. This method also requires a relatively large sample, as there must be a reasonable number of respondents in each cell.
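The sketch below extends the same calculation to interlocking cells, assuming population shares are available for every gender-by-age combination. All figures and age bands are hypothetical, as are the function name `interlocking_cell_weights` and its `min_count` threshold; the threshold stands in for the requirement that each cell contain a reasonable number of respondents.

```python
def interlocking_cell_weights(sample_counts, population_shares, min_count=1):
    """Cell weight = population share / sample share, for every interlocking cell.

    Raises ValueError if a cell has fewer than `min_count` respondents,
    since a weight cannot be calculated (or trusted) for an empty or thin cell.
    """
    n = sum(sample_counts.values())
    weights = {}
    for cell, share in population_shares.items():
        count = sample_counts.get(cell, 0)
        if count < min_count:
            raise ValueError(f"cell {cell} has only {count} respondents")
        weights[cell] = share / (count / n)
    return weights


# Hypothetical population shares for each (gender, age) cell; they sum to 1.
population = {
    ("male", "16-34"): 0.15, ("male", "35-54"): 0.17, ("male", "55+"): 0.17,
    ("female", "16-34"): 0.15, ("female", "35-54"): 0.17, ("female", "55+"): 0.19,
}
# Hypothetical sample of 200 respondents.
sample = {
    ("male", "16-34"): 10, ("male", "35-54"): 20, ("male", "55+"): 25,
    ("female", "16-34"): 40, ("female", "35-54"): 55, ("female", "55+"): 50,
}

for cell, w in interlocking_cell_weights(sample, population).items():
    print(cell, round(w, 2))

# A stricter (hypothetical) minimum of 20 per cell rejects the 10 young men.
try:
    interlocking_cell_weights(sample, population, min_count=20)
except ValueError as err:
    print("Cannot weight:", err)
```

Keying the population table by (gender, age) pairs is exactly what separate age and gender datasets cannot supply, which is why this approach needs interlocking population data.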
The alternative is known as raking, a process that iteratively calculates weights for each factor in turn (e.g. age, then gender) and converges on an approximation of the cell weights. Its advantages are that it does not require highly interlocking Census data (such as age broken down by gender) to calculate the weights, and that it can be used with smaller samples.
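Raking is also known as iterative proportional fitting. The following is a minimal sketch rather than a production implementation: it assumes respondent-level records and separate target distributions for gender and age (all figures hypothetical), and the names `rake`, `max_iter` and `tol` are illustrative. Each pass scales the weights so that one variable's weighted distribution matches its target, then moves on to the next variable, repeating until no adjustment factor differs from 1 by more than `tol`.

```python
from collections import Counter


def rake(records, targets, max_iter=100, tol=1e-8):
    """Return one weight per record so that each variable's weighted
    distribution matches its target shares (iterative proportional fitting)."""
    n = len(records)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total for each category of this variable.
            totals = Counter()
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            # Rescale so this variable's weighted totals equal share * n.
            for i, rec in enumerate(records):
                factor = shares[rec[var]] * n / totals[rec[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        # Stop once a full pass over all variables changes nothing materially.
        if max_change < tol:
            break
    return weights


# Hypothetical sample of 200 respondents, expanded from gender-by-age counts.
sample_counts = {
    ("male", "16-34"): 10, ("male", "35-54"): 20, ("male", "55+"): 25,
    ("female", "16-34"): 40, ("female", "35-54"): 55, ("female", "55+"): 50,
}
records = [
    {"gender": gender, "age": age}
    for (gender, age), count in sample_counts.items()
    for _ in range(count)
]

# Hypothetical targets from two separate sources; no gender-by-age table needed.
targets = {
    "gender": {"male": 0.49, "female": 0.51},
    "age": {"16-34": 0.30, "35-54": 0.34, "55+": 0.36},
}

weights = rake(records, targets)
for var in targets:
    totals = Counter()
    for rec, w in zip(records, weights):
        totals[rec[var]] += w
    print(var, {cat: round(100 * t / len(records), 1) for cat, t in totals.items()})
```

After convergence, the weighted gender and age distributions match their targets even though no gender-by-age population table was supplied. This sketch does not trim extreme weights or handle target categories that have no respondents.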