Sampling is far from a perfect science. Think of the polls that have struggled to predict election outcomes around the world in recent years. Although pollsters strive for accuracy, it is extremely difficult to find samples that match the age, gender, race, geography, and political affiliation of the wider voting population.

Whether the cause is a limited budget or a disappointing number of responses, collecting data in the real world is rarely clean. The usual result is that one group ends up under-represented while another is over-represented.

One way to correct non-representative samples is with post-stratification.

What is Post-Stratification?

Post-stratification is a statistical technique for bringing a survey sample into line with the population it is meant to represent. Unlike stratified sampling, where groups (strata) are defined before the sample is drawn, post-stratification adjusts the results after data collection so that they match known population characteristics such as age, gender, or education. This helps correct for nonresponse bias and under-represented groups, improving the accuracy of the final estimates.

Post-stratification is one form of survey weighting, in which each respondent is assigned a numeric weight so that survey estimates better reflect the population. Nearly all commercial applications of weighting involve post-stratification.

How Post-Stratification Improves Accuracy

To return to the election polling example, consider a two-party-preferred poll that asks 1,000 people about their voting preferences. The population is known to be 50% male and 50% female, but only 40% of respondents in the sample are male, while 60% are female.

This can be corrected using post-stratification. Each group's weight is its population share divided by its sample share. Since men were under-represented, we give their answers more weight. Since women were over-represented, we give their answers less weight:

  • Each male respondent gets a weight of 50% / 40% = 1.25
  • Each female respondent gets a weight of 50% / 60% ≈ 0.83

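With these weights applied, the 400 men count as 500 respondents (400 × 1.25) and the 600 women also count as approximately 500 (600 × 0.833), so the weighted sample matches the 50/50 population split. As long as the people who responded in each group resemble those who did not, this leads to more accurate overall estimates.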
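The arithmetic in this example can be sketched in a few lines of Python. The population shares and sample counts come from the poll described above; the support figures for "Party A" are hypothetical numbers added purely to show how the weights change an estimate.

```python
# Known population shares and the sample counts from the 1,000-person poll.
population_share = {"male": 0.50, "female": 0.50}
sample_counts = {"male": 400, "female": 600}

n = sum(sample_counts.values())

# Post-stratification weight = population share / sample share.
weights = {
    group: population_share[group] / (sample_counts[group] / n)
    for group in sample_counts
}

# Hypothetical support for Party A within each group (assumed, not from the article).
support = {"male": 0.55, "female": 0.45}

# Unweighted estimate: every respondent counts once.
unweighted = sum(sample_counts[g] * support[g] for g in sample_counts) / n

# Weighted estimate: every respondent counts according to their group's weight.
weighted_total = sum(sample_counts[g] * weights[g] for g in sample_counts)
weighted = (
    sum(sample_counts[g] * weights[g] * support[g] for g in sample_counts)
    / weighted_total
)

print({g: round(w, 2) for g, w in weights.items()})
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

Under these assumed support figures, the unweighted estimate is 49% and the weighted estimate is 50%, because the men's answers now carry as much total weight as the women's.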

Limitations

One of the key drawbacks of post-stratification is that it relies on accurate population benchmarks and sufficient sample sizes. Although census data is readily available, it often lacks the granularity required to perform post-stratification successfully.

Additionally, it works best with a limited number of variables. If too many groups are defined, the sample size within each group may become too small for reliable estimates.
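To see how quickly groups multiply, here is a rough sketch. The variables and the number of categories in each are hypothetical, chosen only to illustrate the effect on a 1,000-person sample.

```python
from math import prod

# Hypothetical weighting variables and how many categories each has (assumed).
categories = {"age_band": 6, "gender": 2, "region": 8, "education": 4}
sample_size = 1000

# Crossing every variable with every other gives one cell per combination.
n_cells = prod(categories.values())
avg_per_cell = sample_size / n_cells

# Show how the average cell size shrinks as each variable is added.
cells_so_far = 1
for name, k in categories.items():
    cells_so_far *= k
    print(f"after adding {name}: {cells_so_far} cells, "
          f"about {sample_size / cells_so_far:.1f} respondents per cell")
```

With these four variables there are 384 cells, which leaves fewer than three respondents per cell on average, and in practice some cells would have none at all.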

Multilevel Regression with Post-Stratification

One way to work around small sample sizes within groups is multilevel regression and post-stratification, otherwise known as MRP or "Mister P". For this reason, MRP is a popular option for studies that need small-area estimates.

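Multilevel regression is a statistical method that models individual responses while also accounting for the groups those individuals belong to, such as regions or demographic categories. Because groups with little data borrow strength from the rest of the sample, the model captures differences between areas and can produce estimates even where there are only a few survey responses.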

The model's predictions for each group then undergo post-stratification: each prediction is weighted by that group's share of the real population, which corrects for imbalances in the original sample. Finally, the weighted predictions are aggregated into overall or subgroup estimates.
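The two-step idea can be illustrated with a deliberately simplified sketch. This is not a real multilevel regression: instead of fitting a model, it shrinks each group's observed rate toward the overall rate using a fixed, assumed pooling strength, which mimics how a multilevel model stabilises groups with few responses. All survey counts, population counts, and the PRIOR_STRENGTH value are hypothetical.

```python
# Hypothetical survey data for one small area:
# cell -> (number of respondents, number supporting Party A).
survey = {
    ("male", "18-34"): (5, 4),
    ("male", "35+"): (40, 18),
    ("female", "18-34"): (2, 0),
    ("female", "35+"): (53, 28),
}

# Hypothetical population counts for the same cells (e.g. from a census table).
population = {
    ("male", "18-34"): 1500,
    ("male", "35+"): 3500,
    ("female", "18-34"): 1400,
    ("female", "35+"): 3600,
}

total_n = sum(n for n, _ in survey.values())
total_yes = sum(yes for _, yes in survey.values())
overall_rate = total_yes / total_n

# Assumed pooling strength. A fitted multilevel model would estimate
# the amount of pooling from the data rather than fix it in advance.
PRIOR_STRENGTH = 10


def pooled_estimate(n, yes):
    """Shrink a cell's observed rate toward the overall rate (stand-in for the model)."""
    return (yes + PRIOR_STRENGTH * overall_rate) / (n + PRIOR_STRENGTH)


# Step 1: a stabilised prediction for every cell, even very small ones.
cell_estimates = {cell: pooled_estimate(n, yes) for cell, (n, yes) in survey.items()}

# Step 2: post-stratify by weighting each prediction by its population count.
total_pop = sum(population.values())
mrp_style_estimate = (
    sum(cell_estimates[cell] * population[cell] for cell in population) / total_pop
)

for cell, est in cell_estimates.items():
    print(cell, round(est, 3))
print("post-stratified estimate:", round(mrp_style_estimate, 3))
```

Note how the cell with only two respondents, neither of whom supports Party A, is not estimated at 0% but is pulled toward the overall rate. A real MRP analysis would replace the shrinkage function with a fitted multilevel model that also uses predictors such as region.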

MRP can provide more reliable results for complicated surveys and small areas than simple weighting, provided the underlying model is well specified. As surveys become more complex, it is an increasingly valuable tool for robust research.

Easily Weight Your Survey Data with Displayr 

No more manual calculations—Displayr makes survey weighting seamless with cell weighting, rim weighting, calibration, and capping all built-in. Save time and ensure accurate results with just a few clicks.

Try it free today.