Three best practices for survey research from the 2020 election

Joshua Wu, PhD
Nov 11, 2020

As we conclude this election cycle, pundits and clients alike are questioning what went wrong with the polls. While there are several possible explanations, and the final error may yet prove to be on par with historical performance, this scrutiny of polling and results illustrates three best practices we should adopt to better design, analyze, and present primary research to clients and stakeholders.


First, we need to define our sample audience to best match the population of interest.

Election polls sample likely, registered, and non-voters to extrapolate the vote choice of those most likely to vote. Defining the population of voters is challenging because the best available data always comes from the last election. Misalignment between the likely voters we expect to sample and those who actually vote can produce polling errors. For example, in 2016 many polls underweighted white non-college-educated voters, thereby overstating support for Clinton.

Thus, we must be explicit in defining our audience to ensure alignment between the sample of respondents we recruit and the population to which we generalize results. For example, if surveying likely consumers, make sure that the qualification criteria match the profile of those who shop in the category and that behavioral and attitudinal responses align with existing research about those shoppers. Or, if we are sampling donors from a client member list, we may need to weight responses so that the donors who respond match the overall composition of the donor population.
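To make this concrete, here is a minimal sketch of the kind of weighting step described above. The donor tiers, population shares, and survey question are invented purely for illustration.

```python
# A minimal sketch of post-stratification weighting, with hypothetical
# donor tiers, population shares, and a hypothetical survey question.
import pandas as pd

# Hypothetical survey responses from a client donor list
responses = pd.DataFrame({
    "donor_tier": ["major", "mid", "small", "small", "mid", "small"],
    "favor_new_program": [1, 0, 1, 1, 0, 1],
})

# Known composition of the full donor population (assumed for this example)
population_share = {"major": 0.10, "mid": 0.30, "small": 0.60}

# Weight each respondent so the sample composition matches the population
sample_share = responses["donor_tier"].value_counts(normalize=True)
responses["weight"] = responses["donor_tier"].map(
    lambda tier: population_share[tier] / sample_share[tier]
)

# Compare the raw (unweighted) estimate with the weighted estimate
unweighted = responses["favor_new_program"].mean()
weighted = (
    (responses["favor_new_program"] * responses["weight"]).sum()
    / responses["weight"].sum()
)
print(f"Unweighted: {unweighted:.2f}  Weighted: {weighted:.2f}")
```

In this toy example the weighted estimate differs from the raw one simply because small donors, who make up most of the donor population, are underrepresented among the respondents who happened to reply.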

Second, we need to be transparent in communicating the assumptions we make in defining and calibrating findings from our sample to the population of interest.

While pollsters rarely explain the proprietary definitions of likely voters that produce differential “house effects,” we must explain to clients and stakeholders the audience definitions we tailor to their specific needs and objectives. It is not enough to simply say that we surveyed business elites or likely advocates; we need to justify why the sample was defined this way, how specific qualification criteria screen out non-relevant respondents, and how our method of recruitment ensures a representative sample of the population of interest.

Effective communication of audience assumptions enables us to proactively identify potential challenges to study feasibility and to offer solutions if recruitment of key sub-audiences is unexpectedly low. Transparency about the design of our survey respondent frames can also lead to fruitful discussions about limitations in the results. When this discussion does not happen transparently, as is sometimes the case when polls are aggregated, there can be misunderstandings about the nature of our insights and misapplication of findings for unintended purposes. Instead, we should be explicit in explaining how the assumptions we make about our sample are justified and how they best enable the identification of insights that answer key questions.

Third, we must be careful with the presentation of preliminary data.

Clients sometimes ask for data previews while the survey is still fielding, or for real-time dashboards to track responses as they come in. But we must be careful in sharing preliminary data, as it can be misinterpreted to support narratives that are invalidated by the final calibrated data. This election illustrates at least two main ways that preliminary data can be misleading.

This election was unusual in that Democrats were more likely to vote by mail and Republicans more likely to vote in person, producing “red mirages” in states where mail-in ballots were counted after in-person votes and “blue mirages” in states where mail-in ballots were reported first. This heterogeneity in state ballot-counting laws meant that preliminary tallies had a significant partisan skew, and in several states the apparent leader flipped between preliminary and final counts. Skewed collection and reporting of responses can also occur in our client work. Vendors sometimes prioritize recruitment of harder-to-reach sub-audiences first (for example, younger and non-white respondents). This targeted recruitment skews preliminary data because the respondents recruited first are not representative of the final sample.
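To see how much a preview can mislead, here is a small, hypothetical illustration. The sample sizes and recruitment order are invented, but the pattern mirrors what happens when a prioritized sub-audience dominates the early completes.

```python
# A hypothetical illustration of how recruitment order can skew a preliminary cut.
# Suppose the target sample is 1,000 respondents, 30% of whom are in a
# hard-to-reach group that the vendor recruits first (all shares are assumed).
hard_to_reach_target = 300
easy_to_reach_target = 700

# If the first 400 completes are dominated by the prioritized group...
preliminary = {"hard_to_reach": 300, "easy_to_reach": 100}
final = {"hard_to_reach": hard_to_reach_target, "easy_to_reach": easy_to_reach_target}

for label, counts in [("Preliminary", preliminary), ("Final", final)]:
    total = sum(counts.values())
    share = counts["hard_to_reach"] / total
    print(f"{label}: hard-to-reach share = {share:.0%}")

# The preview shows a 75% hard-to-reach share versus 30% in the final sample,
# so any attitude that differs by sub-audience will look very different in the
# preview than in the finished study.
```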


The inaccuracies of exit polls also remind us to be careful about how we present preliminary data. Despite the “warning labels” attached to exit polls, too many pundits and partisans use them to justify narratives, generalizations, and explanations that are later contradicted by final vote tallies and comprehensive studies of representative voter responses. Before we present survey results, data often need to be calibrated with weights or other non-response adjustments. As such, presenting preliminary, unweighted survey data yields at best an imprecise and at worst an incorrect estimate of final results.
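As a rough illustration of why unweighted previews can mislead, here is a sketch of a simple cell-based non-response adjustment. The invitation counts, completion counts, and approval rates are all hypothetical.

```python
# A minimal sketch of a cell-based non-response adjustment, with invented
# numbers for illustration. Each cell's weight is the inverse of its response
# rate, so groups that answered at lower rates count for more in the estimate.
invited = {"18-34": 400, "35-54": 300, "55+": 300}          # hypothetical invitations
completed = {"18-34": 80, "35-54": 120, "55+": 180}          # hypothetical completes
approve_rate = {"18-34": 0.40, "35-54": 0.55, "55+": 0.70}   # observed among completes

# Inverse response-rate weight per age cell
weights = {cell: invited[cell] / completed[cell] for cell in invited}

# Unadjusted estimate simply averages over whoever happened to respond
raw = sum(completed[c] * approve_rate[c] for c in completed) / sum(completed.values())

# Adjusted estimate re-inflates each cell back to its share of those invited
adjusted = (
    sum(completed[c] * weights[c] * approve_rate[c] for c in completed)
    / sum(completed[c] * weights[c] for c in completed)
)

print(f"Unadjusted preliminary estimate: {raw:.2f}")
print(f"Non-response adjusted estimate:  {adjusted:.2f}")
```

In this toy case the unadjusted preview overstates approval because the oldest, most approving group responded at the highest rate; the adjustment pulls the estimate back toward the invited population.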

Despite the loud prognostications of many, this election cycle is not the death knell of survey research as we know it.

Errors were made, and good postmortems will identify useful lessons that should improve polling performance in the next election. But if we apply the reminders discussed above, we can be more confident that we are appropriately leveraging insights from survey results to best inform and empower client strategy.
