Predicting Politics: How Forecasters Understand the US Presidential Election

Abstract

*In 2016, the SOL program pivoted for the year to focus on political engagement in what was then called the “Political Engagement Pilot Project,” or PEPP. This was an alternative version of SOL that laid the groundwork for the development of the PEP program as it currently exists.

After Franklin Delano Roosevelt’s 1932 election to the presidency of the United States, The Literary Digest was on a roll. The popular weekly magazine had correctly predicted the past five presidential elections, from Woodrow Wilson’s victory in 1916 onward through 1932. Its approach was simple: poll as many people as possible before November, then report their responses directly. Carrying this successful strategy forward to the 1936 election, the magazine conducted a survey of 2.4 million Americans – fully 5% of the number of people who would later cast votes for a candidate. When all of the responses had been tallied, the outcome was clear: with 57% of the popular vote and 370 electoral votes, Republican Alfred Landon would be the next president of the United States. Of course, this was not the outcome of the election. President Roosevelt won every state except Maine and Vermont, giving him 523 electoral votes to Landon’s 8. But The Literary Digest was confident in its conclusion, and the poll persuaded many others, including the chair of the Democratic National Committee himself, who remarked, “Any sane person cannot escape the implication of such a gigantic sampling of popular opinion as is embraced in The Literary Digest straw vote.” It would indeed have seemed unreasonable to doubt the prediction given the magazine’s lengthy record of success and the impressive scale of its poll.

There was someone who doubted the prediction, however. In fact, George Gallup entirely contradicted it, issuing his own prediction that Roosevelt would win based on a poll carried out by his American Institute of Public Opinion. Its sample size was 50,000 people, or about 2% of The Literary Digest’s. Gallup’s accurate prediction marked the beginning of a new era of public polling, and it demonstrated some crucial lessons in forecasting, one of the most important being that past success does not guarantee future success. The Literary Digest had correctly predicted five straight presidential elections, but a close examination of its methodology would have shown that those forecasts were driven largely by luck, as George Gallup realized.

Two major flaws beset the magazine’s approach. First, its sample was not representative of the American electorate. Its mailing list was drawn from automobile registration records and phonebooks, and so consisted of Americans who were wealthier than average, since they owned either cars or telephones despite the Great Depression. Many Americans who were not so well-off would never have been asked about their voting intentions, and if this segment of the electorate preferred Roosevelt to Landon, the poll would fail to show it. Second, the sample suffered from non-response bias. The magazine originally mailed out 10 million questionnaires, but only 2.4 million of these were returned. A low response rate isn’t necessarily a problem, but if the decision to respond is tied to some characteristic of the respondents that also bears on their vote, then it is a significant problem. The effort required to fill out and mail a political survey, for example, could skew the sample toward the disproportionately civically engaged, since these are the people most willing to make that effort. If the civically engaged tend to support one candidate over another, bias is introduced into the sample. Countering these challenges of representativeness and non-response bias was essential for Gallup’s own success, and in doing so he illustrated another lesson in forecasting: data, i.e. information of any kind, should very rarely be ignored. Accompanying the response to every poll are some supremely relevant bits of information, namely the fact that the respondent answered the poll at all and the story of how their contact information found its way into the pollster’s hands in the first place.
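The two flaws described above can be made concrete with a small calculation. The sketch below is illustrative only: the two groups, their sizes, their candidate preferences, and their coverage and response rates are all hypothetical numbers chosen for the example, not historical estimates. It shows how an unrepresentative mailing list combined with uneven response rates can turn a comfortable majority in the electorate into an apparent minority among returned ballots.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stratum:
    name: str
    share: float     # fraction of the electorate in this group
    support: float   # fraction of the group backing the candidate
    coverage: float  # chance a member appears on the mailing list
    response: float  # chance a contacted member returns the ballot


def true_support(strata):
    """Candidate's share of the whole electorate."""
    return sum(s.share * s.support for s in strata)


def poll_estimate(strata):
    """Candidate's share among returned ballots.

    Each group counts in proportion to how many of its members are
    both on the mailing list and willing to respond.
    """
    weights = [s.share * s.coverage * s.response for s in strata]
    return sum(w * s.support for w, s in zip(weights, strata)) / sum(weights)


# Hypothetical electorate: every number here is an assumption for illustration.
ELECTORATE = [
    Stratum("car or telephone owners", share=0.30, support=0.40,
            coverage=0.80, response=0.30),
    Stratum("everyone else", share=0.70, support=0.70,
            coverage=0.10, response=0.15),
]

# The same electorate, polled with full coverage and full response.
REPRESENTATIVE = [replace(s, coverage=1.0, response=1.0) for s in ELECTORATE]

if __name__ == "__main__":
    print(f"true support:          {true_support(ELECTORATE):.3f}")
    print(f"biased poll estimate:  {poll_estimate(ELECTORATE):.3f}")
    print(f"representative sample: {poll_estimate(REPRESENTATIVE):.3f}")
```

Under these assumed numbers the candidate holds 61% of the electorate but only about 44% of the returned ballots. Collecting more ballots from the same list would not close that gap, because the error comes from who is reached and who replies rather than from how many are counted.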

Yet while Gallup had honed the practice of polling, he had not perfected it. Eighty years on from the 1936 election, public polls once again failed to indicate the outcome of the presidential election when Donald Trump was elected president with 306 electoral votes to Hillary Clinton’s 232. Nationally and in a healthy number of states, Clinton had led Trump in the polls for most of the election season, and she maintained her advantage into Election Day. Real Clear Politics’ polling average for the week from November 1st to 7th showed Clinton leading Trump nationally by over 3 points. But come the 8th, Trump would overtake her in several key states. In Wisconsin, for instance, polls overestimated Clinton’s support by 6 points, putting Clinton well ahead even as Trump would win the state. The final electoral count was a surprise to many, even within Trump’s camp. On election night, one internal advisor to Trump confided to a CNN reporter, “It would take a miracle for us to win.” And though no polling error in the 2016 election was as severe as The Literary Digest’s 80 years prior, the outcome may still be seen to dispel any accrued credence in public polls, or in those civilians, pundits, and journalists who read them as signals of the United States’ political future. A still more pessimistic interpretation would suggest that the political tides are beyond the forecaster’s understanding; the future of the presidency won’t submit to prediction.

The remainder of this paper is devoted to the contrary interpretation. Polling provides valuable information for forecasting the presidential election. Polling is also only one of six well-established approaches to predicting the presidential election, the other five being econometric models, index models, prediction markets, collective expert judgment, and direct citizen forecasts. One overarching strategy, pioneered by the researchers behind the PollyVote model, combines information from each of these six methods to yield what could reasonably be judged the best available approach to forecasting the election, and the predictions from this strategy, though flawed for 2016, have historically been close to the mark. Each of these six methods and the combinatorial strategy will be elaborated. Alongside the problem of prediction, and of similar importance, are questions of inference: questions about the actual functioning of politics, into which predictive models can lend some insight. The political tides are not merely within the reach of prediction; successful prediction itself reflects some knowledge of the forces driving politics.
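As a rough preview of what combining methods can look like, the sketch below averages forecasts first within each of the six methods and then across them, giving every method equal weight. The equal weighting is an assumption made here for simplicity rather than a description of the PollyVote procedure, and the vote-share figures are invented placeholders, not real forecasts.

```python
from statistics import mean

# Hypothetical forecasts of one candidate's two-party vote share, in percent.
# Every value is a made-up placeholder for illustration.
FORECASTS = {
    "polls": [52.0, 51.0, 53.0],
    "econometric models": [50.0, 49.0],
    "index models": [53.0],
    "prediction markets": [54.0],
    "expert judgment": [52.0, 52.0],
    "citizen forecasts": [51.0],
}


def combine(forecasts):
    """Average within each method, then average the method means.

    Averaging in two steps keeps a method with many individual
    forecasts (such as polls) from outweighing a method with few.
    """
    method_means = {name: mean(values) for name, values in forecasts.items()}
    combined = mean(list(method_means.values()))
    return combined, method_means


if __name__ == "__main__":
    combined, method_means = combine(FORECASTS)
    for name, value in method_means.items():
        print(f"{name:20s} {value:5.1f}")
    print(f"{'combined forecast':20s} {combined:5.1f}")
```

The appeal of this design is that each method draws on different information and errs in different ways, so errors that are not shared across methods tend to partially cancel in the average.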