Election forecasts are no crystal ball, data scientist explains

In the 2016 Democratic primary in Michigan, Hillary Clinton surprised many by losing to Bernie Sanders.

Jody Heck Wortman, a data scientist for the Democratic National Committee, brought a new perspective to one of the biggest forecasting mistakes of the 2016 election cycle during a talk Wednesday afternoon.

She said that Sanders' win, which came despite polls showing Clinton ahead by roughly 20 points, should not have been as unexpected as it was.

“You want to use forecasting in a way that actually makes a difference, either by making your voters and constituents more happy, or convincing more people to vote for you," Wortman said. "Or, by convincing people who would vote for you to actually get out instead of staying at home.” 

Using original simulations and the Michigan primary as a case study, she explained how forecasters approach election analysis and, in particular, how sources of bias can creep into their predictions.

Surveys that rely on phone calls or web access, for instance, are prone to coverage bias: they reach only people who have a phone or internet connection and who choose to respond.

Other pollsters instead use statistical modeling techniques such as multilevel regression with poststratification. These approaches use demographic data to predict how a certain bloc of people will vote. Since people tend to vote similarly to others who share demographic traits, such as race or education level, this method is favored in smaller elections, where sample sizes are thin.
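The poststratification half of that approach can be sketched in a few lines: per-group support estimates are reweighted by each group's share of the electorate. This is a minimal illustration, not Wortman's model; the demographic cells, support figures, and shares below are all invented, and in full MRP the per-cell estimates would come from a multilevel regression rather than being hard-coded.

```python
# Hypothetical demographic cells: (group, estimated support, share of electorate).
# In real MRP these support estimates come from a fitted multilevel model.
cells = [
    ("college, white",     0.44, 0.30),
    ("non-college, white", 0.52, 0.40),
    ("college, Black",     0.71, 0.10),
    ("non-college, Black", 0.78, 0.20),
]

# Poststratify: weight each cell's estimate by its population share and sum.
estimate = sum(support * share for _, support, share in cells)
print(round(estimate, 3))  # -> 0.567
```

The point of the reweighting step is that even a small or unrepresentative sample can yield a sensible statewide number, provided the population shares of each cell are known from census-style data.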

Although demographic data can be telling, she added, record-linkage problems and growing privacy protections mean the underlying data are never fully certain. Record linkage uses machine learning to match records of a voter who may have moved states or is listed under a different name, keeping pollsters' voter databases up to date.
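The core of record linkage is deciding whether two differently spelled records refer to the same person. A minimal sketch, using only the standard library's fuzzy string matching rather than the machine-learning models Wortman alludes to; the records and the 0.6 threshold are invented for illustration:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity between two name strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical voter-file records: the same person may appear under a
# nickname or after an out-of-state move.
old_record = "Katherine J. Smith, Columbus OH"
new_record = "Kathy Smith, Ann Arbor MI"

# Compare just the name fields; pairs above a chosen threshold
# would be flagged for merging or human review.
score = similarity(old_record.split(",")[0], new_record.split(",")[0])
print(score > 0.6)
```

Production systems replace the single string score with trained classifiers over many features (name, birth date, address history), which is where the machine learning and the privacy tension come in.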

“The more local you get, the smaller 'n' you have, so people use multilevel regression,” Wortman said. 

Although a larger sample size theoretically reduces statistical error, Wortman said it does not always protect against being wrong. A method's own sources of bias persist no matter how many people are surveyed.

"The actual margin of error that’s listed on a poll will almost always just be from the sample size," Wortman said. "The problem is that while it’s really easy to measure, it’s usually totally underestimating the actual uncertainty in a poll.” 

Wortman then turned to a simulation using beta distributions to show how changes in voter turnout, especially among undecided voters, can affect the true uncertainty in election forecasts.
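A simulation in that spirit can be sketched with the standard library alone. This is not Wortman's actual model; the beta parameters below are invented to give a candidate about 52 percent support and each side roughly 60 percent turnout, with turnout drawn independently per side to mimic turnout uncertainty:

```python
import random

random.seed(0)  # reproducible runs

def simulate_margins(trials: int = 10_000) -> list:
    """Draw candidate support and per-side turnout from beta distributions
    and record the resulting two-party margin in each simulated election."""
    margins = []
    for _ in range(trials):
        support = random.betavariate(52, 48)    # share preferring A, mean ~0.52
        turnout_a = random.betavariate(6, 4)    # A supporters' turnout, mean ~0.60
        turnout_b = random.betavariate(6, 4)    # B supporters' turnout, mean ~0.60
        votes_a = support * turnout_a
        votes_b = (1 - support) * turnout_b
        margins.append(votes_a / (votes_a + votes_b) - 0.5)
    return margins

margins = simulate_margins()
win_rate = sum(m > 0 for m in margins) / len(margins)
print(round(win_rate, 2))
```

Even with a stable 52 percent of opinion, the spread of simulated margins is driven largely by the turnout draws, illustrating why turnout uncertainty dwarfs the sampling margin of error.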

She emphasized that unexpected turnout shifts within certain demographic groups can push forecasts far from actual results, as in the recent special election for a U.S. Senate seat in Alabama.

Voter turnout itself is unpredictable and hard for forecasters to model, even with simulations like Wortman's.

“You also need to know whether people are going to vote because oftentimes, you’ll have the overall registered to vote population," she said. "Or, the overall population will think one thing but only 70 percent of those people will be registered to vote and only 50 percent of those people will actually vote."
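The funnel Wortman describes in that quote is simple arithmetic, but it is worth making explicit; the starting population below is an invented round number, with her illustrative 70 percent and 50 percent rates:

```python
# From opinions to ballots, using Wortman's illustrative rates.
population = 1_000_000          # people who hold an opinion
registered = population * 0.70  # 70 percent are registered to vote
voters = registered * 0.50      # 50 percent of those actually vote

print(int(voters))              # -> 350000
print(voters / population)      # -> 0.35
```

So a poll of the full population can describe the views of a group only 35 percent of which will actually cast a ballot, which is why turnout modeling matters as much as opinion polling.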
