With a week to go in the race for the White House, the polls seem to be bouncing all over the place. Nate Silver at the FiveThirtyEight blog predicts a narrowing but substantial Hillary lead; the RealClearPolitics blog predicts a closer race, with 149 electoral votes up for grabs.
Pollsters haven’t been doing too well this year. In the Brexit vote they predicted “remain,” and “leave” won; in the Colombian FARC plebiscite, once again, the pollsters predicted “yes,” and the vote came out “no.”
I owe the following discussion to Howard Wainer, Distinguished Research Scientist, National Board of Medical Examiners:
Pollsters identify a pool, a subset that reflects the larger population to be polled. We used to call the subset a stratified random sample, a microcosm of the total population. The issue is the nonresponse rate, which is gigantic. In a world of cell phones, potential responders can easily choose whether or not to answer a call. Nonresponse erodes the accuracy of the poll because the people who do answer may differ systematically from those who don’t.
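To see why nonresponse matters, here is a minimal simulation sketch. Every number in it is an assumption chosen for illustration, not real polling data: the electorate is split 50/50, one candidate's supporters answer 5 percent of calls, and everyone else answers 10 percent. The function name `simulate_poll` and its parameters are hypothetical.

```python
import random


def simulate_poll(n_called=100_000, true_support=0.50,
                  p_answer_supporter=0.05, p_answer_other=0.10, seed=0):
    """Simulate one phone poll; return (poll_estimate, response_rate).

    All default values are illustrative assumptions, not measured rates.
    """
    rng = random.Random(seed)
    answered = 0
    answered_supporters = 0
    for _ in range(n_called):
        # Decide whether this person supports the candidate.
        is_supporter = rng.random() < true_support
        # Supporters and non-supporters pick up the phone at different rates.
        p_answer = p_answer_supporter if is_supporter else p_answer_other
        if rng.random() < p_answer:
            answered += 1
            if is_supporter:
                answered_supporters += 1
    return answered_supporters / answered, answered / n_called


# Unequal answer rates: the poll drifts well away from the true 50 percent.
biased_estimate, biased_rate = simulate_poll()

# Equal answer rates: the response rate is just as low, but the poll is roughly right.
fair_estimate, fair_rate = simulate_poll(p_answer_supporter=0.075,
                                         p_answer_other=0.075)

print(f"unequal response: estimate={biased_estimate:.3f}, response rate={biased_rate:.3f}")
print(f"equal response:   estimate={fair_estimate:.3f}, response rate={fair_rate:.3f}")
```

In this toy setup the damage comes less from the low response rate itself than from the two groups answering at different rates, which is exactly what a pollster cannot observe directly.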
A group of physicists at The City College have developed an alternative method of predicting elections using Twitter data.
[CCNY physicists] have developed analytic tools combining statistical physics of complex networks, percolation theory, natural language processing and machine learning classification to infer the opinion of Twitter users regarding the Presidential candidates this year.
“Forecasting opinion trends from real-time social media is the long-standing goal of modern-day big-data analytics,” said Makse, a Fellow of the American Physical Society. “Despite its importance, there has been no conclusive scientific evidence so far that social media activity can capture the opinion of the general population at large.”
However, by using a large-scale dataset of 73 million tweets collected from June 1 to September 1, 2016, Makse and his associates are able to investigate the temporal social networks formed by the interactions among Twitter users.
Read the article with links to the research here: https://www.ccny.cuny.edu/news/ccny-team-develops-analytics-predict-poll-trends
Pollsters are increasingly turning to what statisticians call covariates, Wainer writes:
A more promising approach (using covariates but a different matching variable) uses Nielsen ratings, which are not self-selected and are well documented to accurately depict viewing habits, and then ties viewing habits to voting choices in previous elections (2012, 2008, etc.). After building the model from such data, they use the current viewing habits to predict 2016. So the idea is that if the viewership is growing monstrous for Duck Dynasty, Hillary ought to watch out, whereas if there are big jumps for MacNeil/Lehrer (or whatever it is called now), Trump should worry.
You get the idea — the point of polls is to use the outcome of polls to predict the outcome we care about. But if polls are unreliable we must find more reliable (but still efficacious) predictors. Perhaps tweets help, but there are other options. In the future, if people continue to not answer phones, these alternative approaches will become the norm.
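The covariate idea Wainer describes can be sketched in a few lines. This is only an illustration under loud assumptions: the data are synthetic (generated from a made-up linear relationship plus noise), there is a single covariate (the viewing share of one hypothetical show across 50 hypothetical regions), and the model is ordinary least squares. Nothing here is the method any actual pollster uses.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rng = random.Random(1)

# SYNTHETIC "past election" data: viewing share of a hypothetical show in
# 50 regions, and a candidate's vote share built from an assumed linear
# relationship (0.35 + 0.6 * viewing) plus a little noise.
past_viewing = [rng.uniform(0.05, 0.40) for _ in range(50)]
past_vote = [0.35 + 0.6 * v + rng.gauss(0, 0.02) for v in past_viewing]

# Step 1: build the model from past viewing habits and past voting choices.
slope, intercept = fit_line(past_viewing, past_vote)

# Step 2: plug in current viewing habits to forecast the vote share.
current_viewing = 0.30  # assumed current viewing share in some region
forecast = intercept + slope * current_viewing

print(f"fitted slope={slope:.3f}, intercept={intercept:.3f}")
print(f"forecast vote share at viewing={current_viewing}: {forecast:.3f}")
```

The appeal is that the input, viewing habits, is measured without anyone having to pick up a phone; the risk is that the relationship learned from 2012 or 2008 may not hold in the current election.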
Traditional polling is increasingly shaky. You glance at your phone; if you can’t identify the number, you ignore it, and if it is an 800 or an 888 number, you ignore it. Pollsters are dependent on responses. Who answers the phone? Older voters with more time? Who doesn’t answer the phone? Have you programmed your phone to accept only specific numbers? If nonresponse is gigantic, traditional telephone-based polling is both inaccurate and not the best way to predict outcomes.
Yes, Twitter or Nielsen or Facebook may provide better ways of predicting outcomes.
Although it is well known that being a statistician means never having to say you’re certain (nothing in life is ever better than 3 to 1), I feel safe in betting the farm on Hillary (regardless of the release of emails). And also a Democratic Senate.