We are addicted to predicting winners: at race tracks the betting public creates the odds for each horse in a race, and every Sunday the odds makers in Las Vegas predict winners and the margins by which teams will win, based on previous records and a plethora of player-related achievement numbers.
This is called gambling.
Data can be used for more respectable purposes: predicting winners in elections, another type of race, a political race, as well as predicting "success" in teaching by measuring increases in student achievement attributed to individual teachers.
Each day the New York Times publishes odds online, in the form of a percentage, for the presidential election: on Sunday Hillary was "leading" Trump 90% to 10%, on Wednesday 88% to 12%. The section is called The Upshot, and the site explains the methodology. One of the sources is the Princeton Election Consortium and, if you want to get into the weeds, you can read about "symmetric random drift" and "setting a Bayesian prior," probably well beyond the interest and knowledge of the vast majority of "ordinary" folk.
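The arithmetic behind those headline percentages is less mysterious than the jargon suggests. Here is a minimal sketch, far simpler than the Consortium's actual model and with all numbers hypothetical, of how a polled vote share becomes a "chance of winning," with a Bayesian prior folded in as extra pseudo-respondents:

```python
# A minimal, hypothetical sketch: treat the polled two-party share as a noisy
# estimate of the true share and ask how often that true share exceeds 50%.
from statistics import NormalDist

def win_probability(share, n, prior_share=0.5, prior_n=0):
    """P(true two-party share > 50%) under a normal approximation.

    share:       candidate's share among poll respondents (0..1)
    n:           number of respondents
    prior_share: prior belief about the share (a simple "Bayesian prior")
    prior_n:     weight of that prior, in pseudo-respondents
    """
    # Blend the poll and the prior as if the prior were prior_n extra voters.
    p = (share * n + prior_share * prior_n) / (n + prior_n)
    se = (p * (1 - p) / (n + prior_n)) ** 0.5   # standard error of the share
    return 1 - NormalDist(mu=p, sigma=se).cdf(0.5)

# A 52% share in a hypothetical poll of 1,000 respondents already yields
# roughly a 90% "chance of winning" headline.
prob = win_probability(0.52, 1000)
```

Note how a modest 52-48 polling lead, not a landslide, is enough to produce the lopsided-looking 90% figure.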
The essential problem is the source data, the actual polling. Lo those many years ago we learned we had to create a stratified random sample, a microcosm of the population we wished to poll. An example is the upcoming September 13th Democratic primary election in the 65th Assembly District in Manhattan, the seat formerly held by Sheldon Silver, who is awaiting sentencing by the feds. There are six contenders for the seat, and a close look at the population in the district is revealing.
Population figures, though, do not always translate into actual voters. According to 2014 census data, there were 32,952 Asian and South Asian citizens of voting age in the district. But only 15,284 were registered Democrats, said Jerry Skurnik, a partner at Prime New York, which compiles voter information. Of those, only 5,500 voted in the last three primaries.
Far fewer registered Hispanic and Portuguese Democrats voted in those three previous primaries, said Mr. Skurnik, who analyzed election data relating to social groups based on surnames. Of 11,675 registered voters, only 4,101 participated in a previous primary election, he said. Those of “European background,” including English, Irish, Italian and likely-to-be-Jewish voters, were the largest group, at 20,496 registered Democrats, with 8,205 showing up in previous primaries.
Randomly selecting names from census data does not produce a stratified random sample; selecting names from prime voter lists is a major step forward. However, how many potential prime voters don't answer the phone and never participate in the poll? Do the participants who remain constitute a "stratified random sample"? I understand that fewer than 10% of those called actually respond to a polling call.
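Pollsters try to repair lopsided samples by weighting: each group's answers count in proportion to its share of likely voters, not its share of people who happened to pick up the phone. A minimal sketch, using the prime-voter figures quoted above but with invented respondent counts and candidate preferences:

```python
# Post-stratification sketch. Group sizes come from the prime-voter numbers
# cited in the article; respondent counts and preferences are hypothetical.
likely_voters = {"Asian": 5500, "Hispanic": 4101, "European": 8205}
respondents   = {"Asian": 30,   "Hispanic": 20,   "European": 150}   # invented
pct_for_A     = {"Asian": 0.60, "Hispanic": 0.55, "European": 0.40}  # invented

total_voters = sum(likely_voters.values())

# Raw (unweighted) result: dominated by whichever group answered the phone.
raw = sum(respondents[g] * pct_for_A[g] for g in respondents) / sum(respondents.values())

# Weighted result: each group's preference scaled by its electorate share.
weighted = sum(likely_voters[g] / total_voters * pct_for_A[g] for g in likely_voters)

# With these invented numbers the raw poll reads 44.5% for candidate A while
# the weighted estimate is about 49.6%: a five-point gap created entirely by
# who answered the phone.
```

Weighting helps only if the few people who do respond within each group resemble the many who don't, which is exactly the assumption questioned below.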
In June the United Kingdom (England, Scotland, Wales and Northern Ireland) voted in the Brexit referendum, an election to decide whether the UK would remain in the European Union. Extensive polling predicted that the Brits would vote to remain by a 52-48 margin; when the dust cleared, the Brits had voted to leave 52-48. What went wrong?
An experienced pollster commented on “what went wrong.”
The difference between survey and election outcome can be broken down into five terms:
- Survey respondents not being a representative sample of potential voters (for whatever reason, Remain voters being more reachable or more likely to respond to the poll, compared to Leave voters);
- Survey responses being a poor measure of voting intentions (people saying Remain or Undecided even though it was likely they’d vote to leave);
- Shift in attitudes during the last days;
- Unpredicted patterns of voter turnout, with more voting than expected in areas and groups that were supporting Leave, and lower-than-expected turnout among Remain supporters.
- And, of course, sampling variability.
In spite of extensive polling by "the best and the brightest," the pollsters were off by four percentage points!
Howard Wainer, a statistician with vast experience, explains:
… the response rate for virtually all of the polls ranges from 8 to 9 percent. Yes, more than 90% of those asked for their opinion hang up. Do you know anyone who chooses to answer the phone? Who? Do you? Professional pollsters never talk about this because their paychecks depend on it.
The only way to use such polls is to make heroic assumptions — most commonly what is assumed is ‘ignorable nonresponse’ — that is that those who don’t respond are just the same as those who do — clearly nonsense.
Even such a sensible person as [pollster] Nate Silver has to make do with terrible information. Yes, drawing inferences from flawed data is usually better than doing it with no information at all, but it is hardly enough to keep from being terrified.
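Wainer's point about "ignorable nonresponse" is easy to demonstrate. In this minimal simulation (all numbers invented), the true electorate is 52% Leave, every respondent answers honestly, and the only distortion is that Leave voters are slightly less likely to pick up the phone:

```python
# Hypothetical simulation of non-ignorable nonresponse.
import random

random.seed(0)
TRUE_LEAVE = 0.52                             # actual support for Leave
RESPONSE = {"Leave": 0.08, "Remain": 0.10}    # invented response rates

answers = []
for _ in range(200_000):                      # people the pollster dials
    vote = "Leave" if random.random() < TRUE_LEAVE else "Remain"
    if random.random() < RESPONSE[vote]:      # do they pick up?
        answers.append(vote)

poll_leave = answers.count("Leave") / len(answers)
# True support is 52% Leave, but the poll reads roughly 46% Leave: the 90%+
# who hang up are not "just the same as those who do" respond.
```

A two-point gap in response rates, invisible to the pollster, flips the predicted winner. That is why assuming nonresponse is ignorable is, in Wainer's word, heroic.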
The one aspect of this in which I find some solace is that the polls may be self-fulfilling. This is seen in the shrinkage of donations to Republicans.
Although it is an unintended consequence, polling results influence voters: polls discourage voters on the trailing side and attract voters who want to be on the winning side, the bandwagon effect.
The only absolute winners are the pollsters who receive fees for parsing out the results.
Attempts to use dense mathematical algorithms to assess teacher performance face the same core issue. Value Added Measurement (VAM) purports to compare teachers who are teaching similar students, i.e., Title 1, English language learners, special education, etc. The formula creates a score for each teacher on a 1-100 scale so that teachers can be compared. The problem is not the dense formula; the issue is that teachers teach different students each year, and the VAM scores carry high errors of measurement that swing widely from year to year. A score with an error of measurement of plus or minus fifteen points means the teacher's score falls within a thirty-point range. The following year the score may be substantially higher or lower, and the entire system is predicated on student tests that may be fatally flawed.
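The size of those swings is worth seeing. A minimal sketch, with all numbers hypothetical: give every teacher a fixed "true" effectiveness on the 1-100 scale, add measurement noise sized so that two standard deviations give the plus-or-minus-fifteen-point band described above, and watch how far the same teacher's score moves from one year to the next:

```python
# Hypothetical simulation of year-to-year VAM score swings driven by
# measurement error alone; the teacher's true effectiveness never changes.
import random

random.seed(1)
SIGMA = 7.5   # noise SD; two SDs gives the +/-15-point band cited above

def observed(true_score):
    """One year's noisy score, clipped to the 1-100 scale."""
    return min(100, max(1, random.gauss(true_score, SIGMA)))

swings = []
for _ in range(10_000):
    true_score = random.uniform(30, 70)   # teacher's stable effectiveness
    year1, year2 = observed(true_score), observed(true_score)
    swings.append(abs(year1 - year2))

avg_swing = sum(swings) / len(swings)
# Even with zero change in the teacher, scores typically move about 8 points
# from one year to the next, and much larger swings are common.
```

Under these assumptions a teacher whose practice never changed would still see her ranking bounce around, which is exactly the instability the critics describe.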
If the stratified random sample is flawed or the test is flawed all conclusions emanating are flawed.
The other method of assessing teacher performance is supervisory observation, which may be helpful in improving teacher practice; however, such observations have little or no inter-rater reliability.
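Inter-rater reliability has a standard measure: Cohen's kappa, the agreement between two raters beyond what chance alone would produce. A minimal sketch with invented ratings of the same ten lessons by two hypothetical supervisors:

```python
# Cohen's kappa for two raters; the ratings below are invented.
from collections import Counter

def cohens_kappa(r1, r2):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

sup_a = ["effective", "effective", "developing", "effective", "developing",
         "ineffective", "effective", "developing", "effective", "effective"]
sup_b = ["developing", "effective", "effective", "developing", "developing",
         "effective", "effective", "ineffective", "developing", "effective"]

kappa = cohens_kappa(sup_a, sup_b)
# Here kappa is about -0.05: these two supervisors agree no better than
# chance, the failure of inter-rater reliability described above.
```

Kappa near 1.0 means raters are interchangeable; near zero, their agreement is what coin-flipping would produce, and the observation scores tell you as much about the rater as the teacher.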
An irony is that there are numerous examples of low scores from supervisors and considerably higher VAM scores. VAM scores, although deeply flawed, in many cases protect teachers from low observational scores that may be biased.
Polls are a photograph, a moment in time based on available data that might very well be flawed or change dramatically in the days or hours before the “final” poll, the election.
Value Added Measurements have enriched testing companies, confused and angered teachers and parents, and created a quixotic quest ("…revive chivalry, undo wrongs, and bring justice to the world") that is impossible to fulfill.
We are gullible and accept complex formulas as truth: if an explanation is filled with abstruse Greek letters and symbols, it must be accurate.
Australia has compulsory voting, so polling there is probably far more accurate. In the United States local voting participation is commonly below 50%, and the voters vary from election to election. The only accurate poll is the election.
If teachers taught the same students every year and the tests met statistical standards of validity, reliability and stability, the VAM scores might be reasonably accurate.
Bottom line: polling is an informed guess and VAM scores are of little value.