Tag Archives: Howard Wainer

Can the Polls Be Wrong? Hillary Is Up 8%; No, Trump Is Up 1%. What’s Going On? Why Are the Polls Varying So Much?

With a week to go in the race to the White House, the polls seem to be bouncing all over the place. Nate Silver at the FiveThirtyEight blog predicts a narrowing but still substantial Hillary lead, while the RealClearPolitics blog predicts a closer race with 149 electoral votes up for grabs.

Pollsters haven’t been doing too well this year: they predicted a “yes” vote in the Brexit referendum and the “no” side won; in the Colombia FARC plebiscite, once again, the pollsters predicted “yes” and the vote came out “no.”

I owe the following discussion to Howard Wainer,  Distinguished Research Scientist, National Board of Medical Examiners:

Pollsters identify a pool, a subset that reflects the larger population to be polled. We used to call the subset a stratified random sample, a microcosm of the total population to be polled. The issue is the nonresponse rate, which is gigantic. In a world of cell phones, potential responders can easily choose whether or not to answer a call. The nonresponse rate erodes the accuracy of the poll.
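To make the point concrete, here is a minimal simulation sketch of nonresponse bias; the population size, support level, and answer rates below are entirely hypothetical, not drawn from any real poll:

```python
import random

random.seed(0)

# Hypothetical illustration: 52% of the population supports candidate A,
# but A's supporters answer the phone only 30% of the time while B's
# supporters answer 50% of the time.
POP_SIZE = 100_000
TRUE_SUPPORT_A = 0.52
ANSWER_RATE = {"A": 0.30, "B": 0.50}

population = ["A" if random.random() < TRUE_SUPPORT_A else "B"
              for _ in range(POP_SIZE)]

# Draw a simple random sample of phone numbers, then keep only the people
# who actually pick up -- the respondents are effectively self-selected.
sample = random.sample(population, 2_000)
respondents = [p for p in sample if random.random() < ANSWER_RATE[p]]

poll_estimate = respondents.count("A") / len(respondents)
print(f"True support for A:  {TRUE_SUPPORT_A:.1%}")
print(f"Poll estimate for A: {poll_estimate:.1%}")
print(f"Response rate:       {len(respondents) / len(sample):.1%}")
```

Even though the sample itself was drawn at random, the estimate lands well below the true 52 percent simply because one candidate’s supporters were less likely to pick up.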

A group of physicists at The City College of New York has developed an alternative method of predicting elections using Twitter data.

[CCNY physicists] have developed analytic tools combining statistical physics of complex networks, percolation theory, natural language processing and machine learning classification to infer the opinion of Twitter users regarding the Presidential candidates this year.

“Forecasting opinion trends from real-time social media is the long-standing goal of modern-day big-data analytics,” said Makse, a Fellow of the American Physical Society. “Despite its importance, there has been no conclusive scientific evidence so far that social media activity can capture the opinion of the general population at large.”

However, by using a large-scale dataset of 73 million tweets collected from June 1 to September 1, 2016, Makse and his associates are able to investigate the temporal social networks formed by the interactions among Twitter users.

Read the article with links to the research here: https://www.ccny.cuny.edu/news/ccny-team-develops-analytics-predict-poll-trends
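The CCNY pipeline is far more sophisticated than anything that fits in a few lines, but a toy stand-in conveys the basic idea of classifying tweets and tallying opinion over time; the hashtags, dates, and tweets below are invented purely for illustration:

```python
from collections import Counter
from datetime import date

# Toy stand-in, not the CCNY method: classify each tweet as pro-Clinton,
# pro-Trump, or neutral by hashtag keywords, then tally daily shares.
PRO_CLINTON = {"imwithher", "strongertogether"}
PRO_TRUMP = {"maga", "trumptrain"}

def classify(text: str) -> str:
    words = set(text.lower().replace("#", "").split())
    if words & PRO_CLINTON and not words & PRO_TRUMP:
        return "clinton"
    if words & PRO_TRUMP and not words & PRO_CLINTON:
        return "trump"
    return "neutral"

tweets = [
    (date(2016, 8, 1), "Great rally tonight #MAGA"),
    (date(2016, 8, 1), "Proud to say #ImWithHer"),
    (date(2016, 8, 2), "#StrongerTogether all the way"),
]

# Count how many tweets fall in each (day, label) bucket to sketch a trend.
daily = Counter((day, classify(text)) for day, text in tweets)
print(daily)
```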

Pollsters are increasingly turning to what statisticians call covariates, Wainer writes,

A more promising approach (using covariates but a different matching variable) uses Nielsen ratings, which are not self-selected and are well documented to accurately depict viewing habits, and then ties viewing habits to voting choices in previous elections (2012, 2008, etc.). After building the model from such data, they use the current viewing habits to predict 2016. So the idea is that if viewership is growing monstrously for Duck Dynasty, Hillary ought to watch out, whereas if there are big jumps for MacNeil/Lehrer (or whatever it is called now), Trump should worry.

Wainer continues,

You get the idea — the point of polls is to use the outcome of polls to predict the outcome we care about. But if polls are unreliable we must find more reliable (but still efficacious) predictors. Perhaps tweets help, but there are other options. In the future, if people continue to not answer phones, these alternative approaches will become the norm.
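A hedged sketch of the Nielsen-covariate idea Wainer describes: fit a past election’s vote shares in a handful of media markets to viewing habits, then feed in current viewing habits to produce a forecast. Every number, market, and program share below is made up purely for illustration:

```python
import numpy as np

# Hypothetical data: rows are media markets, columns are shares of
# households watching two marker programs in 2012, y is that market's
# 2012 Democratic two-party vote share.
X_2012 = np.array([
    [0.22, 0.05],
    [0.15, 0.09],
    [0.08, 0.14],
    [0.30, 0.03],
    [0.12, 0.11],
])
y_2012 = np.array([0.41, 0.49, 0.58, 0.35, 0.53])

# Fit a linear model: vote share ~ intercept + viewing habits.
A = np.column_stack([np.ones(len(X_2012)), X_2012])
coef, *_ = np.linalg.lstsq(A, y_2012, rcond=None)

# Apply the fitted model to current (2016) viewing habits to forecast.
X_2016 = np.array([[0.25, 0.04], [0.10, 0.13]])
pred_2016 = np.column_stack([np.ones(len(X_2016)), X_2016]) @ coef
print(np.round(pred_2016, 3))
```

The appeal is the one Wainer points to: the matching variable (viewing habits) is observed passively rather than self-reported, so it does not depend on who chooses to answer the phone.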

Traditional polling is increasingly shaky: you glance at your phone, and if you can’t identify the number, or if it is an 800 or an 888 number, you ignore it. Pollsters are dependent on responses. Who answers the phone? Older voters with more time? Who doesn’t answer the phone? Have you programmed your phone to accept only specific numbers? If nonresponse is gigantic, traditional telephone-based polling is both inaccurate and not the best way to predict outcomes.

Yes, Twitter or Nielsen or Facebook may provide better ways of predicting outcomes.

Wainer concludes,

Although it is well known that being a statistician means never having to say you’re certain (nothing in life is ever better than 3 to 1), I feel safe in betting the farm on Hillary (regardless of the release of emails). And also a Democratic Senate.

VAM, the Lederman Decision and the Misuse of Statistical Tools. “Gut versus Actual Evidence.”

What if the educators making important decisions about schools and colleges are acting too much on their guts and not enough based on actual evidence? (Review of Howard Wainer, “Uneducated Guesses: Using Evidence to Uncover Misguided Education Policies,” 2011)

Back in my union rep days I occasionally represented members in arbitrations over claimed violations of the agreement. The Board fired a paraprofessional, claiming he had assisted students in cheating by erasing incorrect answers, and used expert testimony explaining how software was used to analyze the erasures. I scrambled to find my own expert. I worried that the technical evidence would be too dense; however, the arbitrator had a background in math and economics and not only understood the testimony but asked numerous questions of the expert witnesses.

A few months later I won the case. I was ecstatic, believing the inappropriate use of the erasure-analysis software would be barred.

While the arbitrator found the use of the software was not “persuasive,” he sustained our case, writing that the Board failed to meet its burden of proof. It was a victory, but a narrow one that did not resolve the question of the misuse of the software.

A couple of years ago Sheri Lederman, a teacher on Long Island, received an “ineffective” rating on the Value-Added Measurement (VAM) side of the teacher evaluation metric. The appellants introduced evidence from numerous experts, all challenging the use of VAM to assess individual teachers.

In a narrowly worded decision, a New York State Supreme Court judge overturned the teacher’s “ineffective” rating, ruling that the use of Value-Added Measurement for the appellant in the instant case was “arbitrary and capricious.” No precedent was set.

Read the Lederman filing here: http://www.capitalnewyork.com/sites/default/files/Sheri%20Aff%20Final.pdf

Read an excellent analysis here: https://www.the74million.org/article/ny-teacher-wins-court-case-against-states-evaluation-system-but-she-may-appeal-to-set-wider-precedent

In 2010 The New Teacher Project (TNTP), an advocacy organization firmly embedded in the (de)form side of the aisle, issued a report, a survey of school districts across a number of states. The findings:

  • All teachers are rated good or great. Less than 1 percent of teachers receive unsatisfactory ratings, making it impossible to identify truly exceptional teachers.
  • Professional development is inadequate. Almost 3 in 4 teachers did not receive any specific feedback on improving their performance in their last evaluation.
  • Novice teachers are neglected. Low expectations for beginning teachers translate into benign neglect in the classroom and a toothless tenure process.
  • Poor performance goes unaddressed. Half of the districts studied have not dismissed a single tenured teacher for poor performance in the past five years.

Six years later New York State is working on Teacher Evaluation 4.0, and we are in the first year of a four-year moratorium on the use of grade 3-8 standardized test scores to assess teachers.

Value-Added Models, also referred to as Growth Scores, attempt to compare teachers from around the state who teach similar students. A dense mathematical algorithm incorporates a variety of variables and generates a numerical score for each teacher. For example, a fourth grade teacher is compared to other fourth grade teachers across the state, taking into account the percentages of students she teaches who are Title 1 eligible, students with IEPs, and English Language Learners, by gender and perhaps other variables. The criticism is the use of the formula to assess individual teachers: the experts aver that the scores are “unreliable,” with large errors of measurement (i.e., plus or minus five or ten or fifteen percent), and “unstable,” varying widely from year to year.
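The state’s actual growth model is far more elaborate, but a stripped-down sketch shows the general shape of such a calculation; the students, scores, and covariates below are simulated, and the two-step residual approach is a common simplification rather than New York’s formula:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated toy data: one row per student, with a prior-year score, a few
# of the covariates mentioned above (Title 1, IEP, ELL), a current-year
# score, and the id of the student's teacher.
n = 600
prior = rng.normal(650, 30, n)
title1 = rng.integers(0, 2, n)
iep = rng.integers(0, 2, n)
ell = rng.integers(0, 2, n)
teacher = rng.integers(0, 20, n)          # 20 teachers, ~30 students each
current = (0.8 * prior - 5 * title1 - 8 * iep - 6 * ell
           + rng.normal(130, 15, n))

# Step 1: predict each student's current score from prior score and
# demographic covariates (a crude stand-in for the state's model).
X = np.column_stack([np.ones(n), prior, title1, iep, ell])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)
expected = X @ coef

# Step 2: a teacher's "growth score" is the average amount by which her
# students beat (or miss) their predicted scores.
residual = current - expected
growth = {t: residual[teacher == t].mean() for t in range(20)}
print(sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:3])
```

Even in this toy version a teacher’s score depends heavily on which thirty or so students happen to land in her class that year, which hints at the year-to-year instability the experts describe.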

The use of value-added measurements to assess individual teachers has been pilloried by experts.

The New York State Learning Summit  brought together experts from across the country – they were sharply critical of the use of VAM to assess individual teachers.

Howard Wainer, a statistician with decades of experience and many published articles, has been a harsh critic of the misuse of statistical tools,

Ideas whose worth diminishes with data and thought are too frequently offered as the only way to do things. Promulgators of these ideas either did not look for data to test their ideas, or worse, actively avoided considering evidence that might discredit them.

The issue is not the mathematical model; the issue is how the model is used. If a particular teacher consistently receives high scores over a number of years, it is worthwhile to ask: what is that particular teacher doing? What instructional practices is the teacher utilizing? Can these practices be isolated and taught to prospective teachers in college teacher preparation programs? In school and district-based professional development? Or are these practices unique to the individual teacher? Is there a “teaching gene,” an innate quality that resonates with students?

Sadly, VAM has been misused in spite of the evidence that discredits the use of the tool to assess individual teachers.

Six years after the Widget Effect report, which bemoaned that less than 1 percent of teachers were rated unsatisfactory, and six years into the use of student achievement data run through dense mathematical prestidigitation, we find that 1 percent of teachers are found “ineffective.”

Millions of dollars and endless conflicts later, and the percentage of teachers found unsatisfactory remains at 1 percent!

Insanity: doing the same thing over and over again and expecting a different result.

In New York State we are in year one of a four-year moratorium on the use of grade 3-8 student test scores to evaluate teachers.

How should management evaluate teacher competence?

“One size fits all” fits no one.

The state should explore a menu of choices to fit the many differences among the 700 school districts in the state.