What if the educators making important decisions about schools and colleges are acting too much on their guts and not enough based on actual evidence? (Review of Howard Wainer, “Uneducated Guesses: Using Evidence to Uncover Misguided Education Policies,” 2011)
Back in my union rep days I occasionally represented members in grievance arbitrations, claims of violations of the agreement. The Board fired a paraprofessional, claiming he had assisted students in cheating through the erasure of incorrect answers, and used expert testimony explaining how software had been used to analyze the erasures. I scrambled to find my own expert. I worried that the technical evidence would be too dense; however, the arbitrator had a background in math and economics and not only understood the testimony but asked numerous questions of the expert witnesses.
A few months later I won the case. I was ecstatic: I believed the inappropriate use of the erasure-analysis software would be barred.

While the arbitrator found the use of the software "not persuasive," he sustained our case, writing that the Board had failed to meet its burden of proof. It was a victory, but a narrow one that did not resolve the question of the misuse of the software.
A couple of years ago Sheri Lederman, a teacher on Long Island, received an "ineffective" rating on the Value-Added Measurement (VAM) side of the teacher evaluation metric. The appellant introduced evidence from numerous experts, all challenging the use of VAM to assess individual teachers.

In a narrowly worded decision, a New York State Supreme Court judge overturned the teacher's "ineffective" rating, ruling that the use of Value-Added Measurement for the appellant in the instant case was "arbitrary and capricious." No precedent was set.
Read the Lederman filing here: http://www.capitalnewyork.com/sites/default/files/Sheri%20Aff%20Final.pdf
In 2010 The New Teacher Project (TNTP), an advocacy organization firmly embedded in the (de)form side of the aisle, issued a report, a survey of school districts across a number of states. The findings:
- All teachers are rated good or great. Less than 1 percent of teachers receive unsatisfactory ratings, making it impossible to identify truly exceptional teachers.
- Professional development is inadequate. Almost 3 in 4 teachers did not receive any specific feedback on improving their performance in their last evaluation.
- Novice teachers are neglected. Low expectations for beginning teachers translate into benign neglect in the classroom and a toothless tenure process.
- Poor performance goes unaddressed. Half of the districts studied have not dismissed a single tenured teacher for poor performance in the past five years.
Six years later New York State is working on Teacher Evaluation 4.0, and we are in the first year of a four-year moratorium on the use of grade 3-8 standardized test scores to assess teachers.
Value-Added Models, also referred to as Growth Scores, attempt to compare teachers from around the state who teach similar students. A dense mathematical algorithm incorporates a variety of variables and generates a numerical score for each teacher. For example, a fourth grade teacher is compared to other fourth grade teachers across the state, taking into account the percentages of the students she teaches who are Title 1 eligible, who have IEPs, or who are English Language Learners, along with gender and perhaps other variables. The criticism is the use of the formula to assess individual teachers: the experts aver that the scores are "unreliable," with large errors of measurement (i.e., plus or minus five or ten or fifteen percent), and "unstable," with teacher scores varying widely from year to year.
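The basic idea can be sketched in a few lines of code. What follows is a deliberately simplified illustration, not the state's actual model: the coefficients, the student characteristics, and the roster are all hypothetical, and real growth-score models use many more variables and a far more elaborate statistical procedure. The sketch predicts each student's score from background characteristics alone, then treats the teacher's "value added" as the average gap between actual and predicted scores.

```python
# Toy illustration of a value-added ("growth score") calculation.
# All names, coefficients, and numbers are hypothetical stand-ins
# for a fitted statistical model; none come from any real state formula.

def predict_score(student):
    """Predicted test score from student characteristics alone."""
    score = 0.9 * student["prior_score"]        # prior achievement dominates
    if student["iep"]:                          # adjustment for IEP status
        score -= 5.0
    if student["ell"]:                          # adjustment for ELL status
        score -= 4.0
    if student["title1"]:                       # adjustment for Title 1 eligibility
        score -= 3.0
    return score

def value_added(students):
    """Teacher's 'value added': mean gap between actual and predicted
    scores across the students she teaches."""
    gaps = [s["actual_score"] - predict_score(s) for s in students]
    return sum(gaps) / len(gaps)

# Hypothetical roster for one fourth grade teacher.
roster = [
    {"prior_score": 70, "iep": False, "ell": True,  "title1": True,  "actual_score": 62},
    {"prior_score": 85, "iep": False, "ell": False, "title1": False, "actual_score": 80},
    {"prior_score": 60, "iep": True,  "ell": False, "title1": True,  "actual_score": 55},
]

print(round(value_added(roster), 2))
```

Even this toy version hints at the instability the experts describe: with a roster of only twenty or thirty students, a handful of good or bad testing days can swing the average gap, and with it the teacher's rating, from one year to the next.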
The use of value-added measurements to assess individual teachers has been pilloried by experts.
The New York State Learning Summit brought together experts from across the country – they were sharply critical of the use of VAM to assess individual teachers.
Howard Wainer, a statistician with decades of experience and numerous published articles, has been a harsh critic of the misuse of statistical tools:
Ideas whose worth diminishes with data and thought are too frequently offered as the only way to do things. Promulgators of these ideas either did not look for data to test their ideas, or worse, actively avoided considering evidence that might discredit them.
The issue is not the mathematical model; the issue is how the model is used. If a particular teacher over a number of years consistently receives high scores it is worthwhile to ask: what is that particular teacher doing? What instructional practices is the teacher utilizing? Can these practices be isolated and taught to prospective teachers in college teacher preparation programs? In school and district-based professional development? Or, are these practices unique to the individual teacher? Is there a “teaching gene,” an innate quality that resonates with students?
Sadly, VAM has been misused in spite of the evidence that discredits the use of the tool to assess individual teachers.
Six years after the Widget Effect report, a report that bemoaned that fewer than 1 percent of teachers were rated unsatisfactory, and six years into the use of student achievement data filtered through dense mathematical prestidigitation, we find that 1 percent of teachers are rated "ineffective."

Millions of dollars and endless conflicts, and the percentage of teachers found unsatisfactory remains at 1 percent!
Insanity: doing the same thing over and over again and expecting a different result.
In New York State we are in year one of a four-year moratorium on the use of grade 3-8 student test scores to evaluate teachers.
How should management evaluate teacher competence?
“One size fits all” fits no one.
The state should explore a menu of choices to fit the many differences among the 700 school districts in the state.