VAM, the Lederman Decision and the Misuse of Statistical Tools. “Gut versus Actual Evidence.”

What if the educators making important decisions about schools and colleges are acting too much on their guts and not enough based on actual evidence? (Review of Howard Wainer, “Uneducated Guesses: Using Evidence to Uncover Misguided Education Policies,” 2011)

Back in my union rep days I occasionally represented members in grievance arbitrations, claims of violations of the agreement. In one case the Board fired a paraprofessional, claiming he had assisted students in cheating through the erasure of incorrect answers, and relied on expert testimony explaining how software was used to analyze the erasures. I scrambled to find my own expert. I worried that the technical evidence would be too dense; however, the arbitrator had a background in math and economics and not only understood the testimony but asked numerous questions of the expert witnesses.

A few months later I won the case. I was ecstatic; the inappropriate use of the erasure analysis software, I thought, would be barred.

While the arbitrator found the use of the software was not “persuasive,” he sustained our case on narrower grounds, writing that the Board had failed to meet its burden of proof. It was a victory, but a narrow victory that did not resolve the question of the misuse of the software.
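For readers curious about the software at the center of that case: I can only guess at the vendor's method, but erasure analysis generally rests on a simple statistical idea. Scanners count “wrong-to-right” erasures on each answer sheet, and classes whose counts sit far above the state average get flagged. A minimal sketch of the idea, with invented class names and numbers (my illustration, not the actual product):

```python
def flag_classes(class_wtr_means, state_mean, state_sd, threshold=3.0):
    """Flag classes whose mean wrong-to-right (WTR) erasure count
    exceeds the state mean by more than `threshold` standard deviations."""
    flagged = []
    for class_id, wtr in class_wtr_means.items():
        z = (wtr - state_mean) / state_sd          # standardized distance
        if z > threshold:
            flagged.append((class_id, round(z, 1)))
    return flagged

# Illustrative numbers: a typical student erases one or two answers per test.
classes = {"4-301": 1.3, "4-302": 1.8, "4-303": 9.4}
print(flag_classes(classes, state_mean=1.5, state_sd=1.0))
# [('4-303', 7.9)]
```

And there lies the problem the arbitrator sensed: a flag that extreme shows something unusual happened to the answer sheets, not who made the erasures, which is why the software alone could not carry the Board's burden of proof.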

A couple of years ago Sheri Lederman, a teacher on Long Island, received an “ineffective” rating on the Value-Added Measurement (VAM) side of the teacher evaluation metric. The appellant introduced evidence from numerous experts, all challenging the use of VAM to assess individual teachers.

In a narrowly worded decision a New York State Supreme Court judge overturned the teacher’s “ineffective” rating, ruling that the use of Value-Added Measurement for the appellant in the instant case was “arbitrary and capricious.” No precedent was set.

Read the Lederman filing here: http://www.capitalnewyork.com/sites/default/files/Sheri%20Aff%20Final.pdf

Read an excellent analysis here: https://www.the74million.org/article/ny-teacher-wins-court-case-against-states-evaluation-system-but-she-may-appeal-to-set-wider-precedent

In 2010 the New Teacher Project (TNTP), an advocacy organization firmly embedded in the (de)form side of the aisle, issued the Widget Report, a survey of school districts across a number of states. The findings:

  • All teachers are rated good or great. Less than 1 percent of teachers receive unsatisfactory ratings, making it impossible to identify truly exceptional teachers.
  • Professional development is inadequate. Almost 3 in 4 teachers did not receive any specific feedback on improving their performance in their last evaluation.
  • Novice teachers are neglected. Low expectations for beginning teachers translate into benign neglect in the classroom and a toothless tenure process.
  • Poor performance goes unaddressed. Half of the districts studied have not dismissed a single tenured teacher for poor performance in the past five years.

Six years later New York State is working on Teacher Evaluation 4.0, and we are in the first year of a four-year moratorium on the use of grade 3-8 standardized test scores to assess teachers.

Value-Added Models, also referred to as Growth Scores, attempt to compare teachers from around the state who teach similar students. A dense mathematical algorithm incorporates a variety of variables and generates a numerical score for each teacher. For example, a fourth grade teacher is compared to other fourth grade teachers across the state, taking into account the percentages of her students who are Title 1 eligible, who have IEPs, who are English Language Learners, their gender, and perhaps other variables. The criticism is the use of the formula to assess individual teachers: the experts aver that the scores are “unreliable,” with large errors of measurement (plus or minus five, ten, or fifteen percent), and “unstable,” with teacher scores varying widely from year to year.
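To make the mechanics concrete, here is a minimal sketch of the value-added idea in code. This is my illustration with invented data, not New York State’s actual formula, which is far more elaborate: predict each student’s score from prior achievement and demographic variables, then credit (or debit) the teacher with the average gap between her students’ actual and predicted scores.

```python
# A stripped-down value-added sketch: invented data, simplified model.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 students, 10 teachers, 20 students each.
n, n_teachers = 200, 10
teacher = np.repeat(np.arange(n_teachers), n // n_teachers)
pretest = rng.normal(650, 30, n)          # prior-year scale score
title1  = rng.integers(0, 2, n)           # Title 1 eligible (0/1)
iep     = rng.integers(0, 2, n)           # has an IEP (0/1)
ell     = rng.integers(0, 2, n)           # English Language Learner (0/1)
true_effect = rng.normal(0, 3, n_teachers)
posttest = (0.9 * pretest - 5 * title1 - 4 * iep - 3 * ell
            + true_effect[teacher] + rng.normal(0, 15, n) + 70)

# Step 1: predict posttest from student characteristics (ordinary least squares).
X = np.column_stack([np.ones(n), pretest, title1, iep, ell])
beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)
predicted = X @ beta

# Step 2: a teacher's "value added" is the mean residual of her students.
residual = posttest - predicted
for t in range(n_teachers):
    r = residual[teacher == t]
    est = r.mean()
    se = r.std(ddof=1) / np.sqrt(len(r))   # standard error of the mean
    print(f"teacher {t}: value-added = {est:+6.2f}  (+/- {1.96 * se:.2f} at 95%)")
```

Run it and the problem announces itself: with only twenty students per teacher, the 95 percent bands are wide, often wider than the differences between the teachers themselves. That, in miniature, is the “unreliable” criticism the experts raise.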

The use of value-added measurements to assess individual teachers has been pilloried by experts.

The New York State Learning Summit brought together experts from across the country – they were sharply critical of the use of VAM to assess individual teachers.

Howard Wainer, a statistician with decades of experience and many published articles, has been a harsh critic of the misuse of statistical tools:

Ideas whose worth diminishes with data and thought are too frequently offered as the only way to do things. Promulgators of these ideas either did not look for data to test their ideas, or worse, actively avoided considering evidence that might discredit them.

The issue is not the mathematical model; the issue is how the model is used. If a particular teacher consistently receives high scores over a number of years, it is worthwhile to ask: what is that particular teacher doing? What instructional practices is the teacher utilizing? Can these practices be isolated and taught to prospective teachers in college teacher preparation programs? In school and district-based professional development? Or are these practices unique to the individual teacher? Is there a “teaching gene,” an innate quality that resonates with students?
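The screen itself is trivial to build; the hard questions are the ones that follow it. A minimal sketch, with invented scores and an arbitrary top-quintile cutoff, of how one might flag consistently high scorers:

```python
import numpy as np

# Invented scores for five teachers over three years; the 80th-percentile
# cutoff (top quintile) is computed separately for each year.
scores = {
    "A": [82, 85, 88], "B": [90, 41, 77], "C": [91, 93, 94],
    "D": [50, 55, 48], "E": [70, 92, 35],
}
years = np.array(list(scores.values()), dtype=float)   # shape (teachers, years)
cutoffs = np.percentile(years, 80, axis=0)             # one cutoff per year
consistent = [t for t, s in scores.items() if np.all(np.array(s) >= cutoffs)]
print(consistent)  # ['C'] -- the only teacher above the cutoff in every year
```

Note teachers B and E, each with one near-top year and one dismal one; that year-to-year swing is exactly the “instability” the experts describe.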

Sadly, VAM has been misused in spite of the evidence that discredits the use of the tool to assess individual teachers.

Six years after the Widget Report, a report that bemoaned that fewer than 1 percent of teachers were rated unsatisfactory, and six years into the use of student achievement data run through dense mathematical prestidigitation, we find that 1 percent of teachers are found “ineffective.”

Millions of dollars and endless conflict later, the percentage of teachers found unsatisfactory remains at 1 percent!

Insanity: doing the same thing over and over again and expecting a different result.

In New York State we are in year one of a four-year moratorium on the use of grade 3-8 student test scores to evaluate teachers.

How should management evaluate teacher competence?

“One size fits all” fits no one.

The state should explore a menu of choices to fit the many differences among its roughly 700 school districts.


4 responses to “VAM, the Lederman Decision and the Misuse of Statistical Tools. ‘Gut versus Actual Evidence.’”

  1. Indeed, VAM/growth should not be used to judge individual teachers, on statistical grounds alone. But there are deeper reasons, which significantly also discredit its use for identifying ‘good practices’ that can be shared. ‘Good practices’ here means good at ensuring consistently strong test-score gains; this in turn assumes that what the tests measure is important enough to devote significant time and resources for other teachers to learn about and in some way emulate. The assumption is not tenable. Yes, some of what the tests measure is important, valuable, but too much is not important enough and far too much that is important is not measured.


  2. BTW, for a short elaboration on my points above see http://fairtest.org/teacher-evaluation-fact-sheet.


  3. Retired: no more public enemy number one.

    The reader is invited to consider the performance model shown in the Dilbert comic strip for May 8, 2016.

    http://dilbert.com/strip/2016-05-08

    It’s just as good as the expensive ones.


  4. REALITY TOO COMPLEX: The complaint that not enough teachers are rated ineffective is a matter of relativity.

    In struggling inner city schools, there are pronounced teacher shortages. Getting rid of an imperfect teacher is not a good idea if the replacement is likely to be even worse.

    VAM, aside from mistakenly identifying top teachers as ineffective, seeks to standardize quality ratings across entire states, regardless of factors that go way beyond poverty, language and disability status.

    For example, NYC has suburbs where families hire private tutors in massive numbers. There is no mechanism to capture this, making VAM a comparison of apples and oranges. The list of other excluded factors is as long and complex as reality itself.

    When a teacher meets her new class each September, she has no idea what to expect. The idea that we can make assumptions and categorize students into assembly line models is dismissive of best practices in education – and reality itself.

