Teacher Evaluation: Firing to Excellence is Faux Reform; Consistent, Ongoing Assessment by Skilled Observers Engenders Excellence.

The State Education Department released the second year of teacher assessment scores, the first year to include New York City teachers. The anti-teacher “fire to excellence” (de)former forces slammed the scores, teacher unions gave modified approval, and others appeared to be waiting for the governor’s State of the State message.

The New York Times reports,

In the city, only 9 percent of teachers received the highest rating, “highly effective,” compared with 58 percent in the rest of the state. Seven percent of teachers in the city received the second-lowest rating — “developing” — while 1.2 percent received the lowest rating, “ineffective.” In the rest of the state, the comparable figures were 2 percent and 0.4 percent.

Gov. Andrew M. Cuomo has said he wants to strengthen the evaluation system … a spokeswoman said, “As the governor previously stated, stronger, more competitive, teacher evaluation standards will be a priority” for the next legislative session.

A State Education Department press release,

… similar to the first year, the vast majority of teachers and principals received a high performance rating. The preliminary results show more than 95 percent of teachers statewide are rated effective (53.7 percent) or highly effective (41.9 percent); 3.7 percent are rated as developing; approximately one percent are rated ineffective.

Chancellor Tisch, the leader of the Board of Regents, does not seem to fully understand what the ratings assess:

“The ratings show there’s much more work to do to strengthen the evaluation system,” Board of Regents Chancellor Merryl H. Tisch said. “There’s a real contrast between how our students are performing and how their teachers and principals are evaluated. The goal of the APPR process is to identify exceptional teachers who can serve as mentors and role models, and identify struggling teachers to make sure they get the help they need to improve. I don’t think we’ve reached that goal yet. The ratings from districts aren’t differentiating performance. We look forward to working with the Governor, Legislature, NYSUT, and other education stakeholders to strengthen the evaluation law in the coming legislative session to make it a more effective tool for professional development.”

Former Commissioner David Steiner, John King and UFT president Michael Mulgrew spent months working on a teacher evaluation plan that eventually was crafted into a statute. Commissioner King convened a technical working group that worked intensely for months to develop the final regulations, and the 700 school districts in New York State negotiated plans within the regulations to develop district-specific plans. (The New York City plan was imposed by the commissioner after the union and the mayor could not agree upon a plan.)

The plans all divide into three sections: 20% of the teacher score is based on growth in student test scores, 20% on a locally negotiated metric and 60% on principal observations using an approved observation tool (New York City uses the Danielson Framework).
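The arithmetic of the three-part division can be sketched as a simple weighted average. This is only an illustration of the 20/20/60 weighting described above; the component names and the 0-100 sub-scores are hypothetical, and the actual APPR plans convert sub-scores into points and HEDI rating bands rather than computing a plain average.

```python
# Illustrative sketch of the APPR 20/20/60 weighting only;
# component names and sub-scores below are hypothetical.
WEIGHTS = {"state_growth": 0.20, "local_measure": 0.20, "observation": 0.60}

def composite_score(subscores):
    """Weighted average of the three components (each assumed 0-100)."""
    assert set(subscores) == set(WEIGHTS), "all three components required"
    return sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)

# A teacher strong in the observation component is weighted accordingly.
print(composite_score({"state_growth": 70, "local_measure": 80, "observation": 90}))  # 84.0
```

Because observations carry 60% of the weight, they dominate the composite: the hypothetical teacher above lands at 84 despite a 70 on the growth component.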

To compare student test scores on the new Common Core tests with teacher scores is comparing apples to oranges. The Common Core exams are new exams with a new baseline. The commissioner, through the standards-setting process, made an arbitrary decision to set the “passing” grade (“proficiency”) at a high level. A few members of the Regents suggested that the “proficiency” level be phased in over a number of years; instead, the commissioner decided to move from the former “proficiency” level directly to the new, far higher level. Not surprisingly, the two-thirds “proficiency” rate on the old exam flipped to a two-thirds “below proficiency” rate on the new Common Core exams. The commissioner, by his actions, acknowledged his error: the new Common Core Regents exams will have the grades phased in, what we used to call “scaling” the grades.

Governor Cuomo and Chancellor Tisch seem to be upset because the teacher scores do not match the student scores – there is absolutely no correlation between the scores.

1. Virtually every statistics expert agrees that student test scores should not be used for high-stakes decisions. The American Statistical Association sharply criticized the use of student test scores in teacher evaluation through what is called Value-Added Modeling (VAM).

* VAMs … do not directly measure potential teacher contributions toward student outcomes.

* VAMs typically measure correlation, not causation: effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.

* Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores.
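The ASA's 1%-to-14% figure is easy to see in a toy simulation. The sketch below is not the ASA's analysis; the standard deviations are invented for illustration. It assumes each student's score is a teacher effect plus everything else (home background, prior achievement, measurement error), with the teacher effect modest relative to the rest, and then computes the share of total score variance the teacher effect accounts for.

```python
# Toy simulation: if teacher effects are small relative to other sources
# of score variation, they explain only a small share of total variance.
# The standard deviations below are hypothetical, chosen for illustration.
import random
import statistics

random.seed(0)
SD_TEACHER, SD_OTHER = 3.0, 9.0   # assumed effect sizes (score points)

scores, teacher_part = [], []
for _ in range(200):                       # 200 simulated teachers
    t = random.gauss(0, SD_TEACHER)        # this teacher's effect
    for _ in range(25):                    # 25 students per teacher
        scores.append(t + random.gauss(0, SD_OTHER))  # teacher + all else
        teacher_part.append(t)

# Share of total score variance attributable to the teacher effect
share = statistics.pvariance(teacher_part) / statistics.pvariance(scores)
print(f"teacher share of score variance: {share:.0%}")   # roughly 10%
```

With these assumed effect sizes the teacher effect lands near 10% of score variance, squarely in the 1%-14% range the VAM studies report: most of what a test score measures is not the teacher.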

2. The principal observation section, the 60% of the overall score, assumes some comparability in assessments. School districts can select from six different assessment rubrics, and, most importantly, observers view lessons differently: from supervisor to supervisor, from school to school, and from district to district, observation scores vary widely.

The Gates-funded MET Project report, “The Reliability of Classroom Observations by School Personnel” (January 2013), authored by Andrew Ho and Thomas Kane of the Harvard Graduate School of Education, takes a deep dive into the world of teacher observations. The research,

…highlights the importance of involving multiple observers … [and] to provide better training and certification tests for prospective raters … the only way a district can monitor the reliability of classroom observation and ensure a fair and reliable system for teachers would be to use multiple observers and set up systems to check and compare feedback given to teachers by different observers … the element of surprise [unannounced observations] may not be necessary.

…classroom observations are not discerning large absolute differences in practice. The vast majority of teachers are in the middle of the scale with small differences in scores producing large changes in percentile ranks … the underlying practice on the existing scales does not vary that much.
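The report's point about percentile ranks can be illustrated with a toy distribution. The numbers below are invented: assuming observation scores on a 1-4 rubric cluster tightly around the middle, a small absolute difference in score translates into a large jump in percentile rank.

```python
# Toy illustration: when scores cluster in a narrow band, small absolute
# score differences produce large changes in percentile rank.
# The 1-4 rubric scale and the cluster around 2.6 are hypothetical.
import bisect
import random

random.seed(1)
scores = sorted(min(4.0, max(1.0, random.gauss(2.6, 0.2))) for _ in range(1000))

def percentile(x):
    """Percent of simulated teachers scoring below x."""
    return 100 * bisect.bisect_left(scores, x) / len(scores)

# A 0.2-point gap on a 1-4 scale is tiny in absolute terms...
print(percentile(2.6), percentile(2.8))  # ...but a large percentile jump
```

In this simulation a teacher at 2.6 sits near the median while a teacher at 2.8 ranks far higher, even though the two lessons differ by only a fifth of a point on a four-point rubric, which is the MET report's observation that rankings exaggerate small differences in practice.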

The American Statistical Association and the Gates Foundation directly challenge the comments by the outgoing commissioner, the governor and the chancellor; to attempt to “contrast” teacher and student test scores is to completely misunderstand the findings of recognized experts.

The purpose of a teacher assessment system must be to improve practice, not to assign a grade. The current system compares teachers to teachers instead of having teachers respond to observations by skilled observers. Teachers change the grades they teach from year to year, and groups of students vary from year to year; trying to create a “growth model” is futile, and it fails the tests of “validity and reliability.” How do you “measure” teachers of students with disabilities? Can you compare the effectiveness of teachers in high-functioning schools and school districts with teachers in low-functioning schools and districts?

The “vast majority” of teachers, to use the Gates terminology, fall in the mid-range. If “correcting” the teacher evaluation metric means “failing” more teachers, the entire highly flawed system will crumble.

We never master teaching; great teachers, athletes, musicians and artists practice, striving to upgrade their skills. Whether for the novice or the expert, guided practice is crucial.

Gates suggests multiple, highly trained observers, and I would add breaking the isolation of teachers by creating teams of teachers that include peer review and peer exchanges.

From acceptance into teacher preparation programs, to the content of those programs, to teacher preparation exit standards, to hiring, to supporting probationary teachers and the tenure-granting process, to consistent, ongoing professional development, we must strive to embed research-based policies with wide stakeholder input.

Unfortunately, the ideology-driven “disruptive” reformers advocate “market-driven” solutions: teacher against teacher, public schools versus charter schools, “solutions” that fly in the face of valid and reliable research.

Shaming teachers is ludicrous. The current policies are antithetical to positive change; the (de)formers have succeeded in uniting teachers against their policies. We should create policies that attract teachers, that build communities of learners.

Perhaps the governor will have an epiphany.


One response to “Teacher Evaluation: Firing to Excellence is Faux Reform; Consistent, Ongoing Assessment by Skilled Observers Engenders Excellence”

  1. Ed, this article could have used some proofreading. That said, it is right on point. The difference between teachers’ scores upstate and in New York City should come as no surprise. For one thing, classes in NYC have six more students than classes upstate, and for another NYC receives a much more varied cohort of students than many communities in the rest of the state do. These two things alone would skew the results. I agree with Charlotte Danielson that the purpose of teacher evaluation is professional development, not “gotcha.”

