Ever play “hot potato”?
♫ One potato
Maybe it's appropriate that the state legislature and the governor are playing a children's game.
The governor's blustering and threatening resulted in a cyclonic backlash, from teachers to parents to elected officials, with his approval rating in free fall. Other new plans, a "matrix" as convoluted as the movie, a committee: each suggestion was met with suspicion. The "hot potato" bounced from the governor to the Senate to the Assembly; no one wanted it, it was too politically hot.
News reports and the rumor mill claim the budget process will return teacher evaluation to the Board of Regents, which will craft a plan and return it to the legislature for action by June 1. Let me underline: I have not seen a bill; I am relying solely on news reports.
The entire teacher evaluation catastrophe seems beyond redemption. The heart of the argument is tying teacher effectiveness to test scores, using a dense algorithm so that teachers teaching similar students are compared to each other; the statistical term is value-added modeling (VAM). The experts agree that while the data is interesting, especially over time, it is too unstable to be used for high-stakes decisions like firing and promotion; however, the claim that teachers are totally responsible for test scores, a simple answer to a complex issue, has swept from state to state.
The Race to the Top (RttT) application required a teacher evaluation plan, a multiple measures plan incorporating student test scores (VAM).
The pushback has been unabated, with examples of teachers rated highly effective by the principal and ineffective by the VAM algorithm, and a few cases of the reverse.
To further confuse matters, about 75% of teachers teach non-tested subjects or classes; how do you use student data to assess that 75%? The Measures of Student Learning (MOSL) vary from school to school and from school district to school district.
The New York State plan calls for 20% of the assessment by student test scores, 20% by a locally negotiated metric, and 60% by supervisory observations using one of six approved rubrics.
When the dust settled after year one, 51% of teachers were rated "highly effective" and 1% "ineffective."
In the days before Charlotte Danielson became an iconic name, I met with Charlotte and about twenty principals. At the end of the session one principal proudly proclaimed, “In my school every teacher will be highly effective.” Danielson shook her head, “You’re lucky if a teacher is highly effective occasionally during a single lesson.”
BTW, Danielson emphasized that her Frameworks were a professional development tool, not an evaluative tool, and trashed the use of student test scores for evaluation.
A closer look at the scores across the state is disturbing. In many districts every teacher received very high observation scores: can all 200 teachers in a district be highly effective? Teachers of special education students, English language learners, and very high poverty kids tended to get lower scores. Do those classes attract less competent teachers, or is the algorithm flawed? Some teachers (art, music, physical education) are rated based upon school-wide ELA and/or Math scores; does that make any sense?
How are the seventeen members of the Board of Regents going to “correct” the current system in 60 days?
In a handful of schools teachers play a role in assessing colleagues; peer assessment is commonplace in other professions. Should we include teachers, colleagues in the same school, on teams with principals? Should we use teachers from other schools?
A paper from the Chicago Consortium on School Research, "Does Better Observation Make Better Teachers," published in Education Next (November 2014), assesses a teacher observation experiment in Chicago:
The principals’ role evolved from pure evaluation to a dual role in which, by incorporating instructional coaching, the principal served as both evaluator and formative assessor of a teacher’s instructional practice. It seems reasonable to expect that more-able principals could make this transition more effectively than less-able principals. A very similar argument can be made for the demands that the new evaluation process placed on teachers. More-capable teachers are likely more able to incorporate principal feedback and assessment into their instructional practice.
Our results indicate that while the pilot evaluation system led to large short-term, positive effects on school reading performance, these effects were concentrated in schools that, on average, served higher-achieving and less-disadvantaged students. For high-poverty schools, the effect of the pilot is basically zero.
Another study, "Teacher Dismissal Under New Evaluation System" (Grover Whitehurst and Katherine Lindquist), also published in Education Next, sees "troublesome" flaws in observational teacher evaluation systems:
… we identified flaws in the evaluation systems that need correction. The most troublesome of these is a strong bias in classroom observations that leads to teachers who are assigned more able students receiving better observation scores. The classroom observation systems capture not only what the teacher is doing, but also how students are responding. This makes the teacher’s classroom performance look better to an observer when the teacher has academically well-prepared students than when she doesn’t.
We are a long way from a system that clearly differentiates effectiveness among teachers. There is no question that the variation from school to school, from school district to school district, is significant.
European countries use teacher inspectorates, teams that visit schools and assess both school and teacher quality.
I wish the Board of Regents luck.
UPDATE: Just Out!! A general outline of the education initiatives in the budget here.