A couple of years ago I was participating in a Danielson Training Workshop, two Saturdays in a room filled with principals and network support folk. We watched a video of part of a lesson – we were told we were watching a first-year teacher in November in a high school classroom.
Under the former Satisfactory/Unsatisfactory rating system the lesson was clearly satisfactory. The Danielson Frameworks (read the 115-page NYSED document here) require that teachers be rated on a four-point scale (Distinguished, Proficient, Basic and Unsatisfactory), while New York State also requires a four-point scale (Highly Effective, Effective, Developing and Ineffective). The Frameworks divide the teaching process into four domains, 22 components and 76 elements.
The instructor asked us to rate the lesson: at my table we were all over the place. For a teacher in the third month of her first year of teaching the lesson was excellent – clearly “proficient.” Others argued that her time in teaching was irrelevant; you had to rate her against all other teachers regardless of experience – at best, she was “developing.” Inter-rater reliability was absent.
Decades ago the union sent me to an Educational Testing Service conference on teacher assessment; about thirty experienced superintendents from all over the Northeast, and me, one union guy. We began by watching three 15-minute videos of lessons. The first was an “old-fashioned” classroom: the kids sat in rows, answered teacher questions and stood when they answered; the questions were at a high level, although a small number of kids dominated the discussion. In the second video the kids sat at tables; the teacher asked a question, gave the kids a few minutes to “huddle,” one of the kids answered for the group, and the teacher followed up with a few clarifying questions. In the third classroom the kids were at stations around the room; it was noisy, but the noise was the kids discussing the assignment, and the teacher flitted around the room, answering, clarifying and asking questions.
We were asked to rate the lessons on a provided checklist.
The result: the superintendent ratings were all over the place.
I was serving as the teacher union rep on a Schools Under Registration Review (SURR) team – we were visiting a low performing school. We were told to wait, the principal was busy, four of the 50 teachers were absent and there were three vacancies, the principal was assigning classroom coverages.
At the initial get-acquainted session a team member, considering the staffing issues, asked, “What are the primary qualities you look for in assessing teacher quality?” The principal blurted, “They come every day and blood doesn’t run out from under the door.”
A colleague was touring a school with very high test scores. As he walked the building with the principal, he saw uniformly “mediocre” instruction – teacher-dominated, no student engagement. He mentioned the low quality of instruction to the principal, who shrugged, “Why mess with success?”
Once again, there is no inter-rater reliability.
In a number of school districts across the state almost all teachers received maximum observation ratings.
The State Ed folk simply accept the observation ratings of principals and school districts.
Charlotte Danielson, in her book Talk About Teaching (September, 2015), discusses the complex role of the principal as rater as well as staff developer: how can a principal, who is the summative evaluator, honestly engage with the teachers he or she rates?
In an excellent article from the Center for Educator Compensation Reform, Measuring and Promoting Inter-Rater Agreement of Teacher and Principal Performance Ratings (February, 2012), the authors parse the reliability of teacher observation ratings. There are a number of statistical tools to assess reliability – the state uses none of them.
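One of the most common of those statistical tools is Cohen’s kappa, which measures how often two raters agree beyond what chance alone would produce. The sketch below is purely illustrative – the rater names and all ratings are invented, not data from the article or the state – but it shows how simply the check could be run on the four-point scale:

```python
# A minimal sketch of Cohen's kappa for two hypothetical raters scoring
# the same ten lessons on the state's four-point scale. All data below
# is invented for illustration.

from collections import Counter

SCALE = ["Highly Effective", "Effective", "Developing", "Ineffective"]

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of lessons where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: chance overlap of each rater's marginal
    # distribution over the rating categories.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in SCALE) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical raters observing the same ten lessons:
principal = ["Effective"] * 6 + ["Developing"] * 3 + ["Ineffective"]
peer      = ["Effective"] * 4 + ["Developing"] * 5 + ["Ineffective"]

print(round(cohens_kappa(principal, peer), 3))  # → 0.667
```

A kappa of 1.0 means perfect agreement and 0 means agreement no better than chance; districts could compute it whenever two observers rate the same lesson, which is exactly the kind of audit the state does not perform.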
In New York State 60% of a teacher’s rating is made up of the observation score, yet we have no idea how accurate that rating is.
In the pre-Race to the Top days – the Satisfactory/Unsatisfactory rating days – the entire rating depended on the observation. In the last year of Bloomberg’s term 2.7% of teachers in New York City received Unsatisfactory ratings; under the current far more complex system, which incorporates student test scores and other measures of student growth, only 1% of teachers were rated ineffective (read a description of the plan: APPR 3012-c).
Under the newest system the other 40% is a combination of Measures of Student Learning and Student Learning Objectives; the use of state test scores is suspended until the 2019-20 school year.
In May, 2015 the Regents convened a Learning Summit and asked a number of experts to discuss the use of student growth scores (VAM): watch the lengthy, sometimes contentious discussion here.
With one exception the experts criticized the use of student growth scores (VAM); the scores did not meet the tests of “validity,” “reliability” and “stability.”
There have been glaring errors in the system. In the Sheri Lederman lawsuit, a teacher with very high observation scores received, due to the composition of her class, very low student growth scores. The judge ruled that the use of the growth scores, in her individual case, was “arbitrary and capricious.”
The APPR plan negotiated in New York City, on the other hand, allows for appeals to a neutral third party, and the “neutral” has overturned ratings on appeal when there was a wide disparity between the observation and VAM scores.
The current plan, created by the governor and approved by the legislature, has been rejected by teachers and parents. Teachers are convinced that their score depends on the ability of the students they teach, not on their own competence. Parents feel schools are forced to “teach to the test” because of the consequences facing principals and teachers.
Angry parents, angry teachers and principals and a governor and a legislature looking for a way out of the box they created.
And there is cynicism among elements of the public: if two-thirds of kids are “failing” state tests, how is it possible that only one percent of principals and teachers are rated “ineffective”?
The Board of Regents has been tasked with finding the “right” plan.
There has been surprisingly little research and public discussion of teacher attrition – in high poverty schools staggering percentages of teachers, 30%, 40%, 50% or more leave within their first few years.
The December, 2015 Cuomo Common Core Task Force, in a scathing report, tasked the Regents with “correcting” what has been a disastrous path: partly the governor’s creation of an incredibly complex teacher evaluation matrix, and partly Commissioner King’s rush to adopt the Common Core, Common Core testing and teacher evaluation simultaneously.
Can the Regents separate political decisions from research-based and research-guided decisions? Can the Regents move from the John King path, an emotion-driven political path, to actually following “what the research says”?
On Tuesday the new Research Work Group, chaired by Regent Johnson will convene for the first time.
The roadmap for the State Ed Department and the Board of Regents is the twenty-one recommendations of the Cuomo Common Core Task Force. A number of the recommendations – untimed testing, an in-depth review from the field of the standards, greater transparency of the test items, alternatives to the use of examinations for students with disabilities, and the beginning of a review of teacher evaluation – are already in progress.
The Commissioner and the Regents have to regain lost credibility: moving from policy emanating from the Gates Foundation and the so-called reformers to policies guided by scholarship and supported by parents and educators.