Getting It Right: Building a Research-Based Teacher Assessment System

A couple of years ago I participated in a Danielson Training Workshop: two Saturdays in a room filled with principals and network support folk. We watched a video of part of a lesson; we were told we were watching a first-year teacher in November in a high school classroom.

Under the former Satisfactory/Unsatisfactory rating system the lesson was clearly satisfactory. The Danielson Framework (read the 115-page NYSED document here) requires that teachers be rated on a four-point scale (Distinguished, Proficient, Basic and Unsatisfactory), while New York State also requires a four-point scale (Highly Effective, Effective, Developing and Ineffective). The Framework divides the teaching process into four domains, 22 components and 76 elements.

The instructor asked us to rate the lesson: at my table we were all over the place. Some of us argued that for a teacher in the third month of her first year of teaching the lesson was excellent, clearly “proficient.” Others argued that her length of service was irrelevant, that she had to be rated against all other teachers regardless of experience: at best, she was “developing.” Inter-rater reliability was absent.

Decades ago the union sent me to an Educational Testing Service conference on teacher assessment: about thirty experienced superintendents from all over the Northeast, and me, one union guy. We began by watching three 15-minute videos of lessons. The first was an “old-fashioned” classroom: the kids sat in rows, answered teacher questions and stood when they answered; the questions were at a high level, although a small number of kids dominated the discussion. In the second video the kids sat at tables; the teacher asked a question, gave the kids a few minutes to “huddle,” one of the kids answered for the group, and the teacher followed up with a few clarifying questions. In the third classroom the kids were at stations around the room; it was noisy, but the noise was the kids discussing the assignment while the teacher flitted around the room answering, clarifying and asking questions.

We were asked to rate each lesson on a provided checklist.

The result: the superintendents’ ratings were all over the place.

I was serving as the teacher union rep on a Schools Under Registration Review (SURR) team visiting a low-performing school. We were told to wait: the principal was busy assigning classroom coverages; four of the school’s 50 teachers were absent and there were three vacancies.

At the initial get-acquainted session a team member, considering the staffing issues, asked, “What are the primary qualities you look for in assessing teacher quality?” The principal blurted, “They come every day and blood doesn’t run out from under the door.”

A colleague was touring a school with very high test scores. As he walked the building with the principal he saw uniformly “mediocre” instruction: teacher-dominated, with no student engagement. He mentioned the low quality of instruction to the principal, who shrugged, “Why mess with success?”

Once again, there is no inter-rater reliability.

In a number of school districts across the state almost all teachers received maximum observation ratings.

The State Ed folk simply accept the observation ratings of principals and school districts.

Charlotte Danielson, in another of her books, Talk About Teaching (September 2015), discusses the complex role of the principal as both rater and staff developer: how can a principal who is the summative evaluator honestly engage with the teachers he or she rates?

In an excellent article from the Center for Educator Compensation Reform, Measuring and Promoting Inter-Rater Agreement of Teacher and Principal Performance Ratings (February 2012), the authors parse the reliability of teacher observation ratings. There are a number of statistical tools for assessing reliability; the state uses none of them.
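To make “reliability” concrete: one standard tool is Cohen’s kappa, which measures how often two raters agree beyond what chance alone would produce. Below is a minimal sketch in Python, assuming two principals have scored the same ten lessons on the state’s four-point scale; the ratings are invented for illustration.

```python
# A minimal sketch of one standard inter-rater agreement statistic,
# Cohen's kappa. The ratings below are invented for illustration.

from collections import Counter

CATEGORIES = ["Highly Effective", "Effective", "Developing", "Ineffective"]

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    # Observed agreement: the fraction of lessons where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected (chance) agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in CATEGORIES) / (n * n)
    return (observed - expected) / (1 - expected)

# Two principals rate the same ten lessons (hypothetical data).
principal_1 = ["Effective", "Effective", "Developing", "Highly Effective",
               "Effective", "Developing", "Effective", "Ineffective",
               "Effective", "Developing"]
principal_2 = ["Highly Effective", "Effective", "Effective", "Highly Effective",
               "Developing", "Developing", "Effective", "Developing",
               "Effective", "Effective"]

print(f"Cohen's kappa: {cohens_kappa(principal_1, principal_2):.2f}")  # ~0.22
# Kappa near 1.0 means strong agreement; near 0 means agreement no better
# than chance -- "all over the place," in the terms used above.
```

A statistic this simple could be run routinely on any district’s observation data; the point is that nothing of the kind is required.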

In New York State 60% of a teacher’s rating is made up of the observation score, and we have no idea of the accuracy of that rating.

In the pre-Race to the Top days, the Satisfactory/Unsatisfactory rating days, the entire rating depended on the observation; in the last year of the Bloomberg administration 2.7% of teachers in New York City received Unsatisfactory ratings. Under the current, far more complex system, which incorporates student test scores and other measures of student growth, only 1% of teachers were rated Ineffective (read a description of the plan: APPR 3012-c).

Under the newest system the other 40% is a combination of Measures of Student Learning and Student Learning Objectives; the use of state test scores is suspended until the 2019-20 school year.
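To make the arithmetic of the split concrete, here is a minimal sketch of a 60/40 composite, assuming a simple weighted average of two subcomponents scored on the four-point scale with illustrative cut points; the statute’s actual scoring uses conversion charts and a matrix that are more involved.

```python
# A minimal sketch of the 60/40 split described above. The simple weighted
# average and the band cut points are illustrative assumptions; the
# statute's actual conversion charts and scoring matrix are more involved.

OBSERVATION_WEIGHT = 0.60       # observation subcomponent
STUDENT_LEARNING_WEIGHT = 0.40  # MOSL / SLO subcomponent

def composite_rating(observation, student_learning):
    """Combine two subcomponent scores (each on a 1-4 scale) into a band."""
    composite = (OBSERVATION_WEIGHT * observation
                 + STUDENT_LEARNING_WEIGHT * student_learning)
    if composite >= 3.5:
        return composite, "Highly Effective"
    if composite >= 2.5:
        return composite, "Effective"
    if composite >= 1.5:
        return composite, "Developing"
    return composite, "Ineffective"

# A teacher with very high observation scores but very low growth scores:
score, band = composite_rating(observation=3.8, student_learning=1.0)
print(f"composite = {score:.2f} -> {band}")  # composite = 2.68 -> Effective
```

Note how heavily the final band swings on where the growth score lands, which is exactly the dispute in the lawsuit discussed below.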

Read a detailed description of the current APPR 3012-d teacher evaluation law here and a lengthy PowerPoint here.

In May 2015 the Regents convened a Learning Summit and asked a number of experts to discuss the use of student growth scores (value-added measures, or VAM): watch the lengthy, sometimes contentious discussion here.

With one exception the experts criticized the use of student growth scores: the VAM scores did not meet the tests of “validity,” “reliability” and “stability.”
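“Stability” is the easiest of the three to picture: a stable measure should give a teacher roughly the same score from one year to the next. Here is a minimal sketch of that check, assuming we have each teacher’s growth percentile for two consecutive years; the scores are invented for illustration.

```python
# A minimal sketch of a "stability" check: correlate each teacher's
# growth (VAM) score in one year against the same teacher's score the
# following year. The scores below are invented for illustration.

import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical growth percentiles for eight teachers, two years running.
year_1 = [72, 35, 88, 51, 44, 90, 23, 60]
year_2 = [41, 58, 70, 30, 66, 85, 45, 38]

print(f"year-to-year correlation: {pearson_r(year_1, year_2):.2f}")
# A stable measure would yield a correlation near 1.0; critics of VAM
# argue the observed year-to-year correlations are far lower.
```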

There have been glaring errors in the system. In the Sheri Lederman lawsuit, a teacher had very high observation scores and, due to the composition of her class, very low student growth scores. The judge ruled that the use of the growth scores, in her individual case, was “arbitrary and capricious.”

The APPR plan negotiated in New York City, on the other hand, allows for appeals to a neutral third party, and the “neutral” has overturned ratings in which there was a wide disparity between the observation and VAM scores.

The current plan, created by the governor and approved by the legislature, has been rejected by teachers and parents. Teachers are convinced that their scores depend on the abilities of the students they teach, not on their own competence. Parents feel schools are forced to “teach to the test” because of the consequences facing principals and teachers.

Angry parents, angry teachers and principals, and a governor and a legislature looking for a way out of the box they created.

And cynicism from elements of the public: if two-thirds of kids are “failing” state tests, how is it possible that only one percent of principals and teachers are rated “ineffective”?

The Board of Regents has been tasked with finding the “right” plan.

There has been surprisingly little research and public discussion of teacher attrition: in high-poverty schools staggering percentages of teachers, 30%, 40%, 50% or more, leave within their first few years.

The December 2015 Cuomo Common Core Task Force, in a scathing report, tasked the Regents with “correcting” what has been a disastrous path: partly the governor’s creation of an incredibly complex teacher evaluation matrix, and partly Commissioner King’s rush to adopt the Common Core, Common Core testing and teacher evaluation simultaneously.

Can the Regents separate political decisions from decisions based on and guided by research? Can the Regents move from the John King path, an emotion-guided political path, to actually following “what the research says”?

On Tuesday the new Research Work Group, chaired by Regent Johnson, will convene for the first time.

The roadmap for the State Ed Department and the Board of Regents is the twenty-one recommendations of the Cuomo Common Core Task Force. A number of the recommendations are already in progress: untimed testing, an in-depth review of the standards by the field, greater transparency of the test items, alternatives to the use of examinations for students with disabilities, and the beginning of a review of teacher evaluation.

The Commissioner and the Regents have to regain lost credibility: moving from policy emanating from the Gates Foundation and the so-called reformers to policies guided by scholarship and supported by parents and educators.


2 responses to “Getting It Right: Building a Research-Based Teacher Assessment System”

  1. Marc Korashan

    There is an ongoing debate among psychometricians about the relationship between validity and reliability. When I first studied these concepts more than thirty-five years ago, the mantra was that reliability is a necessary but not sufficient condition for validity. One could devise reliable tests that do not validly measure the desired traits.

    I think that this is the situation we find ourselves dealing with as we work to evaluate teacher effectiveness. Teaching effectiveness is a very complex thing; it involves the ability to establish positive relations with students, knowledge of content, knowledge of pedagogy, and the ability to use that knowledge and one’s relation to students to make the knowledge accessible and meaningful for them. Even that is an oversimplification.

    Danielson’s rubrics help us to understand the multiple dimensions of teaching, and her book Talk About Teaching emphasizes the need to ask teachers, at all levels of experience, to reflect on their choices as a way to help them get better.

    In my work with first- and second-year Teaching Fellows in NYC this is what I try very consciously to do. How did the lesson go? Did you achieve your desired outcome for all students? How do you know? What do you have to do to reach students still struggling? What kind of “specially designed instruction” do you need to provide for the students with IEPs? How will you do it?

    Questions like these provoke very rich discussions about teaching, and my role becomes one of providing suggestions for capturing teachable moments that may have been overlooked, restructuring lessons to ensure more engagement, offering suggestions for additional material or ways of presenting material, etc. Test scores offer no opportunity for discussion like this. Nor does using the Danielson rubrics as a checklist. Nor do ten-minute drop-ins give anyone a real sense of what happens in a classroom or how a teacher structures, presents, and follows through on a lesson.

    If the Regents are serious about developing meaningful teacher evaluation processes, they will stop taking direction from politicians and testing companies who want quick-and-dirty answers. The way to make the teacher corps better is to draw teachers into ongoing discussions about the choices they make and to provide many opportunities for teachers to meet and talk about what is working in their classes and in their schools, with the populations they serve. Teachers need coaching, not supervision, until it is clear that the coaching is not helping them improve.

    In a community where coaching is the norm and everyone is working together to improve, those who can’t make progress are very likely to leave the profession. Teaching is very taxing work. No one will keep teaching where they are seen as failing.


  2. Jacqueline Foil (retired teacher and concerned citizen)

    On every aspect of Marc Korashan’s discussion, I agree. He is correct in his analysis.

