The Albany Education Learning Summit: aka, Teacher Evaluation 4.0: Can the Regents/SED Create a Student Performance/Teacher Observation Model that is Valid, Reliable and Fair?

The Regents and invited guests gathered at the State Museum in Albany to listen to a day of comments on the new Teacher Evaluation law, which one superintendent called Teacher Evaluation 4.0. A few blocks away in the Capitol the Republican members of the Senate, at least some of them, were swearing undying loyalty to Majority Leader Dean Skelos while behind the scenes John DeFrancisco (Syracuse), Catherine Young (Olean) and John Flanagan (Smithtown-Huntington) whispered in their colleagues’ ears. As early as Monday Dean may be crying “Et tu John (or Catherine)” as the dagger is plunged into his leadership heart.

You can watch/listen to the Albany Summit panels, read the hundreds of pages of comments and supporting documents and submit your own comments at

Ken Wagner, the acting co-Commissioner, acted as the moderator and did an outstanding job. Wagner, who describes himself as a “recovering school psychologist,” skillfully asked clarifying questions and relayed and restated questions from members of the Regents, the audience and the Internet audience.

New York City Regents Bendit (Manhattan), Cea (Staten Island) and Cottrell (At-Large) did not attend. All the newly appointed Regents attended, and about a dozen members of the legislature were also in attendance.
(Regent Cea informed me she watched the webcast of the Summit.)

Wagner described the changes in the law (See changes here): moving to the HEDI matrix; requiring an outside evaluator in addition to your principal; allowing for but not requiring a peer evaluator; requiring that all alternative assessment tools, for example Student Learning Objectives, be approved by the state; and setting time constraints for completion of the process, with a hardship exception that would allow time limit extensions.

After opening remarks by co-Commissioner Beth Berlin, Wagner began by summarizing the changes in the law in great detail. Watch the opening remarks here.

In the first two sessions superintendents and the three organizations in the state representing supervisors vigorously attacked the outside evaluator concept and attempted to buttress the role of the principal as primary evaluator. Both superintendents and school-based supervisors saw the outside evaluator concept as eroding their authority, as overly complex and as an administrative monstrosity. There are eighteen approved observational rubrics: imagine matching up districts, imagine the cost of “training” outside observers, imagine the cost to districts. Hundreds of schools have a single supervisor and schools are hours apart; in their view the concept of an outside evaluator is mechanically impossible. In addition the outside evaluator doesn’t “know the schools,” doesn’t know the students, doesn’t know the school culture.

What was NOT part of the discussion: in many school districts all teachers received 58, 59 or 60 points on the 60-point scale. Are the vast majority of teachers really highly effective?

The superintendent/principal panel urged the Regents to credit the outside observer with only 1 – 5% of the teacher observation section of the matrix.

The expert panel was the most anticipated section of the day. Seven “experts,” six in attendance and one on video from California, occasionally agreed and frequently disagreed about the technical aspects of the student performance section of the matrix. Was a growth model, usually referred to as Value-Added Modeling (VAM), a valid, reliable and stable method to assess teacher performance or, to quote Diane Ravitch, “junk science”?

Read my previous post on the efficacy of VAMs here.

Three of the “experts,” Tom Kane (Harvard), Aaron Pallas (Columbia) and Jesse Rothstein (U of California – Berkeley), have national reputations and have been in the midst of the VAM vortex.

The panel also included Stephen Caldas (Manhattanville College), Catherine Brown (Center for American Progress), Lesley Guggenheim (The New Teacher Project) and Sandi Jacobs (National Council on Teacher Quality).

Kane was the lead researcher in the 3-year MET Project, which concluded that student performance, observations and student surveys together create an accurate assessment of teacher performance. The report suggests that VAM measures should account for between 33 and 50% of teacher assessment.

Read a summary of the MET Project here.

Aaron Pallas, who is an expert witness in two lawsuits challenging growth scores, points out that teachers who were “measured” by the results of state tests received significantly lower scores than teachers who were “measured” by other metrics. (Read full article here.)

What is immediately obvious is that teachers whose state growth ratings were not based on the growth percentiles received much higher ratings than those whose ratings were based on the growth percentiles.

Jesse Rothstein has been challenging Tom Kane quite publicly over the use of VAM scores; scholars do not usually duel in public. Read a Rothstein paper here, and a blog post that summarizes the “back and forth” and links to the many other posts here.

I urge you to put aside an hour and forty minutes and watch the expert panel here.

To summarize the experts:

* Caldas points to 50% error rates in VAM models and suggests they are useless. Guggenheim from TNTP argues that they’re better than the previous “S” – “U” systems in which virtually everyone received an “S” rating. Read a blog post here in which Caldas rejects the validity of growth models.

* Everyone agreed VAM models are extremely complicated and very few understand them.

* All the panelists agreed that an external evaluator added accuracy to the process; the “arm’s-length” nature of the external observer added to the reliability. While the purpose of the external evaluator was summative assessment, the in-school supervisor continued to have the formative role of monitoring day-to-day practice; merging the roles will be challenging.

* The growth metric can be used to measure anything – New York City developed 159 growth scores to measure Student Learning Objectives (SLOs).

* Only about 20% of teachers are measured by state test scores; others are measured by locally developed SLOs, and some, such as pre-k and kindergarten teachers and perhaps physical education and art teachers, by school-wide or group measures. There is probably no alternative to using group measures.

* The number and type of observations is crucial: How many formal? How many informal? How many by an external evaluator? All agreed that timely feedback by all of the assessors is crucial.

* All the panelists agreed that the tests themselves were troublesome at best, or deeply flawed, and the scores produced misunderstood.

* Wagner asked about the use of video, either real time or archived, to facilitate the role of the external evaluator.

Kane referenced current research entitled The Best Foot Forward Project:

In the Best Foot Forward project, we give teachers control of the camera and allow them to choose which of their videos to submit for observation. We then train their administrators to view and score them as they would score an in-person classroom observation and then have a conversation with a teacher using the video as a coaching tool. We also provide commentary and feedback from external observers.

In conclusion: Kane believes research supports the use of VAM, which over time produces data that distinguish among teachers; Rothstein, Pallas and Caldas pretty vehemently disagree.

The majority of the discussion centered on teacher observations, and while the experts all believed that observations were the key and external observers crucial, superintendents and school-based supervisors disagreed sharply.

In my view the entire testing system is far too complex, the tests not useful and the data far too unstable to be used to assess teachers. I question the ability of too many school-based supervisors to accurately assess and help teachers grow professionally. Yes, some are extremely proficient, others not; there is far too little training for supervisors in using the observation process to improve instruction.

The Best Foot Forward Project supra moves us to the next level and I hope the research is utilized by school districts.

A draft plan is expected by the end of the week, with a full discussion at the May 18/19 Regents Meeting.

And, as you read this post, the Senate Republicans are in the process of deposing their leader, with wide-ranging ramifications for pending educational legislation.

Stay tuned.

