Tag Archives: teacher assessment

Restoring Trust: Can the Regents/SED, Unions and Superintendents Agree on a “Valid, Reliable and Fair” Teacher Evaluation System?

As part of my union rep duties I served on committees to select teachers for new schools; my favorite question was, “What was the best lesson you taught in the last few weeks, and how do you know it was the best?” Teachers had no problem describing lessons, and lots of trouble explaining how they could assess the effectiveness of the lesson. A few would say “exit slips,” others explained that checking the homework assessed the effectiveness of the lesson, or that experience was the best guide. In the real world we plow through the curriculum with an occasional unit exam, differentiating lessons and re-teaching concepts, understanding that god in her wisdom did not make us or our students all equal.

We have been giving statewide exams for decades, since before No Child Left Behind, in grades four and eight, and Regents exams are more than a century old. Yes, we did have a triage system: classes were homogeneously grouped and students who failed to make academic progress were placed in classes with similar kids. At the end of the line kids dropped out, or received a lesser diploma and moved into the workforce. Low-skilled union jobs were commonplace; the education system was the “divider” which steered kids to college or the world of work.

For the last thirty years our economy has undergone structural changes; the low-skilled union jobs have fallen victim to automation or moved overseas.

Are schools appropriately preparing students for the rapidly and continuously changing world of work, or, to use the commonly used term, are we preparing “college and career ready” students?

The answer begins in the world of baseball.

I was attending an education conference, and as commonly happens, a book was handed out. Not a dense text, not a “how to” book, not a messianic message: the book was Michael Lewis’ Moneyball: The Art of Winning an Unfair Game (2003).

Moneyball is a quest for the secret of success in baseball. Following the low-budget Oakland Athletics, their larger-than-life general manager, Billy Beane, and the strange brotherhood of amateur baseball enthusiasts, Michael Lewis has written not only “the single most influential baseball book ever” (Rob Neyer, Slate) but also what “may be the best book ever written on business” (Weekly Standard). [Lewis’] … intimate and original portraits of big league ballplayers are alone worth the price of admission—but the real jackpot is a cache of numbers—numbers!—collected over the years by a strange brotherhood of amateur baseball enthusiasts: software engineers, statisticians, Wall Street analysts, lawyers and physics professors.

Baseball is no longer ruled by the cigar-chomping old-timers; every decision, from salary negotiations, to valuing players, to comparing players, to which pitch to throw to each batter, is guided by a mathematical algorithm. As a wonky baseball friend claims, “You could prop Bernie in the corner of the dugout (“Weekend at Bernie’s“) and IBM’s Watson could manage the team.”

We even have a term for baseball data: sabermetrics. A hobby among a handful of nerds now rules the national pastime.

The widespread use of data has not solved the problems of baseball: the number of Afro-American ballplayers and fans has sharply diminished, and the fan base is aging out. Data may drive decisions, but the American pastime is facing a ticking clock.

See Chris Rock on baseball: http://deadline.com/2015/04/chris-rock-baseball-real-sports-with-bryant-gumbel-hbo-video-1201414387/

Whether we like it or not, understand it or not, data drives decisions across a wide spectrum. Ian Ayres’ Super Crunchers: Why Thinking by Numbers Is the New Way to Be Smart (2007) chronicles how data has embedded itself: from predicting the quality of red wines, to driving the medical profession, to determining which prisoners should be paroled, dense regression models rule.

It is not surprising that data influences core decisions in education.

The New Teacher Project’s 2009 “Widget Effect” report resounded across the education domain:

All teachers are rated good or great. Less than 1 percent of teachers receive unsatisfactory ratings; when excellent ratings are the norm, it is impossible to identify truly exceptional teachers.

• Professional development is inadequate. Almost 3 in 4 teachers did not receive any specific feedback on improving their performance in their last evaluation.

• Novice teachers are neglected. Low expectations for beginning teachers translate into benign neglect in the classroom and a toothless tenure process.

• Poor performance goes unaddressed. Half of the districts studied have not dismissed a single tenured teacher for poor performance in the past five years.
The result has been the movement to assess teacher performance by applying dense mathematical regression models to education, attempting to compare teacher to teacher based upon student achievement data, using a range of variables to level the playing field.

In other words, there is no teacher evaluation system.

The current systems attempt to address the absence of real evaluation with a statistical method called regression analysis. The students take a common exam, a state test for example; the model allows for a range of variables (economic status of test takers, students with disabilities, the level of disability, English language learners, student attendance, etc.), and the formula differentiates among teachers, within a margin of error.
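Stripped of the proprietary details, the basic mechanics can be sketched in a few lines. Everything below is a toy illustration with invented numbers, not any state’s actual model: predict each student’s score from a prior score plus covariates, then treat a teacher’s “value added” as the average residual of her students.

```python
import numpy as np

# Toy value-added sketch (invented data, not any state's real formula).
rng = np.random.default_rng(0)
n = 300
prior = rng.normal(70, 10, n)        # last year's score
low_ses = rng.integers(0, 2, n)      # 1 = low-income student
ell = rng.integers(0, 2, n)          # 1 = English language learner
teacher = rng.integers(0, 10, n)     # each student assigned one of 10 teachers
true_effect = rng.normal(0, 2, 10)   # each teacher's "real" impact
score = (10 + 0.85 * prior - 3 * low_ses - 2 * ell
         + true_effect[teacher]
         + rng.normal(0, 8, n))      # large student-level noise

# Ordinary least squares on the student-level covariates.
X = np.column_stack([np.ones(n), prior, low_ses, ell])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
residual = score - X @ beta

# A teacher's value-added estimate: the mean residual of her students.
vam = np.array([residual[teacher == t].mean() for t in range(10)])
print(np.round(vam, 2))
```

Note that with roughly thirty students per teacher and student-level noise several times larger than the teacher effect, these estimates bounce around from sample to sample, which is exactly the instability the scholars quoted later in this post warn about.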

As with all statistical data sets there are errors of measurement: the “plus or minus” that warns us that the results fall within a range.

“Candidate A leads Candidate B 52-48, with a margin of error of plus or minus 4%” is a statistical tie.

As school districts begin to create the models, usually referred to as value-added models (VAM) or growth models, “experts” warn about the problems of VAM. At the NYSED Education Learning Summit three education experts were critical, arguing VAM data was not ready for prime time; a fourth expert argued that VAM was better than what preceded it and suggested a combination of VAM student performance data, teacher observations and student surveys.

By mid-June the NYS Board of Regents/SED has to create regulations to implement another new teacher evaluation system in New York State.

The current teacher evaluation law has been on the books for three years (two years in New York City), and has raised more questions than answers.

* Are teachers in Rochester and Syracuse less able, or is the algorithm flawed? (A lawsuit is in progress.)
* Are teachers of poorer students (low SES), English language learners and students with disabilities less able, or is the algorithm, the formula, flawed?
* Conversely, are teachers in higher-income (high-SES) districts more able than other teachers?

On Monday the Regents will begin an in-depth discussion of the new Matrix model. (See the Education Learning Summit page here.)

The movement to Common Core tests was a disaster, and that is probably too mild a term. Instead of phasing in the Common Core-aligned tests the decision-makers, led by Commissioner John King, used the “push off the end of the diving board” approach. Randi Weingarten asked King to incorporate a “save-harmless” for a year or two, to no avail. The result: over 100,000 parents opting out, an opt-out movement spreading across the nation, teachers highly suspicious of everything and, most importantly, electeds who feel threatened and are introducing legislation to weaken SED initiatives.

The Regents have four new members, three former superintendents and a former Buffalo school board member, who join former superintendents Cashin, Rosa and Young. Regents Cashin and Rosa voted against the original teacher evaluation plan. Hopefully their experience can put the train back on the tracks.

How can the Regents/SED win back the confidence of parents and teachers?

The complex numbers cannot be seen as a tool to punish teachers.

The movement to Common Core-aligned tests ignored the impact of drastic drops in test scores, and, when the public expressed discomfort, John King blamed parents and “outside agitators”: a classic example of how NOT to roll out a new initiative.

Advice: Accept the recommendations of NYSUT, the state teacher union; the UFT, the NYC teacher union; and the NYC Department of Education. In other words, create buy-in; remember Rule #1 of “change”: participation reduces resistance.

Each year a technical committee that includes the unions can review and modify the model. Shoving new proposals down people’s throats and vigorously defending the position hasn’t worked out too well. Incremental change builds trust.

A simple example: a teacher receives a score of 42 on a model that has an error of measurement of plus or minus 10 points. In other words, the teacher’s score could fall anywhere within the range of 32 to 52. If the cut score between D (Developing) and E (Effective) is 50, the teacher should be graded E (Effective). We should accept the error-of-measurement issue and not disadvantage the teacher.
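The benefit-of-the-doubt rule in this example is simple enough to state in a few lines of code; the function name and the cut score of 50 are purely illustrative, not part of any actual regulation:

```python
# Hypothetical helper illustrating the "benefit of the doubt" rule:
# if the top of a teacher's score range (score + margin of error)
# reaches the cut score, award the higher rating.
def rate(score, margin, cut_effective=50):
    """Return 'Effective' if score + margin clears the cut, else 'Developing'."""
    high_end = score + margin
    return "Effective" if high_end >= cut_effective else "Developing"

# The example from the text: score 42, margin of plus or minus 10, cut at 50.
# The range is 32 to 52; the top of the range clears 50, so: Effective.
print(rate(42, 10))   # Effective
print(rate(35, 10))   # Developing (45 falls short of 50)
```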

Similarly, on the observation rubrics, we should accept the recommendations of the unions and the NYC Department, set low cut scores and examine the results each year.

If the Regents/SED decides to select cut scores that are “tougher,” the teacher and parent wars will escalate and electeds will jump on the “voter” side, the Opt-Outs, the trash the system side. Charter schools, voucher and tax credit supporters will argue public schools are not fixable.

Just as Chris Rock points out re the underlying problems of baseball, the underlying problems of teacher quality will not be resolved by teacher evaluation regulations; however, the Regents/SED need to be standing on a stage with NYSUT President Karen Magee, UFT President Michael Mulgrew and NYC Chancellor Carmen Farina joining together announcing a teacher evaluation plan that is “valid, reliable and fair.”

Regaining lost trust is crucial and must precede any further steps; until we can trust each other we cannot move forward, and, with the vandals at the gates, the Regents/SED must take the first step.

King on the Spot: Who “Won” the Teacher Evaluation Battle? or, Is the Hubbub “Sound and Fury Signifying Nothing?”

“… it is a tale told by an idiot, full of sound and fury, signifying nothing.” (Macbeth)

Understand we are all pawns on a limitless stage with the powerful vying for our adulation, and every few years, our votes.

After eight years Michael Bloomberg had fashioned a worldwide reputation – as a cynical New Yorker told me, “He turned New York into Copenhagen, whether we liked it or not”: pedestrian malls, bike lanes, newly refurbished parks, low crime rates and an avalanche of tourists from around the world. The well-honed image: the diminutive, aloof manager-mayor at a press conference pointing at a reporter, “Miss, your question?” The apolitical mayor, neither Democrat nor Republican, running the greatest city in the world, who briefly flirted with running the nation.

Four years later he is an angry, reclusive billionaire spending his final months in a vengeful assault on teachers and their union.

In the early hours of January 17th, the final date the governor set for agreement, the department and the union reached a handshake agreement. Hours later the mayor made a political judgment: he trashed the agreement and rolled out his media mavens. The mayor, editorials in the Daily News and the Post, the republican mayoral candidates and conservative pundits, all undoubtedly orchestrated by Howard Wolfson, the deputy Mayor for Political Skullduggery, all pounding away at a union that was “defending incompetent teachers.”

Anything short of building a guillotine on the steps of City Hall is unacceptable, tumbrels must be rolling from schools to the blade, and we must rid the city of the plague of bad teachers.

Sacrificing $250 million, the penalty for not reaching a timely agreement, is a small price to pay to resuscitate a stumbling legacy; and John King might be vulnerable, might fear the slings and arrows of the Bloomberg regency.

Late Saturday afternoon Commissioner King released his decision: see State Ed summary, and the full 241-page decision.

The NY Post claims the decision is a victory for the mayor, sort of.

The department matches up city and union positions with the Commissioner’s decision and crows that they won.

Gotham Schools cogently summarizes the plan with comments by Walcott and Mulgrew.

UFT President Mulgrew writes a letter to members explaining the positive components and worries about implementation.

Gotham Schools reminds us that the mayor had different expectations for the final plan,

In January and last year, Mayor Bloomberg rejected teacher evaluation deals because he said the systems that would go into place would not result in any teachers being fired.

King pushed back against that outlook today, in the first paragraph of his press release touting the new evaluation system.

“There are strong measures to help remove ineffective teachers and principals, but let’s be clear: New York is not going to fire its way to academic success,” King said.

One of the aphorisms in the world of management is: as complexity increases, the chances of achieving goals decrease, and the teacher eval plan is clearly enormously complex.

Two years or so down the road, with a new, probably Democratic and teacher-union-friendly mayor in place, one wonders whether the Sturm und Drang of the last year will have faded away, as the dismissal procedure for “double ineffective” teachers faces an arbitrator for the first time.

Principals generally fall into two categories, the managers and the educators. Some principals spend their time managing the school (discipline, guidance, and sorting through reams of paperwork) and can usually be found in their offices, while others are constantly in and out of classrooms engaging in the teaching/learning process; a few are both.

The core of the plan, the 60%, is teacher observations:

Danielson (2013): 22 components must be observed annually via observations and teacher artifacts
Teachers will have a choice between two options and indicate which option they have chosen at their initial planning conference in the beginning of the school year:
• Option 1: (a) minimum of 1 formal; (b) minimum of 3 informal (at least 1 unannounced)
• Option 2: minimum of 6 informal (at least 1 unannounced)
Teachers may authorize observation by video

The department encourages principals to use a low-inference protocol for teacher classroom observations: the observer scribes the lesson, the pupil:teacher and pupil:pupil interactions, and in the post-observation conference discusses the lesson. How effective do you think the lesson was? How do you know? Why did you ask a particular question? Did it produce the expected answer? How could you have improved the question? How would you assess pupil engagement? The post-observation conference is a self-assessment as well as a principal assessment tool. The resultant report is a summary of the conversation, with a “grade” in the HEDI range (Highly Effective, Effective, Developing and Ineffective), no longer the S/U (Satisfactory/Unsatisfactory) assessment. The lesson is viewed through the Danielson lens. See the Danielson Evaluation Instrument (2013).

A Partnership Support Organization, New Visions for Public Schools produced a detailed guide for principals on teacher observations.

The Teacher Effectiveness Program 2012-13 Handbook, a project in coordination with the union, is a detailed guide to teacher observations.

The overall teacher evaluation law is far too complex and the entire state will stumble.

In New York City, in addition to the complexity of the plan, I have grave doubts about whether the current leadership of the department can manage the teacher evaluation plan. A new leadership team, working together with the union, might be able to craft a collaborative instructional support program, engaging peers in the observation program, using new technologies to view lessons, use common planning time as lesson studies, the potential is great, and unfulfilled.

Sadly the first act of the department/mayor after the release of the plan was to gloat – they may not be gloating after 12/31/13.

The Luddites Are Right: Skilled Observers Trump Dense Algorithms – Improving the Art of Teaching Requires Mentoring by Respected Educators not Computer Printouts

At the top of the reform agenda: match teachers to pupil growth (VAM) and grade teachers accordingly, identifying “good” teachers and “bad” teachers. Reward the “good” teachers and support, retrain and perhaps fire the “bad” teachers.

Research appears to support the impact of “good teachers,”

In their analysis of these data, Rivkin, Hanushek, and Kain (2005) found that teacher quality differences explained the largest portion of the variation in reading and math achievement. As in the Tennessee findings, Jordan, Mendro, and Weerasinghe (1997) found that the difference between students who had three consecutive highly effective teachers (again defined as those whose students showed the most improvement) and those who had three consecutive low-effect teachers (those with the least improvement) in the Dallas schools was 34 percentile points in reading achievement and 49 percentile points in math.

If the goal is to fill classrooms with “good teachers” and rid classrooms of “bad teachers,” how are we doing in achieving that goal? The New Teacher Project report, “The Widget Effect” paints a grim picture, in the districts studied virtually every teacher receives a satisfactory rating and there is little help for new or struggling teachers.

    All teachers are rated good or great.

Less than 1 percent of teachers receive unsatisfactory ratings, even in schools where students fail to meet basic academic standards, year after year.

    Excellence goes unrecognized.

When excellent ratings are the norm, truly exceptional teachers cannot be formally identified. Nor can they be compensated, promoted or retained.

    Professional development is inadequate.

Almost 3 in 4 teachers did not receive any specific feedback on improving their performance in their last evaluation.

    Novice teachers are neglected.

Low expectations for beginning teachers translate into benign neglect in the classroom and a toothless tenure process.

    Poor performance goes unaddressed.

Half of the districts studied have not dismissed a single tenured teacher for poor performance in the past five years. None dismiss more than a few each year.

To address the disconnect, led by the US Department of Education, states began to design teacher assessment systems based on a combination of student growth scores (VAM) and principal lesson observations based on a widely accepted rubric. In the growth score category teachers are measured against each other and in the teacher lesson observation category against a standard, for example the Danielson or Kim Marshall frameworks.

In order to qualify for Race to the Top (RttT) and School Improvement Grant (SIG) dollars states have to implement a teacher assessment system; in New York State it is referred to by the acronym APPR. The feds and states have spent hundreds of millions of dollars, creating jobs for psychometricians, economists and other experts, using extremely dense mathematical calculations, formulae that are the subject of sharp differences in the academic community.

“If these teachers were measured in a different year, or a different model were used, the rankings might bounce around quite a bit,” said Edward Haertel, a Stanford professor…. “People are going to treat these scores as if they were reflections on the effectiveness of the teachers without any appreciation of how unstable they are.”

As scholars bicker, the results from early adopters are surprising to the teacher scolds,

In Florida, 97 percent of teachers were deemed effective or highly effective in the most recent evaluations. In Tennessee, 98 percent of teachers were judged to be “at expectations.”

In Michigan, 98 percent of teachers were rated effective or better.

Advocates of education reform concede that such rosy numbers, after many millions of dollars developing the new systems and thousands of hours of training, are worrisome.

“It is too soon to say that we’re where we started and it’s all been for nothing,” said Sandi Jacobs, vice president of the National Council on Teacher Quality, a research and policy organization. “But there are some alarm bells going off.”

New York State is entering both the first year of teacher evaluation and implementing the Common Core (CCSS) on the soon-to-be-administered state tests. A state teacher union officer and the Chancellor of the Board of Regents have sharply differing opinions.

“We’re giving the test before teaching the curriculum. That’s not what you should do,” said Maria Neira, the vice president for research and educational services for New York State United Teachers. “We’re rushing to do it, instead of doing it right.”

Merryl H. Tisch, the chairwoman of the state board of regents, counters that the state’s timeline for common-core implementation has been clear for more than two years, and that schools and districts would have to have been “living under a rock” to be surprised now.

“There is an enormous pushback against us because we are rolling out the common-core assessment, and some think we should have waited a year,” she said. “But as youngsters graduate high school right now, they’ve already hit a wall. Their reality is right now. We feel this is such an urgent issue, we have to roll it out now.”

A principal, only half-jokingly, tells me that teachers in her school joke about undercutting other teachers to improve their “grade.” With the release of the first round of scores in August 2012, principals were confused; in numerous instances the grades did not jibe with principal judgments. For probationary teachers the teacher grades determine tenure, and few principals are willing to fight with superintendents for their teachers.

Around the country school districts are developing multiple-measures systems combining the use of student test scores, usually a growth model, and supervisory lesson observations. We know the student test score data is “unstable,” aka subject to wide year-to-year swings, and, up to now, supervisors rate only a few percent of teachers as “ineffective.” The new multiple-measures assessments, combining growth scores and lesson observations, find few “ineffective” teachers. What if we train supervisors and lead teachers on the use of an agreed-upon rubric and use supervisor/teacher teams to observe?
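A multiple-measures composite of the kind described above can be sketched as follows. The weights, cut points and function names are all invented for illustration; they are not New York’s actual formula. The point of the sketch is that an unstable growth score can swing the final rating even when the observation score holds steady.

```python
# Illustrative multiple-measures composite (all numbers hypothetical).
def composite(growth, observation, w_growth=0.4, w_obs=0.6):
    """Weighted composite of a growth score and an observation score, each 0-100."""
    return w_growth * growth + w_obs * observation

def hedi(score):
    """Map a composite score onto the HEDI bands (cut points invented)."""
    if score >= 91:
        return "Highly Effective"
    if score >= 75:
        return "Effective"
    if score >= 65:
        return "Developing"
    return "Ineffective"

# Same steady observation score (85), two different growth scores:
print(hedi(composite(80, 85)))   # composite 83.0 -> Effective
print(hedi(composite(40, 85)))   # composite 67.0 -> Developing
```

A year-to-year swing in the growth component alone moves this hypothetical teacher from Effective to Developing, which is why the stability of the growth score matters so much.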

Well, guess what: the University of Chicago Consortium on School Research conducted a two-year research project,

“Rethinking Teacher Evaluation in Chicago: Lessons Learned from Classroom Observations, Principal-Teacher Conferences, and District Implementation” (Read here) from the University of Chicago Consortium on School Research focuses on Chicago, but the lessons learned have significant applicability to districts across the country. The report is one of the first to provide research-based evidence showing that new teacher observation tools, when accompanied by thoughtful evaluation systems and professional development, can effectively measure teacher effectiveness and provide teachers with feedback on the factors that matter for improving student learning. This is especially relevant for those districts that are implementing the Charlotte Danielson Frameworks.

If we spent our time and dollars training supervisors and teachers around an agreed-upon rubric, we could develop an assessment system that identifies both “highly effective” and “ineffective” teachers, one that not only identifies but also provides feedback to teachers that hopefully leads to improved practice.

The annual New York City Department of Education Instructional Expectations document demands of principals “frequent brief lesson observations with meaningful feedback.”

If we know that lesson observations conducted by well-trained supervisors and teachers are effective why do we spend mega-dollars on constructing systems based on mathematical algorithms that only the few can understand?

Either we don’t think we can train supervisors to observe lessons, or we don’t trust them, or we are besotted with the world of data; take your pick.

Read an excellent interview with Charlotte Danielson here: http://www.ed.gov/Teacher-Evaluation-Systems

Or, maybe we can try to emulate Finland,

Finland has developed a deeply thoughtful curriculum and then provided teachers ever more autonomy with respect to how they approach that curriculum; they have both a curriculum worth teaching to and the kind of autonomy in how they approach it that is characteristic only of the high status professions. Because Finland is at the frontiers of curriculum design to support creativity and innovation, teachers have a job that has many of the attractions of the professions that involve research, development and design. They are pushing their intellectual and creative boundaries. Because Finland is understandably satisfied with the job its teachers are doing, it is willing to trust them and their professional judgments to a degree that is rare among the nations of the world (a sign of which is the fact that there are no tests given to all Finnish students at any level of the system that would allow supervisors to make judgments about the comparative worth of individual teachers or school faculties.)

Teachers jump up and down with glee – why can’t we be like Finland?

I point out that if we were like Finland the vast majority of you wouldn’t be teachers,

Finnish teacher education programs are extremely selective, admitting only one out of every ten students who apply. The result is that Finland recruits from the top quartile of the cohort.

In the good old USA, admission standards to get into college schools of education are low, and virtually all prospective teachers graduate and receive certification.

These are complex issues; the one point I’m sure of is that the current teacher assessment system will neither attract and retain “good” teachers nor rid the system of “bad” teachers. It will simply anger all teachers, pit principals against teachers, and pit principals and teachers against superintendents and state education departments: a pitiable formula for failure.

The Luddites are right. People trump mindless machines (and dense algorithms).