
How Should We Evaluate/Assess/Rate Teacher Performance? (Maybe Peer Review)

We live in a world of assessment; let’s take a look at sports. Every major league baseball team has a group of data wonks who collect bits and pieces of data and create algorithms to assess and predict future performance. Once upon a time we could quote batting averages, home runs and earned run averages; now we’re overwhelmed by Wins Above Replacement (WAR), launch angle and the rest. We live in the world of sabermetrics (“A Guide to Sabermetrics Research”). Every sport has its own set of data used to assess player performance and to predict outcomes.

If we work out, we keep track of minutes on the treadmill and the number of pull-ups, dips and deep knee bends; we can measure our performance and log it on our iPhone or Apple Watch. If we play golf: has our handicap dropped? Or tennis: are we beating players we used to lose to?

Dancers and musicians improve at their art through guided practice with a coach.

Which raises the nature/nurture question: do some athletes and artists have DNA encoded to make them better athletes or musicians, or does 10,000 hours of practice produce excellence? Grit and determination, or natural ability?

David Epstein, in The Sports Gene: Inside the Science of Extraordinary Athletic Performance, explores the question:

The debate is as old as physical competition. Are stars like Usain Bolt, Michael Phelps, and Serena Williams genetic freaks put on Earth to dominate their respective sports? Or are they simply normal people who overcame their biological limits through sheer force of will and obsessive training?

The truth is far messier than a simple dichotomy between nature and nurture. In the decade since the sequencing of the human genome, researchers have slowly begun to uncover how the relationship between biological endowments and a competitor’s training environment affects athleticism. Sports scientists have gradually entered the era of modern genetic research.

In his book Outliers, Malcolm Gladwell lays out the much-quoted “10,000 hours rule”; simply put, gaining mastery requires 10,000 hours of “deliberate” practice.

The principle holds that 10,000 hours of “deliberate practice” are needed to become world-class in any field.

But a new Princeton study tears that theory down. In a meta-analysis of 88 studies on deliberate practice, the researchers found that practice accounted for just 12% of the difference in performance across domains.

In education, a 4% difference
In professions, just a 1% difference

In it, [the authors] argue that deliberate practice is only a predictor of success in fields that have super stable structures. For example, in tennis, chess, and classical music, the rules never change, so you can study up to become the best.

But in less stable fields, like entrepreneurship  [and teaching]… rules can go out the window… mastery is more than a matter of practice.

Teaching is a far more complex task: on one side, the teacher, with whatever skills s/he possesses; on the other side, twenty or thirty students with a wide range of life experiences (are they hungry, or bullied, or depressed?); and, in the middle, the content you’re expected to transmit to the students, whether content, standards, a curriculum or a program, none of which you played a role in selecting. Almost ten years ago the Obama-Duncan administration decided dense algorithms could be used to compare teachers to teachers who are teaching “similar” students. The tool is called Value-Added Measurement (VAM), and it was rolled out as “we can use results on standardized test scores to rate and compare teachers.” John King, at that time the NYS Commissioner, adopted the use of VAM, combined with supervisory observations, to assess teacher performance.
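What does a “dense algorithm” actually do? Stripped down, a value-added score is a prediction error. Here is a bare-bones sketch of the idea, for illustration only; the file and column names are invented, and real VAM models are far more elaborate:

```python
# Bare-bones value-added sketch (illustrative only; real VAM models
# are far more elaborate). Assumes a hypothetical CSV with columns:
# teacher_id, prior_score, current_score, poverty_flag.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("student_scores.csv")  # hypothetical file

# Predict this year's score from prior achievement and one background variable.
X = df[["prior_score", "poverty_flag"]]
model = LinearRegression().fit(X, df["current_score"])
df["residual"] = df["current_score"] - model.predict(X)

# A teacher's "value-added" is the average prediction error of his or her students.
vam = df.groupby("teacher_id")["residual"].mean().sort_values()
print(vam)
```

Everything rides on whether “similar students” is truly captured by the predictors; leave hunger, bullying and depression out of the model and the “teacher effect” quietly absorbs them.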

The pushback was vigorous. Chancellor Merryl Tisch convened a summit of experts from around the country to discuss the efficacy of the VAM tool. The experts were crystal clear: VAM was never intended to assess the performance of an individual teacher. The Board of Regents agreed upon a four-year moratorium on the use of standardized test scores to assess teacher performance. Last week both houses of the state legislature passed a bill returning the question of teacher assessment to school districts, with considerable pushback from parents who felt districts would simply substitute another off-the-shelf test.

See my blog here

We should completely de-link teacher assessment from test results.

The Netherlands is among the highest-achieving school systems in the OECD: 8,000 unionized public schools function like charter schools, with extremely wide discretion in how they run. Read a detailed description here.

European school systems use an inspectorate system (see links in the blog here): the school supervisory authority sends teams of experts into schools to assess how each school functions.

Back in the ’90s and early 2000s New York State sent Schools Under Registration Review (SURR) teams into schools for a deep dive into the functioning of the school, producing highly specific (“Findings and Recommendations”) reports. I was the teacher union representative on many teams.

New York City conducts periodic Quality Review visits to schools, a type of inspectorate system.

Experienced educators conduct a two-day school visit. They observe classrooms, speak with parents, students, teachers, and school leaders. They use the Quality Review Rubric to help them examine the information they gather during the school visit.

After the school visit, a Quality Review Report is published on the school’s DOE webpage. The Quality Review Report rates the school on 10 indicators of the Quality Review Rubric. The report also describes areas of strength and areas of focus and has written feedback on six of the indicators. Information from this report is also used in the School Quality Snapshot.

The QR teams can be improved: they should be joint Department/Union teams, and the union should play a role in constructing the Quality Review Rubric.

As for the assessment of individual teachers, we shouldn’t fear peer review: respected colleagues providing feedback.

Let me say, I’m not hopeful. At a recent live-streamed town hall (by invitation only), the mayor, the chancellor and the chancellor’s crew met with parent and community leaders from the Bronx. To a question about the large number of schools in a district, the chancellor posited an additional deputy superintendent, adding that the press would attack him for bloating the administration, and the press would be correct. Level upon level of supervision “monitors” data; educational decisions should be made in schools, not in distant offices. A parent worried: she had been in her son’s 6th grade class and saw student work replete with spelling errors. The deputy chancellor suggested a Google spelling app; the parent sighed, “He’ll only want to play video games on the computer.” Maybe a sign the school has serious instructional issues?

Empowering schools and holding them accountable for their decisions makes much more sense than measuring and punishing. And, BTW, resources matter, they matter a great deal; any school assessment should factor in “poverty risk load” (see discussion here).

Fighting over whether a teacher is “developing” or “effective” is insane; maybe we should be working to create collaborative school communities in which school leaders, parents and teachers work together to craft better outcomes.

Should Teachers Review Other Teachers Performance? Creating a Culture of Collaboration and Ownership

The single book driving education policy is, on the surface, about baseball. Moneyball: The Art of Winning an Unfair Game (2003) posits that baseball decisions driven by mathematical algorithms will determine winners and losers.

The so-called science of baseball statistics, sabermetrics, is all the rage, with a list of acronyms that baseball junkies, as well as baseball executives, rely upon as gospel. In a prior age batting averages, HRs and RBIs drove personnel decisions; now it’s OBA, WHIP, WAR and the rest.

If we can predict the value of a second baseman, why can’t we use the same methodology to assess teachers?

Teacher selection has been tied to the economy: teacher surpluses in hard times and teacher shortages in good times.

The Great Consolidation, the formation of New York City in 1898, merged the five boroughs into one city and one school system. As part of the reform movement a Board of Examiners was created, an autonomous body that administered competitive examinations resulting in a rank-order list: exam scores determined your rank on the eligible list. In the 1930s, during the Depression, teaching was a highly sought-after job and the Board of Examiners was the gatekeeper. In the 1950s and ’60s a burgeoning economy pushed teaching aside, and school districts scrambled to fill classrooms.

During the Vietnam War teachers in Title I schools were given draft exemptions and a surprising number of males sought refuge as teachers; at the end of the war some left for other professions while many stayed and made teaching a career.

By 1995, 17% of teachers in New York City were Preparatory Provisional Teachers (PPTs); they had accumulated the minimum number of credits but could not pass the low-level licensing exams. In hard-to-staff inner-city schools teachers left faster than they arrived; many simply quit, others found jobs in higher-achieving schools.

For decades supervisors evaluated teachers: weak teachers left on their own or were counseled out, teachers were observed a couple of times a year, and the vast majority were rated “satisfactory.”

In 2009 The New Teacher Project (TNTP) issued the “Widget Effect” report:

“When it comes to measuring instructional performance, current policies and systems overlook significant differences between teachers. There is little or no differentiation between teachers, from good to fair or fair to poor. This is the Widget Effect: a tendency to treat all teachers as roughly interchangeable when their teaching is quite variable. Consequently, teachers are not developed as professionals with individual strengths and capabilities and poor performance is rarely identified or addressed.”

Just as Moneyball changed the way general managers viewed baseball, the Widget Effect changed the way the feds and the states viewed teacher assessment.

The answer, for small-market baseball teams and education systems alike, was the analysis of numbers: creating complex mathematical formulas and judging both players and teachers by the numbers.

Both sorely overestimated the ability of data to substitute for human judgment.

Arne Duncan pegged Race to the Top (RttT) dollars to the creation of teacher evaluation plans, and states scrambled to build systems anchored in student growth data (VAM). The end product: a numerical score, a system in which all teachers are rated and compared to each other.

In New York State all 700 districts created plans (view all the plans here).

In October the state education department released the preliminary results for all districts (with the exception of NYC):

The preliminary statewide composite results, based on data submitted by school districts and BOCES as of the October 18 deadline, found that 91.5 percent of teachers are rated Highly Effective (49.7 percent) or Effective (41.8 percent); 4.4 percent are rated Developing; and 1 percent are rated Ineffective.

The Widget Effect thesis was simply wrong: the algorithm created by one of the most well-respected research institutions in the nation produced a teacher evaluation plan that is useless.

Principals and teachers have no idea what the score means; it is useless as a guide for professional development, and the “instability,” the year-to-year variability, puts into question whether the score can serve any purpose.
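That instability claim is easy to test if you have two years of ratings: line up each teacher’s score in consecutive years and correlate them. A minimal sketch, assuming a hypothetical file with one row per teacher:

```python
# Year-to-year stability check (hypothetical CSV with columns:
# teacher_id, score_2012, score_2013).
import pandas as pd

scores = pd.read_csv("teacher_scores_by_year.csv")
r = scores["score_2012"].corr(scores["score_2013"])
print(f"Year-to-year correlation: {r:.2f}")
```

A score that measured something stable about the teacher would correlate strongly with itself from one year to the next; a score that bounces around is telling you more about the students, or the noise, than about the teacher.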

The threat of a bad rating, the threat of a school closing, does not result in teachers improving practice; in fact, it impedes improvement and diminishes the effectiveness of all teachers. Fear and test prep are no way to run a school system.

American Institutes for Research (AIR) suggests that teacher evaluation take into account the following factors (a sketch of how such measures might be combined into a single rating follows the list):

• Student performance on annual standardized achievement tests
• Student performance on classroom tests (e.g., curriculum-based measures)
• Evaluation of student artifacts and work judged according to rubrics
• Unique assessments for teachers in nontested grades and subjects
• Unique assessments for teachers of at-risk populations
• Review of teacher portfolios
• Student surveys
• Parent surveys
• Self-report measures
• Principal evaluation
• Goal-driven professional development
• Classroom observation
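
None of these measures is self-weighting; someone has to decide how much each one counts, and those weights, more than the measures themselves, end up defining the system. A minimal sketch of a composite rating, with invented weights and cut scores (not AIR’s, and not any state’s actual formula):

```python
# Illustrative composite-rating sketch; the weights and bands are invented.
WEIGHTS = {
    "state_test_growth": 0.20,  # e.g., a VAM-style growth score
    "local_measures": 0.20,     # e.g., classroom tests, portfolios, surveys
    "observation": 0.60,        # e.g., principal/peer classroom observation
}

# Hypothetical rating bands on a 0-100 composite.
BANDS = [(91, "Highly Effective"), (75, "Effective"),
         (65, "Developing"), (0, "Ineffective")]

def composite_rating(subscores: dict) -> tuple:
    """Combine 0-100 subscores into a weighted composite and a label."""
    score = sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)
    label = next(name for floor, name in BANDS if score >= floor)
    return round(score, 1), label

# Example: strong observations cannot fully offset a weak growth score.
print(composite_rating({"state_test_growth": 55,
                        "local_measures": 70,
                        "observation": 88}))  # -> (77.8, 'Effective')
```

Shift the weights or the cut scores and the same teacher lands in a different category; the arithmetic, not the teaching, decides the label.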

In ideal systems, communities of professionals, principals and teachers create a rubric, a list of the qualities of effective teaching, and use the list to guide both professional development and formative and summative assessments.

The one element missing from the list is the involvement of peers, of colleagues.

If we view ourselves as professionals we must begin to take ownership of our profession. I hear constant chirping: you can’t use test scores, principals are unfair and prejudiced, kids vary from year to year, and on and on. All the complaints have some validity; still, how do we assess performance, and, more importantly, how do we create a culture of constant improvement, of constant growth in practice?

The identification of the 1%, the teachers who are inadequate, must not drive an assessment system for the profession; what must drive the profession is teachers striving, always striving, to become more effective. The hardest-working athletes are the best athletes: they never rest on their laurels; their goal is the next ring, the next championship, and improving the team. LeBron James makes the other four players on the court better.

It is the process of creating a school-based assessment system, a system anchored in ongoing professional development, directed by the school leader and the staff, and supported by the district leadership, that produces the desired outcomes.

I believe that peer collaboration, involving ourselves in reviewing our own instructional practices and those of our colleagues, is the most effective path; we must move away from a factory model toward a professional model.

An Annenberg Institute report suggests:

One of the perennial criticisms of public education in the United States is the reliance on the traditional “egg-crate” model of teaching and learning, whereby teachers instruct students within isolated, closed-door classrooms with little interaction or sharing of effective practices. Peer learning among teachers and leaders, when it does happen, has traditionally been scattered and informal, with the exception of some district- and philanthropically supported efforts.

An emerging literature supports the idea that peer networks, both within and across schools, can improve teaching and learning.

This lack of widespread formal knowledge sharing is coupled with an increased emphasis on evaluation systems that reward individual teachers and schools for producing higher test scores … the push for these “new teacher-evaluation systems that rely primarily on matching individual teachers with their students’ test scores threatens to exacerbate [a] competitive, rather than collaborative, system of teaching,” a system that does not lend itself to high-quality practice.

[The Annenberg report recommends:]

• allow individuals in schools adequate time to learn about and share effective practices;
• build rapport and “safe spaces” for principals and practitioners to discuss challenges openly and honestly;
• understand the importance of building social capital within and across schools, and that teaching and leadership are joint enterprises;
• emphasize inter-school collaboration and an outward-facing approach, rather than the competitive models that are increasingly popular in urban districts; and
• mix both technological and face-to-face interactions to build effective communities of practice.

Not only do these guidelines support collaboration, rather than competition and isolation, they are consistent with recent international research that illuminates the traits of the world’s most successful educational systems (Fullan 2010; Gurria 2011).

The sharing of best practices, across classrooms and schools, is a tool that schools, districts, and states can use to create and retain highly effective teachers and school leaders. This recent Annenberg study found that U.S. teachers used peer-to-peer networks when available and found them valuable. But without formal structures and built-in time for such networks, and absent a culture of collaboration and teamwork, they will remain tools never used to their full potential.

The Governor, the Chancellors (both of the Regents and of the NYC Education Department) and the Commissioner must recognize that no matter how many regulations you pass, no matter how top-down and prescriptive the message, no matter the express or implicit threats, we will only improve outcomes by involving principals and teachers and by creating cultures that honor and respect working in teams; and we urge unions to support programs that encourage teachers not to fear a culture of collaboration, one that includes exposing themselves to the inspection of colleagues.

There is a narrow window to make substantive changes: a new administration that is anxious to put its stamp on teaching and learning and extinguish twelve years of folly, a new administration that must not look to the past for answers, and a district leadership that must look to teachers and school leaders, the folks “in the trenches” each and every day.

Windows close quickly.

Read a description of a New York City high school peer review exemplar: http://www.wnyc.org/story/303290-teachers-peer-review-would-strengthen-the-profession/

Read articles discussing labor-management collaboration here