
Gates-funded teacher evaluation study sheds light on Chicago system

A report released Tuesday by the Bill & Melinda Gates Foundation suggests that Chicago’s new teacher evaluation system may be on the right track in helping to determine which teachers are most effective.

Through studies in seven school districts, researchers from 21 universities and organizations put teacher observations, student achievement gains and feedback from student surveys under a microscope—and found these measures were accurate.

Researchers randomly assigned students to math and English teachers who had already received better or worse ratings based on the measures. They got proof the ratings worked when, after a year, students did better or worse according to which teacher taught them.

“As a group, teachers previously identified as more effective caused students to learn more. Teachers identified as less effective caused students to learn less,” says the report. (Read more about the study here.)

Researchers note that one goal is to have a rating system that doesn’t fluctuate too much from year to year, and also predicts students’ performance on state tests and higher-level tests.

They accomplished that using formulas that put 33 percent to 50 percent of the weight on the growth in students’ state test scores, and split the rest of the ratings equally between teacher observations and student survey results.
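To make the weighting concrete, here is a minimal sketch in Python. It assumes all three measures have already been converted to a common scale; the function name, parameter names and the sample scores are hypothetical, and the report's actual formulas are not reproduced here.

```python
def composite_rating(growth, observation, survey, growth_weight=0.5):
    """Combine three measures into one rating.

    growth_weight is the share given to test-score growth (the article
    cites 33 to 50 percent). The remainder is split equally between
    classroom observations and student surveys.
    All inputs are assumed to be on the same scale (an assumption).
    """
    if not 0.0 <= growth_weight <= 1.0:
        raise ValueError("growth_weight must be between 0 and 1")
    other_weight = (1.0 - growth_weight) / 2.0  # equal split of the rest
    return (growth_weight * growth
            + other_weight * observation
            + other_weight * survey)


# Hypothetical teacher: same scores, two different weightings.
for w in (0.33, 0.50):
    print(w, composite_rating(growth=0.4, observation=0.1, survey=-0.2,
                              growth_weight=w))
```

Under this sketch, moving the growth weight from 33 to 50 percent shifts the rating toward whatever the test-score measure says, which is why the choice of weight matters for teachers whose measures disagree.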

CPS plan not an exact fit

CPS’ plan bears some similarities, but doesn’t fit the model exactly. Teachers this year will have 75 percent to 90 percent of their evaluations determined by observations of teacher practice. The percentage of evaluations tied to student growth will be 10 to 25 percent, but will increase to up to 30 percent for some teachers in the 2014-2015 school year, then to 35 and perhaps 40 percent in subsequent years. The percentage will include district-designed “performance tasks” as well as standardized tests.

In addition, the research cautions that teachers should be observed at least twice, by at least two different observers, in order to create an accurate rating. CPS teachers are to be observed at least four times when they are evaluated, but there is no guarantee different administrators – such as an assistant principal and the principal – will carry out those evaluations.

"The configuration of an observation schedule is made at the school level and in many cases, both administrators are conducting observations," notes CPS spokeswoman Robyn Ziegler. Some schools also have several assistant principals, meaning some teachers could be observed by three or more different administrators.

CPS and the Chicago Teachers Union are also studying the issue of student surveys. The district will pilot the surveys in 2013-2014, and a joint union-CPS committee will decide whether to include them in teacher evaluations during the 2014-2015 school year. Originally, the district planned to make surveys 10 percent of teachers’ evaluations, but that plan was scrapped during this fall’s teacher strike.

Jean Clements, president of Florida’s Hillsborough Classroom Teachers Association, said on a press call announcing the study results that surveys aren’t part of her district’s evaluation process because they were thought to be the most contentious measure of all.

“We can use the student surveys to improve practice without bringing it into the actual evaluation process, which we think would be controversial and a bit contentious,” Clements said.


Anonymous wrote 2 years 5 days ago

SB7 law and ratings

Please advise. Is it true that if a teacher receives two unsatisfactory ratings in seven years, that his/her teacher license will be revoked in Illinois? What is the protection for teachers under this system?

Anonymous wrote 2 years 4 days ago


2 unsatisfactory ratings can lead to suspension or revocation. However, the school will probably fire you before. Scary stuff. ....if you are basic aka satisfactory your 2nd to go at school closing and cps has in contract that you will not be recalled,,,,,that's why I voted noooo..just for my pride. .dont forget two basics will put you in jeopardy of getting unsatisfactory.....

Anonymous wrote 2 years 4 days ago

The headline indicates the report supports CPS VAM, but doesn't make clear how; no light was shed here. Perhaps the article should have quoted someone besides a CPS spokeswoman to analyze the findings?

Anonymous wrote 2 years 4 days ago

Here is an analysis of the Gates study which raises serious questions as to its validity.

Anonymous wrote 2 years 4 days ago

Same teacher = different value-added ranking 2 consecutive years

The 50 million dollar lie

by Gary Rubinstein

"Last year I spent a lot of time making scatter plots of the released New York City teacher data reports to demonstrate how unreliable value-added measurements are. Over a series of six posts, which you can read here, I showed that the same teacher can get completely different value-added rankings in two consecutive years, in the same year with two different subjects, and in the same year with the same subject, but in two different grades.

Here is an example of such a scatter plot, this one showing the ‘raw’ score (for value added, this is a number between -1 and +1) for the same teachers in two consecutive years. Notice how it looks like someone fired a blue paint ball at the middle of the screen. This is known, mathematically, as a ‘weak correlation.’ If the value-added scores were truly stable from one year to the next, you would see a generally upward sloping line from the bottom left to the top right."

See the data graphs on Gary Rubinstein's site.

Anonymous wrote 2 years 4 days ago

Rutgers' prof: Gates study not pertinent to other US districts.

“2. Assuming Data Models Used in Practice are of Comparable Quality/Usefulness

I would go so far as to say that it is reckless to assert that the new Gates findings on this relatively select sub-sample of teachers (for whom high quality data were available on all measures over multiple years) have much if any bearing on the usefulness of the types of measures and models being implemented across states and districts.”

Read more by Rutgers professor Bruce D. Baker.

Anonymous wrote 2 years 4 days ago

qualified observers student groups

Are those observations done by qualified people without an agenda?

What type of students were looked at and what does random mean in this context?

Anonymous wrote 2 years 4 days ago

You're a teacher?

The grammar in your post is so bad that I really don't understand what it is that you're trying to say. Don't get me started on the punctuation. I hope this isn't an indication of what you teach your students. If it is, I'm glad it's possible to get you out of that classroom to make room for someone who actually knows how to write in English!

Anonymous wrote 2 years 4 days ago

Who said CPS schools have 'several' assistant principals?

Not true. Administration is overwhelmed by the number of observations required in this time-consuming process. Many school APs spend too much of their time writing accident reports and giving first aid due to recess. There is a good article from Tenn. on how the new teacher evaluation process is hurting teachers, and principals there are speaking out.

3B wrote 2 years 3 days ago

REACH training

Making matters worse, at my school within the Sorority Network the principal has done no training for staff on REACH. She stated to faculty, "you need to go online and familiarize yourself with this." Other schools are looking at a video of anonymous teachers and evaluating them according to REACH in the hopes that teachers will comprehend this system.

Chicago dad wrote 1 year 51 weeks ago

Real world examples of how VAM hurts great teachers.

The bottom line is this: In each school, VAM grades that group of teachers on a curve. In both struggling and fantastic schools it creates failure where none exists by assuming that anyone on the "bad" side of the curve is a failure in spite of the fact that their students are doing well or actually high performing. VAM has the same absurd assumptions and flaws as the Stacked Ranking System that Microsoft shot itself in both feet with in its lost decade, where it lost significant market share as a result of creating a climate of fear and firing great, productive employees. It seems that Bill Gates learned nothing from that experience and seeks to repeat the insanity by nuking our schools the same way. His children go to schools that would never ever implement the policies he imperiously seeks to impose on our schools in the name of profit. He's embraced the corporate version of "the white man's burden".

Michael R Butz wrote 1 year 51 weeks ago

Further Evidence

It's getting harder and harder to oppose these changes to how we evaluate the quality of teaching in American public schools with the final part of the MET study being released. The study was well-designed and executed and the data should be considered meaningful.

Thank god, for teachers and students, that we're getting a handle on this and bringing teaching up to the professional level of accountability and demand for high quality this profession rightly deserves.

Anonymous wrote 1 year 51 weeks ago

Yes, thank Gates or God.

Chicago dad wrote 1 year 51 weeks ago

Actually no Mr. Butz, just the opposite.

It's getting harder and harder to justify them. As has been pointed out in the links above by highly qualified critics who seek to let the chips fall where they may, the MET study is flawed in its conclusions, which disagree with the actual data MET collected and processed. The highly deceptive nature of the graphics used, where averages rather than the actual scatter plots were used, illustrates this very clearly.
"For some unexplained reason, the statisticians who analyzed the data for the MET Project report divided the 3,000 teachers into 20 groups of about 150 teachers each and plotted the average VAM scores for each group. Why?

And whatever the reason might be, why would one do such a thing when it has been known for more than 60 years now that correlating averages of groups grossly overstates the strength of the relationship between two variables? W.S. Robinson in 1950 named this the "ecological correlation fallacy." Please look it up in Wikipedia. The fallacy was used decades ago to argue that African-Americans were illiterate because the correlation of %-African-American and %-illiterate was extremely high when measured at the level of the 50 states."
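The statistical point in that quote can be illustrated with a minimal Python sketch. It uses simulated data, not the MET data: the 3,000 "teachers," the two measures, the 0.3 slope and the random seed are all assumptions chosen only to show how correlating group averages can look much stronger than correlating individuals.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)   # fixed seed so the run is repeatable
N_TEACHERS = 3000        # matches the count in the quote
N_GROUPS = 20            # 20 groups of 150, as in the quote

# Simulated (hypothetical) scores: measure_b is only weakly tied to measure_a.
measure_a = [rng.gauss(0, 1) for _ in range(N_TEACHERS)]
measure_b = [0.3 * a + rng.gauss(0, 1) for a in measure_a]

r_individual = pearson(measure_a, measure_b)

# Sort by measure_a, cut into equal groups, and average within each group.
pairs = sorted(zip(measure_a, measure_b))
size = N_TEACHERS // N_GROUPS
group_a, group_b = [], []
for i in range(N_GROUPS):
    chunk = pairs[i * size:(i + 1) * size]
    group_a.append(sum(p[0] for p in chunk) / size)
    group_b.append(sum(p[1] for p in chunk) / size)

r_grouped = pearson(group_a, group_b)

print("individual-level correlation:", round(r_individual, 2))
print("correlation of group averages:", round(r_grouped, 2))
```

Averaging within groups cancels most of the individual-level noise, so the group averages line up far more neatly than the individuals do. That says nothing about what the real MET data look like; it only shows why a plot of averages cannot be read as evidence about individual teachers.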

Chicago dad wrote 1 year 51 weeks ago

MAP test

One of the tests that is used in Chicago is the MAP test, the same one that teachers in Seattle have refused to give to their students because, among other reasons, the margin of error in the test is greater than the expected gain by students. More important than that, the makers of the test themselves have stated unequivocally that the MAP test should not be used for teacher evaluations since it was not designed for that purpose and is unsuited for it. Why then do politicians and the so called leaders of corporate reform ignore cold hard facts and push these wasteful and destructive policies ahead? What's more important, student learning time or making a profit?

Michael R Butz wrote 1 year 51 weeks ago

I see you disagree with the conclusions drawn by the MET study, Chicago Dad. Others do not.

Research into human behavior and outcomes is never 100% iron-clad in its conclusions. But this study - the scale, design and scope of which has not seen an equal - appears to contain enough actionable and valid data that, along with others which suggest similar correlations between teacher quality and student outcomes, it should be taken as part of the discussion of education reform.

Gosh, even Randi endorsed its conclusion that a meaningful teacher evaluation should include several components, one of which is growth on standardized test scores.

Michael R Butz wrote 1 year 51 weeks ago

Chicago Dad - how would you propose we evaluate the effectiveness of our kids' teachers?

Chicago dad wrote 1 year 51 weeks ago

Why misrepresent my position?

The size of the study is not relevant, that's a smoke screen. I could spend 10X that to prove the moon is made of cheddar cheese and I'd still be lying by saying it is. The data in the MET study disagrees with its own conclusions. You conveniently ignore the deceptive nature of the way the conclusions are presented to hide that from view. In addition, the fact that the study compares VAM to itself renders any conclusions drawn from that part highly questionable to say the least. Randi's position is a fall back position that acknowledges the overwhelming amount of money and influence brought to bear to gain a market share of our education tax dollars, it is not an outright endorsement. They are political statements, not policy endorsements. She is biding her time and waiting for the whole house of cards to collapse upon itself. I just became aware of a new study done by Stanford and the Economic Policy Center that refutes the claim that American education is failing and falling behind, the major lie being told to gin up support for the false reforms the MET study seeks to validate.

Chicago dad wrote 1 year 51 weeks ago

Observation systems are the way to go.

VAM is highly flawed and will result in the removal of great teachers from the classroom as well as becoming a major disincentive to attracting the best of the best to the profession. No one wants to commit their life to a field where they can be fired for doing nothing wrong and everything right. If America does what Finland did in terms of ensuring teacher quality, that question will be rendered irrelevant and we can focus on the other bigger factors that prevent some of our children from realizing their full potential. In-school factors are no more than 20% of what predicts a student's success, and 8% of that is teacher effects. It's long past time we admit that and focus on the bigger picture. Don't waste my time with any of the silly arguments that Finland is different than we are culturally and demographically since we are just speaking of teacher quality at your suggestion. Teaching is not a trade but a profession. We need to get the profiteers and politicians out of the teachers' way and let them be the professionals they already are. I trust teachers to remove any low performers from their profession. The PAR observation and professional development system empowers them to do that in a way that's fair and accurate. It has already had great success in the places where it is being used and has had none of the absurd failures that VAM has inflicted on us.

Michael R Butz wrote 1 year 51 weeks ago

So you're just opposed to the use of any student test data in evals, but otherwise you would support observations at least 2x a year by 2 different reviewers along with student survey data? Chicago Dad (I am one, too) there are other studies which indicate VAM can be useful but you're right, most if not all caution VAM should not be the ONLY factor, just one of several. I am a scientist and have read several of these studies...they seem valid to me, what can I say? And I measure things for a living...if you don't include a quantifiable component your results are generally suspect for most things being studied. No study is perfect, including the MET one or any other.

I am very aware of which groups in the US are under-performing on the PISA tests which compare us to other nations...poor kids. When you disaggregate the PISA data by zip code and Census data on income and SES it is very plain that our more affluent children are, literally, the best in the world and our poor kids (of which we have way too many) are much worse, dragging the overall US score down.

But this is not a reason to avoid using student outcomes in teacher evaluations. If anything, it's MORE reason to use them, so we can identify which teachers are better at reaching these students and which are less effective at so doing. That way we can incentivize this group of teachers to teach the kids who need them the most. This study indicates we can, indeed, rank teachers. And we should - it's that important.

I am a Democrat so I support a strong safety net and enhanced protections and supports for the least well off among us. Of course I know that where a kid grows up, and how s/he grows up, have the biggest influence on their academic achievement. But that does not mean we can't make changes within the school house walls that could help. To suggest, well we can only tackle poverty to raise these kids up, is a smoke screen to do nothing right now and just hope for something to change in a generation or two. We must do BOTH.

And yes, comparisons to Finland are not apt as the two countries are wildly different. That doesn't mean there aren't best practices for us to take from Finland or other nations, but to directly compare us is folly.

Chicago dad wrote 1 year 51 weeks ago

Drill down

"if you don't include a quantifiable component your results are generally suspect for most things being studied." The quantifiable component has to be accurately measured and valid, and VAM is neither. As I alluded to before, it is not an objective measure by any stretch of the imagination, it is an invalid measure. The caution that so very many others raise is not that VAM should be used only in part, but that it should not be used at all in ANY high stakes decision. Other teacher eval components are used as a crutch so that VAM can be included in so called "multiple measures" schemes in spite of it being so flawed as to provide no useful information. VAM is all about extracting profit and creating churn in the teaching force, a thing that is well known to be the most harmful for the poorest students who are likely to have the least stability in their lives to start with. For the record, student surveys can be collected and looked at, but I have seen far too many instances where teacher popularity is more important to some student populations than teacher quality. I have also seen places where the kids are spot on. I think that student surveys can be both useful and deceptive, and since there's no good way to control for that they can't be a part of the decision process but can be used as a starting point for other examinations of teachers' practice. The bigger picture tragedy about VAM is that if it is not used at the individual teacher or school level but as a much broader sampling tool, it does have potential to provide better information than what we now use, but that results in far less profit for the testing companies.
Last, I do not appreciate at all the way you continue to twist my words and misrepresent my positions on policy questions. The manner in which you do so has been seen to be used with regularity by apologists for the profiteers. The congruence between your methods and theirs is far too high for me to believe you are just another "ordinary citizen". The following snip from your post is a classic example of this, and it is a lie that is repeatedly told against those that oppose the hostile corporate takeover of public education.

"To suggest, well we can only tackle poverty to raise these kids up, is a smoke screen to do nothing right now and just hope for something to change in a generation or two."

Anonymous wrote 1 year 51 weeks ago

VAM is not a stable assessment b/c there exists a weak correlation between teacher performance and student test scores.

A stronger correlation exists between zip codes and scores, or family income and scores. (However, "poverty seems to be an excuse" these days.)

Unstable VAM models mean a highly rated teacher this year can appear to be a bomb next. None are up to the task asked of them, but some VAM models are worse than others. Some, for example, require teachers to be evaluated based on test scores of students they haven't taught.

More to read here, from folks who know psychometrics.

Michael R Butz wrote 1 year 51 weeks ago

Chgo Dad - You state that VAM is entirely useless for this purpose as though that were accepted fact; it isn't. The study we're discussing says it works, as do others, including one from the U of C and one from Harvard (as well as others). These aren't fly by night institutions. So your opinion, that VAM cannot be trusted to be any part of an eval is one shared by many and is contrary to the opposite opinion, also shared by many. My point is...your statement it's not valid isn't any more concrete than my statement it is. And smart people who study these things for a living fall on both sides.

I would be interested to know how you believe VAM is linked to "extracting profit" and how it creates churn. And please don't use the text book or test companies in your answer. Lots of people make profit off education and schools (contractors, bus companies, food service companies, tutoring companies, etc...), that alone does not make it evil. In fact it makes it normal...there are very few companies that are not interested in making a profit.

Reading student surveys cannot be included in the eval, either. There's variability there so they can't be included. Which begs the question...under your paradigm, what CAN be included in the eval, or do you call for no changes at all to that aspect of public education?

Chgo Dad, would you please provide me an example of how I am continually twisting your words? I have zero interest in doing anything like that so if I did, it was inadvertent. Please tell me where I have done that in the few posts we've exchanged here. "Congruence with their methods" should read: agreement with their opinions. I have nothing to apologize for.

Finally, as to whether or not I am an "ordinary citizen" I say you're right. I am an EXTRAORDINARY citizen! But seriously, I am a regular guy with a child in elementary school in CPS. I have no horse in the race other than my child's education and a love of the country I live in.

But, at least this ordinary citizen has the stones to use his real name. ;-)

Michael R Butz wrote 1 year 51 weeks ago

Anonymous (above my last post) - The MET study, referred to in the column we're discussing, shows that evals which include VAM did indeed show stability of teacher results year over year. I understand others suggest differently; this is just one more piece to the puzzle we're trying to solve.

And yes, SES correlates with scores for sure.

Chicago dad wrote 1 year 51 weeks ago

No agreeing to disagree for me.

When one of the "founding fathers" of VAM says what I have said, that VAM should not be used for individual, high stakes decisions on an individual teacher's career, I tend to believe that. The Gates foundation has a ton of skin in the game when it comes to selling software and hardware, as well as their collaboration with Rupert Murdoch's company in storing and using student data. They are not disinterested observers who only want to help. Far more people in the field have affirmed the position I have embraced than not. None of them are fly by nights, and they all raise the same concerns. There remains no research-based defense of VAM being used for individual teacher evaluations, including MET! Yes, that's right, if not willfully or ignorantly misinterpreted, that's the conclusion of MET concerning individual teacher evals.
When it comes to the Harvard study, that too has some big problems, one of which is that it is not actually a study of the validity of VAM for teacher evals!

All in all, there are no studies that do not acknowledge the points I've already made about VAM, that it is just not suitable for individual teacher evaluations. Speaking of those who are not fly by nights,

I will absolutely use the testing companies in my answer because that IS the answer. They answer to their shareholders before anyone else. They use their political clout to continue to push their snake oil on school systems in spite of all the evidence that it is actually snake oil. Teacher churn is a bad thing when done with the frequency and arbitrariness that VAM based firings will produce, since VAM creates failure where none actually exists as referenced here: What's the upside of firing great teachers based on flawed data analysis? Why is this collateral damage to the profession and to schools being thought of as acceptable? Results be damned, we're getting rich!

Chicago dad wrote 1 year 51 weeks ago

MET gets 1st runner up (2nd place) for 2011 BUNKUM!

"And then there were those special cases that shone through as prime exemplars of incompetent science. It is these marvels of multi-colored packaging garnished with impressive-looking footnotes, charts and appendices – these advocacy pieces cloaked in research panoply – that we illuminate with a particular type of recognition of merit: our annual Bunkum Awards."

Chicago dad wrote 1 year 51 weeks ago

The evidence against VAM is overwhelming.
But, most importantly, these test scores largely reflect whom a teacher teaches, not how well they teach. In particular, teachers show lower gains when they have large numbers of new English-learners and students with disabilities than when they teach other students. This is true even when statistical methods are used to “control” for student characteristics.

For this reason, Chris Steinhauser, the superintendent in award-winning Long Beach, Calif., where schools have been nationally recognized for progress in closing the achievement gap, refuses to include state test scores in teacher evaluations. He points to one of the district’s expert veteran teachers, who routinely takes the highest-need 4th graders. Because she can move such students forward where others often cannot, they gain much more than they otherwise would. Meanwhile, other teachers who have easier classes can experience greater success, and everyone wins.
"These test scores largely reflect whom a teacher teaches, not how well they teach."

Penalizing such a teacher for taking on the toughest assignment does not make sense. Rather, Steinhauser has spread this model to other schools, allocating the best talent to the neediest students and supporting teacher collaboration.

Similarly, Singapore’s minister of education explained at last year’s International Teaching Summit that his country would never rank teachers by student test scores because doing so would create the wrong incentives and undermine collaboration, which is emphasized in Singapore’s schools and teacher-evaluation system. In fact, no country in the world evaluates its teachers based on annual test-score gains.

Anonymous wrote 1 year 51 weeks ago

I, Anonymous, hereby take it entirely upon myself to bestow upon Chicago dad the first-ever annual "Chicago Debunker of the Year Award", 2013, for his marshaling of extensive, reputable studies (with links) in the rapid-fire execution of a diligent assault on the reformy rhetoric of VAM.

Thanks for the brave effort, Chicago dad.

Michael R Butz wrote 1 year 51 weeks ago

"For this reason, Chris Steinhauser, the superintendent in award-winning Long Beach, Calif., where schools have been nationally recognized for progress in closing the achievement gap, refuses to include state test scores in teacher evaluations. He points to one of the district’s expert veteran teachers, who routinely takes the highest-need 4th graders. Because she can move such students forward where others often cannot, they gain much more than they otherwise would. Meanwhile, other teachers who have easier classes can experience greater success, and everyone wins. "These test scores largely reflect whom a teacher teaches, not how well they teach."

So you're okay with using test scores to identify teachers who can "move such students forward where others often cannot", but not to identify teachers who may need professional support to improve?
As long as it's always in the teacher's favor, it's cool. I see, thanks Chicago Dad for clarifying your feelings by using that example. Interesting.

Since Long Beach obviously has a way to figure out which teachers fit with which groups of children (by using test scores; how else did they determine which kids were being moved forward?), they are indeed to be commended. Even if it's not in the eval officially...if they are using data to make staffing decisions internally in the interest of the children they serve, good for them!

But not the greatest example to use when trying to convince someone student test score data has no place in evaluating the effectiveness of teachers.
