Classroom A
(With cheating algorithm applied)
1. 112a4a342cb214d0001acd24a3a12dadbcb4a0000000
2. 1b2a34d4ac42d23b141acd24a3a12dadbcb4a2134141
3. db2abad1acbdda212b1acd24a3a12dadbcb400000000
4. d43a3a24acb1d32b412acd24a3a12dadbcb422143bc0
5. d43ab4d1ac3dd43421240d24a3a12dadbcb400000000
6. 1142340c2cbddadb4b1acd24a3a12dadbcb43d133bc4
7. dba2ba21ac3d2ad3c4c4cd40a3a12dadbcb400000000
8. 144a3adc4cbddadbcbc2c2cc43a12dadbcb4211ab343
9. 3b3ab4d14c3d2ad4cbcac1c003a12dadbcb4adb40000
10. d43aba3cacbddadbcbca42c2a3212dadbcb42344b3cb
11. 214ab4dc4cbdd31b1b2213c4ad412dadbcb4adb00000
12. 313a3ad1ac3d2a23431223c000012dadbcb400000000
13. d4aab2124cbddadbcb1a42cca3412dadbcb423134bc1
14. dbaab3dcacb1dadbc42ac2cc31012dadbcb4adb40000
15. db223a24acb11a3b24cacd12a241cdadbcb4adb4b300
16. d122ba2cacbd1a13211a2d02a2412d0dbcb4adb4b3c0
17. 1423b4d4a23d24131413234123a243a2413a21441343
18. db4abadcacb1dad3141ac212a3a1c3a144ba2db41b43
19. db2a33dcacbd32d313c21142323cc300000000000000
20. 1b33b4d4a2b1dadbc3ca22c000000000000000000000
21. d12443d43232d32323c213c22d2c23234c332db4b300
22. d4a2341cacbddad3142a2344a2ac23421c00adb4b3cb
Take a look at the thirtieth through thirty-fifth answers on the first fifteen answer sheets. Did fifteen out of twenty-two students somehow manage to reel off the same six consecutive correct answers (the d-a-d-b-c-b string) all by themselves?
There are at least four reasons this is unlikely. One: those questions, coming near the end of the test, were harder than the earlier questions. Two: these were mainly subpar students to begin with, few of whom got six consecutive right answers elsewhere on the test, making it all the more unlikely they would get right the same six hard questions. Three: up to this point in the test, the fifteen students’ answers were virtually uncorrelated. Four: three of the students (numbers 1, 9, and 12) left more than one answer blank before the suspicious string and then ended the test with another string of blanks. This suggests that a long, unbroken string of blank answers was broken not by the student but by the teacher.
There is another oddity about the suspicious answer string. On nine of the fifteen tests, the six correct answers are preceded by another identical string, 3-a-1-2, which includes three of four incorrect answers. And on all fifteen tests, the six correct answers are followed by the same incorrect answer, a 4. Why on earth would a cheating teacher go to the trouble of erasing a student’s test sheet and then fill in the wrong answer?
Perhaps she is merely being strategic. In case she is caught and hauled into the principal’s office, she could point to the wrong answers as proof that she didn’t cheat. Or perhaps—and this is a less charitable but just as likely answer—she doesn’t know the right answers herself. (With standardized tests, the teacher is typically not given an answer key.) If this is the case, then we have a pretty good clue as to why her students are in need of inflated grades in the first place: they have a bad teacher.
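The counts quoted for the suspicious block can be checked directly against the twenty-two answer strings. The sketch below is only an illustration, not the algorithm used in the Chicago study. It assumes, as the discussion implies, that letters mark correct answers, the digits 1 through 4 mark incorrect ones, and 0 marks a blank. The names SHEETS, BLOCK, START, and PREFIX are made up for this example, and "ends with two blanks" is a crude stand-in for "ended the test with a string of blanks."

```python
# Classroom A answer strings, copied from the listing above (student 1 first).
SHEETS = [
    "112a4a342cb214d0001acd24a3a12dadbcb4a0000000",
    "1b2a34d4ac42d23b141acd24a3a12dadbcb4a2134141",
    "db2abad1acbdda212b1acd24a3a12dadbcb400000000",
    "d43a3a24acb1d32b412acd24a3a12dadbcb422143bc0",
    "d43ab4d1ac3dd43421240d24a3a12dadbcb400000000",
    "1142340c2cbddadb4b1acd24a3a12dadbcb43d133bc4",
    "dba2ba21ac3d2ad3c4c4cd40a3a12dadbcb400000000",
    "144a3adc4cbddadbcbc2c2cc43a12dadbcb4211ab343",
    "3b3ab4d14c3d2ad4cbcac1c003a12dadbcb4adb40000",
    "d43aba3cacbddadbcbca42c2a3212dadbcb42344b3cb",
    "214ab4dc4cbdd31b1b2213c4ad412dadbcb4adb00000",
    "313a3ad1ac3d2a23431223c000012dadbcb400000000",
    "d4aab2124cbddadbcb1a42cca3412dadbcb423134bc1",
    "dbaab3dcacb1dadbc42ac2cc31012dadbcb4adb40000",
    "db223a24acb11a3b24cacd12a241cdadbcb4adb4b300",
    "d122ba2cacbd1a13211a2d02a2412d0dbcb4adb4b3c0",
    "1423b4d4a23d24131413234123a243a2413a21441343",
    "db4abadcacb1dad3141ac212a3a1c3a144ba2db41b43",
    "db2a33dcacbd32d313c21142323cc300000000000000",
    "1b33b4d4a2b1dadbc3ca22c000000000000000000000",
    "d12443d43232d32323c213c22d2c23234c332db4b300",
    "d4a2341cacbddad3142a2344a2ac23421c00adb4b3cb",
]

BLOCK = "dadbcb"   # the six consecutive correct answers
START = 29         # zero-based offset, i.e. answers 30 through 35
PREFIX = "3a12"    # the four answers that precede the block on some sheets
END = START + len(BLOCK)

# Student numbers (1-based) whose sheet has the block at that exact position.
suspects = [n for n, s in enumerate(SHEETS, start=1) if s[START:END] == BLOCK]

# Of those, the sheets where the block is preceded by 3-a-1-2.
with_prefix = [n for n in suspects
               if SHEETS[n - 1][START - len(PREFIX):START] == PREFIX]

# Of those, the sheets where the block is followed by the incorrect answer 4.
followed_by_4 = [n for n in suspects if SHEETS[n - 1][END] == "4"]

# Suspects with more than one blank before the block who also end on blanks.
blank_then_block = [n for n in suspects
                    if SHEETS[n - 1][:START].count("0") > 1
                    and SHEETS[n - 1].endswith("00")]

print(len(suspects), len(with_prefix), len(followed_by_4), blank_then_block)
```

Matching on position as well as content matters here: students 8, 10, and 13 also produce d-a-d-b-c-b earlier in the test, so a plain substring search would not isolate the block in question.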
Another indication of teacher cheating in classroom A is the class’s overall performance. As sixth graders who were taking the test in the eighth month of the academic year, these students needed to achieve an average score of 6.8 to be considered up to national standards. (Fifth graders taking the test in the eighth month of the year needed to score 5.8, seventh graders 7.8, and so on.) The students in classroom A averaged 5.8 on their sixth-grade tests, which is a full grade level below where they should be. So plainly these are poor students. A year earlier, however, these students did even worse, averaging just 4.1 on their fifth-grade tests. Instead of improving by one full point between fifth and sixth grade, as would be expected, they improved by 1.7 points, nearly two grades’ worth. But this miraculous improvement was short-lived. When these sixth-grade students reached seventh grade, they averaged 5.5—more than two grade levels below standard and even worse than they did in sixth grade. Consider the erratic year-to-year scores of three particular students from classroom A:
              5th GRADE SCORE    6th GRADE SCORE    7th GRADE SCORE
Student 3     3.0                6.5                5.1
Student 6     3.6                6.3                4.9
Student 14    3.8                7.1                5.6
The three-year scores from classroom B, meanwhile, are also poor but at least indicate an honest effort: 4.2, 5.1, and 6.0. So an entire roomful of children in classroom A suddenly got very smart one year and very dim the next, or more likely, their sixth-grade teacher worked some magic with a no. 2 pencil.
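The arithmetic behind these comparisons is simple enough to sketch. The example below assumes the national standard is the grade number plus 0.8 for a test taken in the eighth month, which is what the 5.8, 6.8, and 7.8 figures imply. The function names and the 1.5-point threshold in looks_erratic are hypothetical choices for illustration, not rules taken from the Chicago analysis.

```python
def expected_score(grade, month=8):
    """Grade-equivalent standard, e.g. grade 6 in month 8 gives 6.8."""
    return grade + month / 10


def yearly_changes(scores):
    """Year-over-year change in score, rounded to one decimal place."""
    return [round(later - earlier, 1)
            for earlier, later in zip(scores, scores[1:])]


def looks_erratic(scores, min_jump=1.5):
    """Illustrative rule of thumb: a jump larger than min_jump in one year
    followed by a decline the next year. The threshold is an assumption."""
    changes = yearly_changes(scores)
    return any(up > min_jump and down < 0
               for up, down in zip(changes, changes[1:]))


GRADES = [5, 6, 7]

# Scores for fifth, sixth, and seventh grade, as quoted in the text.
records = {
    "Classroom A (average)": [4.1, 5.8, 5.5],
    "Classroom B (average)": [4.2, 5.1, 6.0],
    "Student 3": [3.0, 6.5, 5.1],
    "Student 6": [3.6, 6.3, 4.9],
    "Student 14": [3.8, 7.1, 5.6],
}

for name, scores in records.items():
    print(name, yearly_changes(scores), looks_erratic(scores))

# How far classroom A sat below the national standard in each grade.
shortfall_a = [round(expected_score(g) - s, 1)
               for g, s in zip(GRADES, records["Classroom A (average)"])]
print("Classroom A shortfall by grade:", shortfall_a)
```

Under this rule classroom A and all three of its students are flagged, while classroom B, which gains a steady 0.9 points a year, is not.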
There are two noteworthy points to be made about the children in classroom A, tangential to the cheating itself. The first is that they are obviously in terrible academic shape, which makes them the very children whom high-stakes testing is promoted as helping the most. The second point is that these students would be in for a terrible shock once they reached the seventh grade. All they knew was that they had been successfully promoted due to their test scores. (No child left behind, indeed.) They weren’t the ones who artificially jacked up their scores; they probably expected to do great in the seventh grade—and then they failed miserably. This may be the cruelest twist yet in high-stakes testing. A cheating teacher may tell herself that she is helping her students, but the fact is that she would appear far more concerned with helping herself.
An analysis of the entire Chicago data reveals evidence of teacher cheating in more than two hundred classrooms per year, roughly 5 percent of the total. This is a conservative estimate, since the algorithm was able to identify only the most egregious form of cheating—in which teachers systematically changed students’ answers—and not the many subtler ways a teacher might cheat. In a recent study among North Carolina schoolteachers, some 35 percent of the respondents said they had witnessed their colleagues cheating in some fashion, whether by giving students extra time, suggesting answers, or manually changing students’ answers.
What are the characteristics of a cheating teacher? The Chicago data show that male and female teachers are about equally prone to cheating. A cheating teacher tends to be younger and less qualified than average. She is also more likely to cheat after her incentives change. Because the Chicago data ran from 1993 to 2000, they bracketed the introduction of high-stakes testing in 1996. Sure enough, there was a pronounced spike in cheating in 1996. Nor was the cheating random. It was the teachers in the lowest-scoring classrooms who were most likely to cheat. It should also be noted that the $25,000 bonus for California teachers was eventually revoked, in part because of suspicions that too much of the money was going to cheaters.
Not every result of the Chicago cheating analysis was so dour. In addition to detecting cheaters, the algorithm could also identify the best teachers in the school system. A good teacher’s impact was nearly as distinctive as a cheater’s. Instead of getting random answers correct, her students would show real improvement on the easier types of questions they had previously missed, an indication of actual learning. And a good teacher’s students carried over all their gains into the next grade.
Most academic analyses of this sort tend to languish, unread, on a dusty library shelf. But in early 2002, the new CEO of the Chicago Public Schools, Arne Duncan, contacted the study’s authors. He didn’t want to protest or hush up their findings. Rather, he wanted to make sure that the teachers identified by the algorithm as cheaters were truly cheating—and then do something about it.