Are High-Stakes Tests Harming NYC Schools?


It was not Mayor Bloomberg’s proudest moment. Last month, the federal government released New York City schools’ rankings on the National Assessment of Educational Progress (NAEP) math tests for 2009—and their scores had flatlined, even as scores on the state Regents exams continued to rise. “Don’t trust the Regents,” shouted a Post editorial headline, saying that the NAEP gap had revealed New York State’s testing regimen to be “a pathetic joke.”

It seemed like yet another Albany scandal, to go along with Client 9 and state legislators locking each other out of the Senate chambers. Yet according to a growing chorus of parents, educators and, quietly, school administrators, the test-score brouhaha is just a symptom of a deeper problem with roots in Washington and City Hall. The advent of the No Child Left Behind Act, they say, coupled with the test-score-based school Progress Reports that Mayor Bloomberg introduced in its wake, has led to a rash of undesired consequences: curricula overrun by test prep; dumbed-down tests that ask questions designed for younger grade levels; and widespread pressure on both schools and government officials to fudge their numbers, by outright cheating if necessary.

No Child Left Behind “opened up a Pandora’s box here in New York, where Mayor Bloomberg and the DOE just took it and ran,” says Martha Foote of the statewide coalition Time Out From Testing. The result, she and others charge, is the worst of both worlds: a school system obsessed with test scores that are increasingly meaningless.

The name “No Child Left Behind” (NCLB) was meant literally: By the year 2014, every child in U.S. public schools was supposed to be “proficient” in both math and reading for their grade level. To achieve this ambitious goal, NCLB introduced an alphabet soup of new standards. Schools failing to meet the thresholds for Adequate Yearly Progress (AYP) for two or more years would be tagged SINI (Schools In Need of Improvement), and threatened with escalating sanctions, including having to pay to bus kids to non-failing schools in the same district, adding test prep courses, revamping curricula, and, ultimately, the death penalty: being taken over by the state, or shut down entirely.

If the fear of failure was supposed to scare schools straight, it hasn’t worked out too well. By 2008, about 40 percent of schools nationwide had landed on the Needs Improvement list, with about half of those being listed at least two years running and, as a result, facing sanctions. These numbers are likely to worsen in coming years, thanks in part to states choosing to set low standards in the early years of NCLB before ramping them up later—a dodge that Monty Neill, director of the Boston-based group FairTest, calls the “balloon payment approach.”

“There is just zero evidence that you’re going to see anything other than three-quarters to 100 percent of schools not make it by 2014,” says Neill.

In New York City, the looming NCLB crisis has drawn little notice. In part, that’s because schools’ concerns about landing on the Needs Improvement list have been superseded by fear of running afoul of Bloomberg’s own “accountability” initiative: the school Progress Reports that assign a letter grade of A through F to every public school—based mostly on state test scores—with principals who earn insufficient marks facing dismissal or even having their schools closed.

“For the first time over the past three years, the Department of Education has set up expectations for schools around what students should learn,” says acting city schools accountability officer Shael Polakow-Suransky. “Principals were held accountable for all kinds of things—number of fire drills they did, number of times they got in the newspaper—but not for how much their kids were learning.”

That only works, though, if the test scores are a reliable gauge of learning. One of the oddities of NCLB is that while it set up national standards for how many kids needed to be proficient, it left it up to each state to determine what “proficiency” means. States not only design their own tests, but also set their own “cut scores,” the minimum necessary to earn a passing grade under NCLB. The result has been a mishmash of testing regimens: Alabama, which lowered its standards, leapt to fifth in the nation in NCLB compliance, while Massachusetts, which kept its tough tests, fell to near the bottom.

New York State pegs NCLB scores to its already existing fourth-grade, eighth-grade, and high school Regents tests, using a fiendishly complex scoring system that involves a weighted average of students scoring at “basic” (2 on a four-point scale) and those judged “proficient” (3 or 4). Since 2003, state scores have risen, with a notable leap last year that resulted in only 297 city schools landing on the Needs Improvement list, down from 409 in 2008.

Whether this is a sign of improving performance or grade inflation is a matter of intense debate. Steve Koss, a longtime city parent and educator, notes that a decade ago, a 65 on the ninth-grade math Regents meant that a student had answered 65 out of 100 questions correctly. Today, students need to get only 30 questions right out of 87 to garner a 65. And the questions, he says, are easier: High-schoolers in 2008 were asked to calculate the percentage discount $15 represents from $18, typically considered middle-school-level knowledge.
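The arithmetic behind Koss's comparison can be checked in a few lines. What follows is only a rough sketch using the figures quoted above; the variable and function names are illustrative, and reading the 2008 question as an $18 price marked down to $15 is an assumption about how it was posed.

```python
# Figures quoted by Koss; names here are illustrative, not from any official source.
old_correct, old_total = 65, 100   # a decade ago: 65 right out of 100 earned a 65
new_correct, new_total = 30, 87    # today: 30 right out of 87 earns a 65

old_share = old_correct / old_total
new_share = new_correct / new_total


def percent_discount(original_price, sale_price):
    """Percentage reduction from original_price to sale_price."""
    return (original_price - sale_price) / original_price * 100


print(f"Share correct needed for a 65, a decade ago: {old_share:.1%}")  # 65.0%
print(f"Share correct needed for a 65, today: {new_share:.1%}")         # 34.5%

# Assumption: the question meant an $18 item discounted to $15.
print(f"Discount from $18 to $15: {percent_discount(18, 15):.1f}%")     # 16.7%
```

By this reckoning, a student today can earn a 65 by answering roughly a third of the questions correctly, about half the share once required.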

Koss offers an explanation for what’s happening: Campbell’s Law. This principle, first described in 1976 by sociologist Donald Campbell, is probably best summed up this way: The more a test affects decision making, the more likely it will lead to corruption. In particular, wrote Campbell, “when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.”

As evidence of Campbell’s Law in action, Koss points to an audit released by the state comptroller in November, in which a group of independent teachers and education officials were asked to re-score thousands of randomly selected high school exams. On 80 percent of students’ papers, the audit found, the re-scorers gave the students a lower grade. Taken in concert with the subsequent NAEP math scores (which revealed that, for example, while 85 percent of fourth-graders met state standards, only 35 percent were judged proficient on the NAEP), the audit was enough to have Koss and other school critics crying foul.

City Department of Education spokesperson David Cantor says that the city would welcome higher state standards, but insists that the disparate test results aren’t a problem. “Critics have wanted to say that the NAEP and the state tests are contradictory—they’re not,” he says. “The reality is the NAEP test is harder. Frankly, the NAEP test tests a significant amount of material that we do not test until higher grades.” DOE officials stress that city NAEP math scores have risen between 7 and 11 percent since 2003, which is better than statewide or national trends.

Yet critics say the city’s own measures of school success are flawed as well. The Progress Reports are supposed to improve on NCLB’s fixed thresholds by crediting schools for year-to-year improvement. But when Columbia Teachers College professor Aaron Pallas and education blogger Jennifer Jennings examined elementary and middle-school improvement scores in the city Progress Reports, they found almost no correlation between a school’s progress score one year and its score the next. They wrote in the new anthology NYC Schools Under Bloomberg and Klein: “You could actually do better randomly picking schools out of a hat to identify those that would receive high scores for student progress, than by relying on last year’s reports as a predictor.”

Polakow-Suransky says the DOE plans to tweak the lower-grade Progress Reports to make them more reliable. But Pallas says the city should have seen it coming: Many testing experts advised that three-year scores would be a more reliable measure, he says, but that approach was deemed insufficient for holding principals’ feet to the fire on an annual basis. “The problem for the city is they’ve chosen, especially for the elementary and middle schools, to use measures that they know are going to be unstable from one year to the next.”

As for signs that the state tests have been watered down, Pallas says, “I suspect that some people in the DOE know exactly what the numbers are saying. But they have to treat them as legitimate because otherwise they’re undermining their own accountability system.”

At the same time, many parents and education experts are warning that the effects of high-stakes testing (the term of art for tests where heads roll if scores are low) can cut deep within the day-to-day working of city schools.

First, there are the intimations that schools have been encouraged to cheat to earn higher scores. In 2005, P.S. 33 in the Bronx recorded a 50-point jump on the fourth-grade English test, landing its principal a $15,000 bonus under the city’s new performance incentives; she promptly retired and, because her pension was based on her final-year salary, earned a $12,000-a-year boost in her pension. By 2008, the gains had evaporated, and the school had landed on the Needs Improvement list.

Three years later, former students at P.S. 48 in the South Bronx (whose principal, John Hughes, had earned a Times profile for overseeing a huge jump in test scores) alleged that teachers had fed them answers to Regents test questions. The DOE subsequently launched an investigation into that school as well as nearby M.S. 201, to which Hughes had since moved and which had seen a similar jump in scores. (The DOE says its investigation is still ongoing, and expects to issue findings in the next few weeks.)

Many principals, meanwhile, gripe that the incessant focus on test scores is warping schools’ approach to teaching. (As is de rigueur in Klein’s DOE, none would talk without being granted anonymity.) One outer-borough principal—who didn’t even want his borough named—fumes, “There’s no critical thinking. There’s no literature. They’re telling them what passages to read. That’s not education.”

Principals also say the pass/fail nature of AYP (in which, unlike the city Progress Reports, all that matters is how many kids score a 3 or better, not their overall average) encourages them to devote more time and resources to “bubble kids” who can, with a little help, be bumped up from a 2 to a 3.
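The incentive the principals describe can be illustrated with a toy calculation. This is only a sketch: the five-student class below is invented for illustration and is not drawn from any school's data, and `pass_rate` is a hypothetical helper, not an official AYP formula.

```python
from statistics import mean


def pass_rate(levels, cutoff=3):
    """Share of students scoring at or above the cutoff level (3 = proficient)."""
    return sum(level >= cutoff for level in levels) / len(levels)


# Hypothetical class, scored on the state's four-point scale.
baseline = [1, 2, 2, 3, 4]
bubble_help = [1, 2, 3, 3, 4]       # extra help moves one "bubble kid" from 2 to 3
struggling_help = [2, 2, 2, 3, 4]   # the same help moves the lowest scorer from 1 to 2

for label, levels in [
    ("baseline", baseline),
    ("help a bubble kid", bubble_help),
    ("help the lowest scorer", struggling_help),
]:
    print(f"{label}: pass rate {pass_rate(levels):.0%}, average {mean(levels):.1f}")
```

In this made-up class, both kinds of help raise the average by the same amount, but only moving a student across the 2-to-3 line changes the pass rate, which is the only figure a pass/fail measure counts.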

A Brooklyn intermediate school principal agrees that teaching to the test is rampant, though he insists his school avoids it, mostly. Around one-quarter of his students “hover around the border” between a 2 and a 3, he says. “We make sure those kids, when they’re doing their homework and writing assignments, they’re being attentive to the things they have to. But we do it through the units they’re learning as opposed to just pulling out test prep books.”

He adds that he has shared his methods with other principals, but hasn’t seen them catch on. “They’re such in a panic mode that they’re going to get a D or an F that they’ll do anything.”

One such “anything” is test prep. Numerous city schools—the city says it doesn’t keep specific records—have hired Kaplan and other private firms to coach their students. Another tactic, allegedly, is avoiding low test scores by keeping out low-scoring students: Foote says school staffers have told her of high school students whose transfer requests were denied because they hadn’t passed the Regents English test. (Asked about this, city schools spokesperson Danny Kanner said that it would be a violation of department policy.)

Polakow-Suransky says the DOE tries to discourage what he calls “knee-jerk test prep culture” through its quality reviews, in which experts do site visits to schools and provide feedback. “There are always going to be cases of weak leaders or weak teachers that go to what they see as the shortest path to a solution,” he says. “I don’t think that’s the dominant response in our schools.”

Yet Brooklyn parent Alla Valente worries that the test emphasis itself is affecting students’ school experience. Her two children, she says, “are very different test takers. I have one child who goes into this dead calm, and actually scores better on standardized tests than on her class tests. I have another child who gets so nervous that he rushes through it just to be done. Because there’s so much emotion that goes into taking these tests, I don’t think the test is an accurate assessment of their level.”

More than that, though, Valente says the testing culture has changed the atmosphere at city schools. “My youngest, from when he was in kindergarten, knew the difference between a 2 and a 3 and a 4,” she says. “When schools are being judged under such scrutiny, that trickles down to the teachers, and that, obviously, trickles down to the students. I don’t understand why the city’s desire to do all these analytics has to translate into so much anxiety for my children.”

No Child Left Behind is up for reauthorization in Congress this year, and most educators are counting on its more draconian requirements being eased. “They’re just going to waive the requirements or give an extension,” predicts the outer-borough principal. “They’re raising the bar so high that no one in society has ever accomplished it.” Yet the same arguments were made in 2007, when NCLB was originally set to expire. Instead, Congress extended the existing law. “It was like a third rail—everybody in Congress was afraid to touch it,” says Neill, who rates the odds of the law being revised this year at no better than 50/50.

In fact, testing critics note with alarm that while Obama’s education secretary, Arne Duncan, has criticized NCLB for starting a “race to the bottom” for states to lower standards, his own new policies feature even more emphasis on high-stakes testing. To be eligible for Duncan’s new Race to the Top grants, for example, states must allow teacher evaluation and pay to be tied to student test scores. “It’s not NCLB anymore—it’s NCLB on steroids,” says Class Size Matters director Leonie Haimson. “This is something that was never contemplated under George Bush, and yet the Obama Administration is moving ahead with it.”

As for New York, the widespread assumption is that the state is going to have to toughen up its tests—or at least its scoring—in response to the NAEP controversy. But that would almost certainly result in plunging test scores, even as the NCLB requirements jump up another notch.

City schools officials, meanwhile, are keeping their heads down. “We have seen tremendous gains on state tests,” says the DOE’s Kanner. Will that be enough to keep up with the tougher requirements? “We’re on a good trajectory. But 2014 is a long way away.”