As part of the ongoing debate regarding the proposed expansion of Jorge Elorza’s Achievement First (AF) mayoral academies in Providence, one crucial question is the expansion’s fiscal impact on the Providence Public School District (PPSD) and the City of Providence budget. In the 2016 legislative session, Rhode Island charter law was revised to require…
…the council on elementary and secondary education (to) place substantial weight on the fiscal impact on the city or town, programmatic impact on the sending school district, and the educational impact on the students in the district to ensure that the proposal is economically prudent for the city or town, and academically prudent for the proposed sending school district and for all students in the sending district…
The implementation of this complex estimation is up to the Rhode Island Department of Education (RIDE), which mostly outsourced the task to the Rhode Island Innovative Policy Lab (RIIPL) at Brown University, “a collaboration between Brown University and the Office of the Governor in Rhode Island to develop evidence-based policy” funded by the foundation of hedge fund billionaire John Arnold.
The RIIPL memo takes an estimate of test-score gains by future AF students, runs that through the findings of an influential and controversial 2014 paper by Chetty, Friedman, & Rockoff (CFR), which found a correlation between test-score gains and long-term income growth and other measures, and finally calculates an estimate of the long-term potential income and educational gains of students attending AF. That’s pretty much the whole analysis.
Since the financial impact memo was presented to the Board of Education and the public on December 6th, it has been publicly criticized on several points. First, critics have questioned whether it actually fulfills the legal mandate to measure if “the proposal is economically prudent for the city or town, and academically prudent for the proposed sending school district.” Other public figures, including URI Professor of Economics Leonard Lardaro, have questioned the plausibility of such a seemingly precise estimate of future labor market returns looking forward over a span of fifty years or more. In this post I will further question the reliability of the test score data that RIIPL used to calculate their impressive predictions of income gains and benefits.
Even if one accepts all of RIDE and RIIPL’s method and rationale for its fiscal analysis, the bottom line is that all their calculations are derived from a single, early, clearly idiosyncratic study that overstates the positive achievement effects of AF compared to any other study I have subsequently found.
RIIPL’s memo states that “estimates (of the schools’ value-added towards achievement) exist in the literature for other AF schools in northeastern states.” This is true, but RIIPL only uses data from a single study by Hastings, Neilson, and Zimmerman (2012) (which I will refer to as the New Haven Study) of AF schools in New Haven, Connecticut, covering four years, 2005-2006 through 2008-2009. Justine Hastings is an author of both the RIIPL memo and the New Haven Study and the director of RIIPL.
2005-2009 was still early days for Achievement First. It had only just begun the process of expanding from its initial flagship middle school, Amistad Academy in New Haven, into a multi-state network. Thus this study does not, as one might assume from the fiscal impact memo, include data from multiple cities or states or across all grade levels.
Based on the New Haven Study, the RIIPL memo states that “the test scores of children winning the lottery to attend Achievement First increase by 0.346 student level standard deviations.” In all further calculations, the memo assumes that AF students will benefit from 0.346 standard deviations of test-score growth compared to their peers in the PPSD, in each year from kindergarten through 12th grade. So this is an important number.
One yellow flag in the memo is the use of one combined number for the value-added in both English/Language Arts (ELA) and mathematics. This is unusual for this type of study or report. American standardized testing, especially post-No Child Left Behind, really only produces two widely comparable numbers, ELA and math, and they are almost always reported and handled separately, including in CFR’s calculations, which found different long-term benefits for ELA and math achievement gains. Thus basing projections on a combined value is less accurate than regarding them separately. So why would RIIPL combine them?
In short, because at the subject level, the value-added data in the New Haven study does not align with any other quantitative or qualitative analysis of Achievement First and similar schools, nor does it make sense as the basis of a 13-year growth projection.
Here is the relevant table from the New Haven Study:
Or, in simpler form, focusing on the relevant numbers:
That is, the New Haven Study on which RIDE and RIIPL’s impact analysis is entirely based, found that Achievement First schools had a slightly negative impact on student achievement, -0.092, in yellow above. Even as an AF skeptic, I don’t believe that this is an accurate representation of AF schools as a whole. If RIDE actually believes the data they have submitted, they are supporting the expansion of a school system that they have found will reduce student achievement in math compared to current PPSD neighborhood schools.
Looking at this math data another way, one might point out that the 0.092 value-loss in these tables is not considered statistically significant. One might then ask why RIDE and RIIPL are basing their fiscal impact analysis on a study where there is no statistically significant data for math. It is not like there are no other similar, more authoritative studies they could use.
The Connecticut test score data used in the New Haven Study reports reading and writing separately, with reading coming out with an impressive but plausible +0.346 value-add, and writing with an extraordinary value-add of +0.785. It is very difficult to explain why these two scores have diverged so widely, and in particular it seems wildly implausible to suggest, as RIIPL implicitly does, that this gap in reading and writing growth could and would be sustained over the entire 13-year K-12 span. As an English teacher I cannot imagine how one would systematically produce thousands of students who grow as writers so much faster than as readers.
RIIPL smooths out the incongruously low math score and the extraordinarily high writing score by averaging them together with the reading score, each with equal weight, into the combined score of +0.346 cited in the fiscal impact memo. Unfortunately there is no standard method for combining math and ELA achievement, nor one for deriving an ELA score from reading and writing scores, but weighting writing equally with mathematics is essentially unheard of, and seems clearly intended to boost the combined growth score.
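The arithmetic behind that combined figure is easy to check. A minimal sketch, assuming the three subject-level estimates from the New Haven Study are simply averaged with equal weight, as described above:

```python
# Subject-level value-added estimates from the New Haven Study,
# in student-level standard deviations.
math_va = -0.092     # slightly negative, not statistically significant
reading_va = 0.346   # impressive but plausible
writing_va = 0.785   # extraordinarily high

# Equal-weight average of all three subjects -- this reproduces
# the combined +0.346 figure cited in the fiscal impact memo.
combined = (math_va + reading_va + writing_va) / 3
print(round(combined, 3))  # 0.346
```

Note that the combined number happens to match the reading value-add exactly, which may be why the two are easily conflated.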
For comparison, here are some other studies of achievement gains AF and similar charters:
- A meta-analysis of the effect on student achievement of No Excuses charter schools, of which AF is a prominent example, by Cheng, Hitt, Kisida and Mills (2015) found annual gains of 0.16 in English/Language Arts and 0.25 in math, for a combined average of 0.205, or 59% of RIIPL’s prediction.
- The New York City Charter Schools Evaluation Project’s 2009 report by Hoxby, Murarka and Kang found an average gain of 0.06 in English and 0.09 in math, from a sample including AF schools and including many No Excuses schools, for a combined effect of about a fifth of RIIPL’s prediction.
- A 2010 study of Student Achievement in New York City Middle Schools Affiliated with Achievement First and Uncommon Schools by Teh, McCullough and Gill for Mathematica found strong results with a small sample size in the early years of those schools (see the table below) that were nonetheless very different from the New Haven numbers by subject, and that contradict the basic assumption of the RIIPL memo that score gains will be consistent from year to year.
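The comparisons in the first two bullets above can be computed the same way RIIPL combined its own figure, as simple equal-weight averages of the reported ELA and math gains, measured against the memo’s 0.346:

```python
riipl_combined = 0.346  # figure used in the fiscal impact memo

# (ELA gain, math gain) as reported by each study.
studies = {
    "Cheng, Hitt, Kisida & Mills (2015)": (0.16, 0.25),
    "Hoxby, Murarka & Kang (2009)": (0.06, 0.09),
}

for name, (ela, math) in studies.items():
    combined = (ela + math) / 2
    share = combined / riipl_combined
    print(f"{name}: combined {combined:.3f}, {share:.0%} of RIIPL's figure")
```

This reproduces the 0.205 (59%) figure for the meta-analysis and roughly a fifth (about 22%) for the New York City evaluation.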
I would also note that even AF itself found reason to doubt the predictive value of Connecticut’s middle school testing program — which made up more than half of the achievement data in the New Haven Study — during the time period of the study. In the fall of 2008, The New Haven Register reported:
Last year, 19 of the 56 students in Amistad’s ninth grade—the same cohort that scored so well on the eighth-grade tests in 2007—failed at least one course. And 11 students, 20 percent of the grade, failed two courses, meaning that they did not move on to the tenth grade.
Even if RIDE and RIIPL are basing their projections on an inflated estimate of growth, that does not necessarily mean there would be no test-score benefit to students attending a school like AF. But if we are talking about an effect size closer to 0.15 than 0.35, then there are many much lower-cost in-district interventions which can achieve similar results while doing far less harm, fiscal and otherwise, to the PPSD and the City of Providence.
In particular, a recent paper in The Quarterly Journal of Economics found that “for poor children, a 10 percent increase in per-pupil spending each year of elementary and secondary school was associated with wages that were nearly 10 percent higher, a drop in the incidence of adult poverty and roughly six additional months of schooling.” And, as David L. Kirp pointed out in The New York Times:
A 2011 study by the Berkeley public policy professor Rucker C. Johnson concludes that black youths who spent five years in desegregated schools have earned 25 percent more than those who never had that opportunity.
If Commissioner Wagner needs a different plan — truly different — than what Rhode Island has been doing for the past 25 years, those would be good places to start.