January/February 2005
Lessons from the Red Sox Playbook
by Beth C. Gamse and Judith D. Singer [1]
"Believe!" "Keep the faith!" These have been the
refrains of Red Sox fans for nearly a century. Their perpetual optimism
in spite of decades of defeat echoes another refrain familiar to
educators: "All children can succeed; never give up!"
Now that the Sox have finally "reversed the curse,"
the lessons of their 2004 World Series victory will undoubtedly
generate new refrains. The New York Times' analysis, published
the day after they won, noted that the team's victory reflects "the
triumph of a new wave of thinking in baseball, one that has begun
to place increasing importance on the kind of intellectually ambitious
stewardship that stresses rigorous quantitative analysis over instinct
and whim." [2] In other words, the Sox approached baseball
not only as an art, but as a science. We think this lesson has special
salience for educators. Like the pre-2004 Sox, many educators have
long resisted the kind of rigorous research and scientific analysis
that could identify the curricula and teaching strategies most likely
to help children succeed in school.
One week before the World Series, the Times'
Samuel Freedman asked why one well-intentioned school district adopted
a new mathematics curriculum, Investigations, developed by the highly
regarded education research organization TERC, that had never been
evaluated using a randomized trial (see "Randomized
Trials in Education"). According to TERC, Investigations has
been evaluated only through "classroom studies, large-scale comparisons
across schools, and small-scale comparisons between classrooms." [3]
Investigations has the imprimatur of the National Science Foundation
(NSF) and the National Council of Teachers of Mathematics (NCTM).
But these endorsements, which carry great weight in the marketplace
of instructional materials, are based on philosophical considerations,
such as pedagogy, rather than on the evidence of effectiveness (or
lack thereof) that comes only with randomized trials. In fact, some
critics suggest that the curriculum may contribute to the achievement
gap between white and minority students. In the absence of more
rigorous scientific research, the decision to adopt a curriculum
like Investigations is being made on what the Times sportswriters
would call "instinct and whim."
Over the past few years, the Institute of Education
Sciences (IES) at the U.S. Department of Education has funded dozens
of school-based randomized trials at the local and national levels.
IES is also sponsoring a national effort (the What Works Clearinghouse)
to survey the research literature and summarize the evidence on
multiple education-related topics, giving the greatest weight to
rigorous studies based on randomized trials. Many schools and districts,
however, have declined to participate in these trials, and many
in the larger education establishment have greeted them with profound
ambivalence. Why does the mere mention of scientific rigor produce
a level of animus among some educators as bitter as the Red Sox-Yankees
rivalry?
Costs and Benefits
Like many baseball coaches, many educators may simply
lack the skills to interpret data for themselves. Red Sox general
manager Theo Epstein makes critical decisions about his team only
after carefully analyzing relevant data about his players. To evaluate
a player's performance, for example, he calculates the player's
on-base percentage, factoring in both hits and walks, rather than
relying on simpler statistics like the batting average or the number
of runs batted in (RBIs). In other words, he incorporates multiple
indices to yield a more informative indicator. All educators can
surely appreciate indicators that capture more complexity than one
facet of performance alone. Yet many educators are not confident
in their ability to apply these kinds of indicators in everyday
decisionmaking. Or they may be skeptical about the costs and benefits
of the research necessary to yield scientifically valid results.
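To make the contrast between the two statistics concrete, here is a minimal Python sketch. It uses the standard on-base percentage formula, which also counts times hit by a pitch and sacrifice flies in addition to the hits and walks mentioned above. The two players and their season totals are hypothetical, invented only for illustration; nothing here is meant to represent Epstein's actual analysis.

```python
def batting_average(hits, at_bats):
    """Hits per official at-bat; walks are ignored entirely."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Standard OBP: times on base divided by plate appearances that count."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# Hypothetical season totals (illustrative only, not real players).
patient_hitter = dict(hits=150, walks=90, hit_by_pitch=5,
                      at_bats=500, sacrifice_flies=5)
free_swinger = dict(hits=150, walks=20, hit_by_pitch=5,
                    at_bats=500, sacrifice_flies=5)

for name, stats in [("patient hitter", patient_hitter),
                    ("free swinger", free_swinger)]:
    avg = batting_average(stats["hits"], stats["at_bats"])
    obp = on_base_percentage(**stats)
    print(f"{name}: batting average {avg:.3f}, on-base percentage {obp:.3f}")
```

Both hypothetical players have the same batting average, yet the one who draws more walks reaches base noticeably more often. That is the kind of difference a single simple statistic can hide.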
What does it mean for a school or district to participate
in a randomized experiment? It means agreeing to try out a new program
and allowing individual students, classes, or even whole schools
to be randomly assigned to either the experimental program or the
control group (usually, the existing program). It means exposing
children in the experimental group to an untested program; conversely,
it means that children in the control group do not have access to
whatever benefits the new program might confer. It means carrying
out a good-faith effort to implement the new program and participating
in data collection to measure its effects, which means collecting
data from all participants, both those in the experimental program
and those in the control group.
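The assignment step described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the classroom names, the even split into two groups, and the function name `randomly_assign` are all hypothetical, and real trials often use more elaborate designs (for example, assigning within schools or grade levels).

```python
import random


def randomly_assign(units, seed=None):
    """Shuffle the units and split them into experimental and control groups.

    A seed is accepted only so that the assignment can be reproduced and audited.
    """
    rng = random.Random(seed)
    shuffled = list(units)   # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)    # every ordering is equally likely
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Hypothetical district with ten participating classrooms.
classrooms = [f"classroom_{i}" for i in range(1, 11)]
groups = randomly_assign(classrooms, seed=2004)

print("Experimental program:", groups["experimental"])
print("Control group:       ", groups["control"])
```

Because chance alone decides which classrooms receive the new program, the two groups should be comparable on average. Any later difference in outcomes can then be attributed more credibly to the program itself.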
It's true that the data-collection demands imposed
by participation in such a study can detract from valuable instruction.
For instance, more classroom time may be spent on student assessment.
And while assessment as part of a single study is unlikely to take
more than a few hours for any individual participant, there are
many large schools and districts in which numerous studies are under
way. The cumulative loss of instructional time from participation
in multiple (and presumably unrelated and unsynchronized) studies
may be considerably greater. But wholesale adoption of an untested program can
lead to an even greater loss of instructional time: for instance,
if it doesn't work equally well for all students, requires more
professional development than anticipated, or doesn't segue smoothly
from prior years' instruction.
The Ethics of Experimentation
Are education experiments ethical? Many educators
assert that they are not. Some argue that random assignment unfairly
deprives one group of children of new approaches or interventions
that are potentially helpful, or, conversely, subjects children
to experimental approaches whose efficacy is as yet undetermined.
The counterargument is that a randomized trial is the fairest test
of a program's efficacy. Only this information can ensure that all
children have access to the very best educational practices.
A particularly vexing issue is the tradeoff between
present costs and future benefits, between the short-term consequences
for current students and the long-term consequences for those who
follow. Decisionmakers who decline to participate in studies often
cite "concern for the children." We share this concern, but we believe
that future cohorts are equally important. When Epstein and his
colleagues analyzed data about on-base percentages and fielding
contributions, they decided to trade the superstar shortstop Nomar
Garciaparra. The short-term consequences, for the team's esprit de
corps and for the fans, were ominous. But history has shown that Epstein
was right to take the long view.
Finally, still other critics assert that the interactive
nature of teaching precludes it from being a "treatment" that can
be randomly assigned. Following this argument to its conclusion,
education, as a broad field of human interaction, is not an appropriate
arena for experiments. We take heart in knowing that identical arguments
were made when large-scale clinical trials were introduced in medicine
after World War II, a time when medicine was seen as more an art
than a science, much as education is today. Yet few among us today
have not benefited from such trials, whether the lessons were positive
(e.g., the benefits of an aspirin a day to reduce heart-attack risk)
or negative (e.g., the increased risk of cancer associated with
long-term hormone-replacement therapy).
The "new wave of thinking" we advocate will not
come easily. And, like any change, it will require education. There
is fierce competition for schools' dollars from various publishers,
curriculum developers, and professional development providers. Educators
need the skills to recognize, and demand, credible evidence about
program effectiveness. We believe that all educators, from those
standing in front of a roomful of students to those leading state
educational agencies, should be able to participate in and use research
effectively: to distinguish between random sampling and random assignment,
differentiate between credible evidence and anecdotal claims, and
apply scientific conclusions for the benefit of their respective
"teams." If it worked for the Red Sox, it might just work for us.
Beth C. Gamse is a senior associate in Abt
Associates' Education and Family Support Area. She is currently
directing the Reading First Impact Study for the Institute of Education
Sciences at the U.S. Department of Education. Judith
D. Singer is the James Bryant Conant Professor of Education
at the Harvard Graduate School of Education. She specializes in
quantitative research design and statistical analysis.
Notes
1. The order of the authors was
determined by randomization.
2. See Ginia Bellafante. "New-Age
General Manager Ends an Age-Old Curse." New York Times, October
28, 2004, p. D4.
3. See TERC web-page, "Investigations
in Number, Data, and Space."
For Further Information
B. McGrath. "The
Professor of Baseball: Can the Master of Statistics Help the Red
Sox Beat the Yankees?" New Yorker, July 14, 2003.
U.S. Department of Education, Institute of Education
Sciences, National Center for Education Evaluation and Regional
Assistance. Identifying
and Implementing Educational Practices Supported by Rigorous Evidence:
A User Friendly Guide. Washington, DC: Author, 2003.
U.S. Department of Education, Institute of Education
Sciences, National Center for Education Evaluation and Regional
Assistance. Random
Assignment in Program Evaluation and Intervention Research: Questions
and Answers. Washington, DC: Author, 2003.