One Day talks with Elias Walsh, who has mastered the language of educational research well enough to make it not just understandable but surprisingly useful.
February 20, 2020
Educators love to cite rigorous research when deciding what works for kids. But what makes a research study “rigorous”? What do experts actually look for when reviewing the validity of a study?
We asked Elias Walsh (Chi-NWI ’00), a senior researcher at Mathematica, whether he was up to the challenge of explaining the basics of rigorous research in a way that a layperson, even a One Day writer who hasn’t cracked a math textbook since sneaking out of pre-calculus with a D, could understand. He graciously accepted.
Walsh’s work includes directing a team of Mathematica reviewers who evaluate research studies for the What Works Clearinghouse (WWC), created by the Institute of Education Sciences, the research arm of the U.S. Department of Education. One of the WWC’s functions is to evaluate the rigor of education studies; it has reviewed more than 10,000 since its founding in 2002. Walsh walked us through four common types of research studies, what makes them rigorous (or not), and why that matters.
Randomized Controlled Trials
In virtually any rigorous research study, researchers seek to compare two groups: a treatment group, which receives the intervention being studied, and a control group, which does not.
In a randomized controlled trial (“the gold standard,” Walsh says), the treatment and control groups are assigned at random, which tends to create groups that are similar to each other at the start. A charter school lottery, for example, can offer ready-made randomization: It can create two groups of students that differ in only one systematic way, namely that one group won slots at the school and the other didn’t. Any differences in outcomes between the two groups at the end of a high-quality study can confidently be attributed to the intervention received by the treatment group.
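For readers who don’t mind getting into the weeds, here is a minimal Python sketch of the lottery logic. It is our own illustration under simple assumptions, not Mathematica’s or the WWC’s code: the function names `run_lottery` and `difference_in_means`, the applicant IDs, and the scores are all hypothetical. Shuffling the applicant list and taking the first `n_slots` names gives every applicant the same chance of winning, which is what makes the two groups comparable.

```python
import random
from statistics import mean


def run_lottery(applicants, n_slots, seed=None):
    """Randomly split applicants into winners (treatment) and non-winners (control)."""
    rng = random.Random(seed)  # seeded only so the draw can be reproduced
    shuffled = list(applicants)
    rng.shuffle(shuffled)  # every ordering is equally likely
    return shuffled[:n_slots], shuffled[n_slots:]


def difference_in_means(treatment_outcomes, control_outcomes):
    """Average outcome of the treatment group minus that of the control group."""
    return mean(treatment_outcomes) - mean(control_outcomes)


if __name__ == "__main__":
    applicants = [f"student_{i}" for i in range(10)]  # hypothetical applicant IDs
    winners, others = run_lottery(applicants, n_slots=5, seed=42)
    print("Won a slot:", winners)
    print("Not selected:", others)
    # Made-up end-of-year scores, for illustration only.
    print(difference_in_means([78, 82, 75, 80], [72, 74, 70, 76]))  # prints 5.75
```

Because the draw ignores everything about the students, a simple difference in average outcomes is a reasonable estimate of the effect; a real analysis would also report how much of that gap could plausibly be due to chance.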
Matched Groups Design
What about when true randomization isn’t an option? The most common alternative is a matched groups design, in which researchers try to “match” a treatment group with a suitably similar comparison group, for example, by comparing two schools with nearly identical demographics and past outcomes. “When we have groups that look similar at the beginning, then we can be more confident that the difference in outcomes between the groups at the end of the study are due to the intervention,” Walsh says.
This approach is imperfect because “researchers don’t see everything” when trying to make suitable matches, Walsh says: Two groups can look alike on demographics and test scores yet still differ in ways no dataset captures. The most common reason the WWC gives a study a “does not meet standards” rating is that the researchers’ groups “were not established to be equivalent,” Walsh says. But when done well, matching is a clever way to apply rigor to imperfect research conditions.
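To make matching concrete, here is a rough Python sketch, assuming each school is described by a single baseline test score. The greedy nearest-neighbor rule, the function names, and the school data are hypothetical simplifications; real studies usually match on many characteristics at once. The second function computes a standardized mean difference (the gap in baseline averages divided by the pooled standard deviation), a common way to judge whether matched groups are equivalent. The 0.25 limit below is an assumption: our understanding is that WWC handbooks treat a baseline gap of that size as the outer limit, but check the current handbook before relying on it.

```python
from math import sqrt
from statistics import mean, stdev


def match_nearest(treatment, pool):
    """Greedy 1:1 matching on one baseline score, without replacement.

    treatment, pool: dicts mapping a (hypothetical) school name to its baseline score.
    The pool must contain at least as many schools as the treatment group.
    """
    available = dict(pool)
    matches = {}
    for name, score in treatment.items():
        # Pick the unused comparison school whose baseline score is closest.
        best = min(available, key=lambda other: abs(available[other] - score))
        matches[name] = best
        del available[best]  # each comparison school is used at most once
    return matches


def standardized_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical equivalence limit, in standard deviations (see the note above).
EQUIVALENCE_LIMIT = 0.25

if __name__ == "__main__":
    treated = {"School A": 48, "School B": 55, "School C": 62}
    candidates = {"School X": 47, "School Y": 61, "School Z": 56, "School W": 80}
    pairs = match_nearest(treated, candidates)
    print(pairs)
    gap = standardized_difference(
        list(treated.values()), [candidates[c] for c in pairs.values()]
    )
    print(f"Baseline gap: {gap:.2f} SD; within limit: {abs(gap) <= EQUIVALENCE_LIMIT}")
```

Greedy matching depends on the order in which schools are processed, which is one reason researchers often prefer more careful matching methods. Either way, the check only covers what researchers can measure, which is exactly Walsh’s point.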
Regression Discontinuity Design
“This may be getting too into the weeds,” Walsh warns. Here’s his example of regression discontinuity design: Say a district is offering a school turnaround intervention to all schools performing below a certain cutoff point. Comparing all the schools below that cutoff with all the schools above it would not be a rigorous study, because the groups are too different. But by comparing outcomes for the schools just below the cutoff with the schools just above the cutoff, “you can still get very rigorous evidence,” Walsh says.
With no intervention, or with an ineffective one, you would expect the results for schools just below and just above the cutoff to be “continuous”: Because those schools started out nearly alike, their outcomes should line up smoothly across the cutoff. If there’s a “discontinuity,” such as schools just below the cutoff showing significant growth while schools just above it do not, that’s rigorous evidence the intervention worked.
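Here is a minimal Python sketch of that comparison, assuming each school has a single performance score and that schools below the cutoff get the intervention. The function names, the bandwidth (how close to the cutoff a school must be to count), and the simulated data are hypothetical. Rather than comparing raw averages on each side, the sketch fits a separate straight line to the schools on each side of the cutoff and measures the gap between the two lines right at the cutoff, one common way to estimate the size of a discontinuity. Published studies add refinements such as data-driven bandwidths and standard errors.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Estimated jump in outcomes at the cutoff (below minus above).

    Only schools within `bandwidth` of the cutoff are used, and each side
    needs at least two distinct scores so a line can be fitted.
    """
    pairs = list(zip(scores, outcomes))
    below = [(s, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    above = [(s, y) for s, y in pairs if cutoff <= s <= cutoff + bandwidth]
    below_int, below_slope = fit_line(*zip(*below))
    above_int, above_slope = fit_line(*zip(*above))
    # Predict each side's outcome exactly at the cutoff and take the difference.
    return (below_int + below_slope * cutoff) - (above_int + above_slope * cutoff)


if __name__ == "__main__":
    # Simulated schools: outcomes rise with baseline score, and the (hypothetical)
    # intervention adds 5 points for schools scoring below 50.
    scores = list(range(20, 81))
    outcomes = [0.5 * s + 10 + (5 if s < 50 else 0) for s in scores]
    print(rd_estimate(scores, outcomes, cutoff=50, bandwidth=10))  # prints 5.0
```

If the simulated intervention did nothing, the two lines would meet at the cutoff and the estimate would be zero. Notice, too, that the estimate uses only schools within the bandwidth, which is the limitation Walsh describes next.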
The approach has its limits. “You’re only looking at the schools right near the cutoff,” Walsh says, which may not tell you much about how the intervention would work for schools nearer the extremes.
Single-Case Design
How can researchers rigorously test interventions, such as those in special education settings, that are tailored to individual students rather than groups? Single-case design addresses that by using an individual student as both the treatment and the comparison group, directly comparing the student’s outcomes when an intervention is offered with outcomes when it is not.
Of course, it’s even harder to control for confounding variables (outside factors that could also explain a change) when dealing with individuals instead of averages across groups. Imagine all the reasons a student’s performance might fluctuate in any given month: illness, stress, family issues, other programs at school, boredom. One type of single-case design, called a reversal/withdrawal design, addresses this by measuring the same student’s outcomes repeatedly over an extended period while switching the intervention off and on: no intervention, intervention, no intervention, intervention. “You have to do this multiple times before you can really say that this is the effect of the intervention and not something else that might be going on in a student’s home life or classroom,” Walsh says.
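A reversal/withdrawal study produces a string of phases, usually labeled A (no intervention) and B (intervention). The Python sketch below is a deliberately simplified check of the pattern Walsh describes, assuming higher scores are better: It asks whether every intervention phase averages higher than the baseline phases next to it, and whether the intervention was introduced at least twice. The function names, the phase data, and the two-phase minimum are hypothetical choices; actual single-case reviews also examine trends and variability within each phase, not just averages.

```python
from statistics import mean


def phase_means(phases):
    """Average outcome within each phase, as (label, mean) pairs in order."""
    return [(label, mean(values)) for label, values in phases]


def consistent_effect(phases, min_intervention_phases=2):
    """True if each intervention phase ("B") beats every adjacent baseline phase ("A").

    Also requires the intervention to have been introduced at least
    `min_intervention_phases` times, since one switch alone is weak evidence.
    """
    means = phase_means(phases)
    if sum(1 for label, _ in means if label == "B") < min_intervention_phases:
        return False
    for i, (label, value) in enumerate(means):
        if label != "B":
            continue
        # Collect the means of the baseline phases immediately before and after.
        neighbors = [
            means[j][1]
            for j in (i - 1, i + 1)
            if 0 <= j < len(means) and means[j][0] == "A"
        ]
        if any(value <= baseline for baseline in neighbors):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical weekly scores for one student in an A-B-A-B sequence.
    abab = [("A", [3, 4, 3]), ("B", [7, 8, 8]), ("A", [4, 3, 4]), ("B", [8, 9, 8])]
    print(phase_means(abab))
    print(consistent_effect(abab))  # True: scores rise each time the intervention returns
```

The repetition is the point: A single jump after the intervention starts could be a coincidence, but a rise, a fall when it is withdrawn, and a second rise when it returns is much harder to explain any other way.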
Walsh went into research because, as a high school math teacher, he felt the lack of it. “Looking back, was I an above average teacher? I have no idea,” he says. “I didn’t know what [rigorous evaluation] meant at the time, but that’s what I wanted.” Today, he stresses the importance of using even the best research as a tool, not dogma. “Evidence does come in a lot of different flavors. The WWC is focused on the most rigorous kinds of evidence, but that’s not to the exclusion of all the other sorts of evidence that educators might be working with,” Walsh says. In other words, good teachers use both research and anecdotal evidence to know what’s working in their classrooms and what’s not. Walsh, who swims in numbers, encourages teachers to pay close attention to the actual humans in front of them each day, too.