“Studies Have Shown” Is Not Enough
We’re right to be skeptical of trendy research. It’s the successful application of that research that deserves our buy-in.
By Alex Small
We’ve all encountered enthusiastic people who jump onto bandwagons of promising new ideas and harangue everyone else to do likewise. It could be a relative who learned a new productivity trick or a neighbor trying a new diet. They swear that it’s not just a fad, that Studies Have Shown it really works! I’m a physics professor, so I encounter evangelists for new teaching techniques. Speakers promise quick fixes for the hard problem of teaching underprepared students, and as soon as they add “Studies Have Shown” that this technique is especially good for students from marginalized groups, some professors are as hooked as your cousin who just discovered keto.
Personally, I’m more skeptical. Learning always takes time and effort, the opposite of a quick and easy trick. But not being a social scientist, I am not necessarily equipped to offer informed rebuttals. What I need is an intellectually honest way to filter out enthusiastic reports that sound too good to be true. It may well be true that a roomful of students performed well when teachers tried a new trick, but how do we know that it will work reliably in other situations?
Do We Really Need an Intervention?
Consider the voguish educational concept of “growth mindset interventions,” which purport to improve academic performance by instilling a belief that effort matters more than innate talent. On one level, this notion is surely helpful: Talent only matters when used diligently, and confidence can sustain effort during hard trials. However, science and engineering require foundational mathematical skills developed over years of effort, and many students spent their K-12 years in shamefully inequitable schools that never pushed them to develop such skills. It strains credulity when education reformers proclaim that we can close achievement gaps by adding a few supplemental readings or activities to introductory classes. Drastic improvement generally arises from serious effort enabled by years of character-building experiences, not a couple of motivational reading assignments.
My skepticism rests not just on personal experience but also on divisions in the published research on mindset interventions. A recent review of 67 published studies examined not only the research findings but also factors such as the composition of the control groups, whether the researchers followed the increasingly common practice of preregistering their studies (to avoid publication bias), and whether the researchers had financial conflicts of interest. The review concludes that “apparent effects of growth mindset interventions on academic achievement are likely attributable to inadequate study design, reporting flaws, and bias.” However, the very same journal published a separate, contemporaneous review of 53 studies, which found that mindset interventions can indeed produce positive educational outcomes.
I might ask friendly social scientists which article to trust, but my friends have varying perspectives. Venturing beyond my immediate circle to seek the majority viewpoint in the field would seem reasonable, but dueling literature reviews suggest a lack of consensus in the wider body of expert writings. Indeed, these conflicting reviews faced the same editorial standards, examined the same body of research, and still diverged like a Rorschach test.
Nor is the solution waiting for experts to converge on a dilemma-free path. I must choose between eschewing new practices absent proven benefits, or embracing apparently reasonable practices absent proven harms. Much hinges on how reasonable these practices seem. If many students are likely to make significant changes in response to helpful articles, then mindset interventions are likely beneficial. Alternatively, if habits are hard to change, at least for adults with the freedom to use or abuse their spare time, then the best use of class time might be to review material on which underprepared students are still shaky, rather than to give glorified pep talks.
Subjectivity Is Inescapable
Framing mindset interventions as “glorified pep talks” highlights a broader issue: Do we respond to “Studies have shown…” with enthusiastic adoption of the recommendations, or with skeptical questions? I clearly fall into the second camp, and it is similarly obvious that many professionals (or at least the administrators sending them to workshops) are in the first camp. The research literature, alas, is divided on how students respond to recommendations. (I am not aware of any surveys asking parents and teachers whether 18-year-olds listen to recommendations, but we might hazard an educated guess on what such studies would show…)
It is not unusual that expert advice and the responses thereto depend as much on subjective preferences as on the state of the evidence. The experience of the COVID pandemic, and the government’s response to it, demonstrated how a recommendation that “you ought to do X” reflects the recommender’s values as much as their factual knowledge. Epidemiologists recommended lockdowns based on both empirical facts about disease severity and a value judgment that the dangers of illness outweighed the harms of social isolation. However, a person with the same empirical data might make the values-driven determination that precious time with loved ones is worth a risk. Many people made exactly that determination, and the political and ideological polarization in the public’s compliance (and noncompliance) illustrates that the question turned as much on values as on data.
Still, values do not render facts irrelevant. Prerogatives and responsibilities go hand-in-hand. I might reasonably follow my personal preferences when the demonstrated benefits of a teaching method are minimal, but overcome reluctance when the proven benefits are significant. The key question is how a reasonable non-expert should evaluate contradictory evidence. It’s one thing to disregard the occasional conflicting study, but it’s another to shrug off large-scale analyses conducted by experts.
I propose resolving disputes in the social science literature in the same way that we solve so many other practical problems: Ignore the headlines, let experts focus on social science, and wait on social engineering. Yes, really.
Social Engineering For the Win
“Social engineering” has bad connotations because it is often used to describe top-down efforts to control societies. However, there’s nothing inherently authoritarian about engineering. Engineers are just people who try to combine science and experience to make things work reliably. Scientists gather information and develop theories to understand patterns in the natural world, while engineers apply this knowledge. Scientists can tell you how strong a particular metal is, and even invoke atomic forces to explain why. Engineers consider the full body of knowledge about metals, whether detailed scientific theories or practical experience, and select one for a bridge based on a combination of strength, corrosion resistance, cost and numerous other factors.
Science would be useless without engineers. Our journals are full of ideas that might work. There is no shortage of laboratory work on killing tumor cells or converting sunlight to electrical power, but it is much harder to reliably stop tumors in human beings, or produce large amounts of affordable electricity in real-world conditions. Figuring out which interesting ideas will yield reliable devices is the job of engineers.
Treating social science with due respect means applying it in the same way that physics is applied: with an engineer’s enthusiasm-dampening mindset, grounded in practical experience. Indeed, at least one research article by noted mindset researcher David Yeager calls for a profession of “psychological engineers” who develop and implement robust interventions rooted in psychological research. One need not share his optimism about such interventions to appreciate and applaud his distinction between science and engineering.
Sadly, even many natural scientists fail to grasp this distinction when the application involves a social science field. A workshop presenter need only say “Studies Have Shown…,” and excitable personalities will rush to adopt. A second round of studies soon follows, showing that enthusiastic early adopters have gotten great results in the classroom. This should be no surprise to anyone familiar with the placebo effect—or its close cousin, the Hawthorne effect: Novel stimuli elicit change. Instructors overflowing with enthusiasm for something new generally work harder on their lessons, try to get students similarly fired up, and pay more attention to their tasks. This is just another way of saying that they’re teaching well, irrespective of the merits of the particular new trick.
It is, of course, possible that they are also fastidious practitioners of a genuinely effective new technique. That is laudable if true. However, we shouldn’t be adopting teaching techniques that only work when the most optimistic and caffeinated people on campus work overtime. Instead, we need methods that will work even when professors have a thick stack of papers to grade, onerous committee assignments to attend to, and a backlog of research projects to finish. Laziness and overwork lead to a common goal: improvement with minimal added burden.
Engineers designing products for the general public understand this: They need to make things easier for users, not more difficult. Nobody outside of auto racing uses vehicles that require constant service from a team of roadside mechanics. Even surgeons—highly skilled professionals working in sterile, controlled rooms—need devices that work in the bloody, messy conditions of an emergency surgery and can work again with the next patient, after cleaning by the same on-staff technicians who also handle other equipment. In a similar vein, cash-strapped colleges with decidedly non-ideal conditions should not hang their hopes on teaching techniques that require true-believer buy-in and intravenous caffeine.
Close examination of the dueling reviews of mindset interventions supports my point about the difference between early adopters and mass-market users. A third analysis noted that the more positive of the two reviews explicitly considered heterogeneity, seeking particular situations where the interventions might be effective, and taking into account how carefully researchers tried to replicate the exact conditions of the original positive results. The optimistic take is that interventions work when very carefully targeted and replicated. The pessimistic take is that at best these interventions work under strictly controlled conditions, and at worst only in a cherry-picked subset of cases.
While we wait for statisticians to ascertain whether the positive results are real and robust, real but fragile, or figments of mere cherry-picking, the bandwagon for mindset interventions and other psychological quick fixes will have abundant riders. The quickness of the promised fixes is only part of their allure. Another key element is a modern belief that humans are easy to manipulate.
If unconscious biases explain major social problems, if modest nudges can fruitfully guide behavior, then why wouldn’t simple interventions suffice to erase academic achievement gaps rooted in a lifetime of inequitable resources and preparation? After all, a presentation on trendy research was enough for some professors to believe that deep problems are solvable, so why wouldn’t pep talks elicit Herculean efforts from underprepared teenagers living without parental supervision for the first time in their lives?
Additionally, modern culture frames many social problems in terms of mental health, and health is one place where even cynics generally defer to experts. (Who would want an amateur to treat heart disease?) So we hear that the experts have studied the educational fix du jour and the Studies Have Shown (supposedly) that some interventions work. If we’d take a heart medication prescribed in accordance with what medical Studies Have Shown, why not use a lesson plan designed according to scientific studies? It’s a compelling notion, but it hinges on whether the studies have actually shown what is alleged.
This goes back to my rule of thumb: Scientific studies, when well-designed, can determine whether an effect occurs, but engineers determine whether a tool based on that effect will work reliably in non-ideal conditions. I need more than a few studies under laboratory conditions before I revisit a practice. I want data from multiple independent experts and examples of consistent, in-the-trenches successes. Until such evidence is available, a considered indifference to what social science Studies Have Shown is an intellectually defensible and even honest stance.