I’m sorry, can you repeat that?
No? Then why the hell is it published in a scientific journal?
Unbeknownst to many, science – especially social sciences and psychology – is in the midst of a replication crisis: a wide-ranging inability to repeat the findings of previously published papers. In a 2016 survey of more than 1,500 researchers by the journal “Nature,” more than seventy percent reported having tried and failed to reproduce another scientist’s experiments. That’s pretty horrifying when you consider how often science, and the fallacious consensus argument, are cited today by politicians making some of the biggest fundamental changes to society in history.
Closer analyses of many highly influential studies are exposing cracks in science’s foundation and flaws in its practices, showing that we can’t always trust what we read, even in prominent science journals.
10 It’s a Really Widespread Problem
In 2018, a coalition of social scientists that included psychologists and economists attempted to replicate 21 findings published in two prestigious journals: “Nature” and “Science”. Included in this review were highly influential, widely reported studies, such as a 2011 paper on whether access to search engines hinders human memory, and a report on whether reading literary fiction improves people’s ability to understand others’ viewpoints, also known as “theory of mind.”
With the benefit of hindsight, the scientists conducting the replications made the retests more rigorous than the original studies. For transparency, they preregistered their study design and analysis plan – a safeguard against researchers quietly reshaping their hypotheses or analyses after seeing the results. In some cases, the number of participants was also increased as much as fivefold, a bigger-is-better tactic that reduces the likelihood of a fluke result becoming scientific canon.
Alarmingly, only 13 of the 21 studies replicated under the increased scrutiny. That’s one success shy of two-thirds, a rate that itself would be nowhere near acceptable for studies published in such well-regarded, supposedly discerning science publications. The result is clear: junk science is becoming accepted truth.
9 Even Many Successful Studies Are Now Suspect
Imagine two students getting midterm marks back. One gets an A, the other a D. Both passed, but there’s a hell of a difference between those two grades.
Nearly as worrisome as the inability to replicate certain studies is that, even among those technically passing muster, the effect sizes often drop precipitously under more rigorous conditions. (An effect size measures the difference between the group receiving the experimental treatment and a comparison group that doesn’t, called a “control group.”) Such was the case with many of the 13 studies that replicated out of the 21 referenced in the preceding entry. A few shrank by about half – a concerningly large drop indicating that the original findings likely overstated the power of the experimental manipulation.
A hypothetical example: Suppose one could scientifically conclude that an apple a day does, indeed, keep the doctor away. The logical next question is “Sure, but by how much?” Many studies are being touted as clear-cut findings when, in reality, they are barely making the grade. In science, the extent of cause and effect matters enormously.
Researchers conducting the aforementioned replication studies admitted as much, describing a systematic bias in published findings “partly due to false positives and partly due to the overestimated effect sizes of true positives.” In other words, eat that apple with a grain of salt.
8 Replication Itself Has Limitations
Not all replications are created equal. Extenuating circumstances often make a full analysis of an individual study impractical or even impossible.
And sometimes the replicators themselves seem as cavalier as the original researchers. The aforementioned study linking search engine access to human memory was among the eight that failed to replicate. But the replication was limited to a word-priming task – testing whether merely thinking about the internet’s availability makes it harder to retrieve information – rather than a real-world experiment involving actual trivia-style quizzing of subjects. It also ignored self-evident scenarios: since the advent of smartphones, for example, far fewer people remember individual phone numbers, a clear indication that reliance on technology makes recalling facts less necessary and, therefore, less likely.
Other studies may not replicate because the participants themselves have changed. In 2012, MIT psychologist David Rand published a study on human cooperation. Participants played an online economics game designed to measure how willing they were to cooperate or, conversely, to act selfishly.
When the experiment failed to replicate, Rand argued that the pool of typical online study participants had since grown familiar with his game, making it a less useful tool for probing real-life behavior. That’s a stretch, but the larger point stands: widespread familiarity with an experiment can taint the experiment itself.
More often than not, though, experiments fail to replicate because they simply weren’t viable in the first place, per the next few items.
7 Some Studies Are Just Incredibly Stupid
It’s sometimes scary what passes for science. One experiment that failed to replicate examined whether prompting people to think more analytically would make them less religious. Though a bit insulting on its face – such a test assumes that religious belief is inherently irrational – the goal was to see whether people would become more inclined to look for cause and effect in the physical world rather than the spiritual one.
The experiment itself was, well, dumb. One of its tests had participants stare for several minutes at a picture of Rodin’s famous statue, The Thinker. So basically, the experiment sought to determine whether looking at a picture of a nude guy with his chin resting thoughtfully on his fist would dispel participants’ belief in a deity.
Sound science? I think not. Hysterically, the study’s architect thinks the fatal flaw – i.e., what made it fail to replicate – lay in the sample size rather than the idiocy of the experiment itself. “When we asked them a single question on whether they believe in God, it was a really tiny sample size, and barely significant,” said University of Kentucky psychologist Will Gervais. “I’d like to think it wouldn’t get published today,” he continued, in the understatement of the scientific century.
6 Soft Logic: The Marshmallow Test & The Oversimplification of Social Science
Other studies fail to replicate because the variables for which they initially accounted were insufficient or incomplete. A prime example here is the so-called “marshmallow test”, which originally correlated the ability to delay gratification early in life with success in adolescence or adulthood.
The experiment reads like entertaining torture. A child has a marshmallow placed in front of him, and a researcher presents a choice: if the child can hold off eating it until the researcher leaves the room and comes back, he gets a second marshmallow. If not, no extra marshmallow. Immediate gratification on one hand, double the deliciousness on the other.
Years later, the children who waited achieved higher SAT scores, lower levels of substance abuse, lower likelihood of obesity and sound social skills. The clear conclusion was that kids who show self-control early in life are likely to become more disciplined, high-functioning teens and adults.
The problem is that a child’s life is more complicated than a confection. When replication researchers reexamined the findings and accounted for factors like family background, the correlation largely disappeared. The likeliest conclusion, of course, is that the children who waited for the second marshmallow had the benefits of proper parenting, good nutrition and so on. They weren’t inherently better – they were simply raised better.
5 Think Again: The Replication Crisis in Psychology
The replication crisis is even more pronounced in psychology, perhaps because pinpointing the impetus behind thoughts is even more complicated than determining the causes and effects of actions or accomplishments.
One event in particular triggered psychology’s much-needed mental breakdown. In 2010, a paper using accepted experimental methods claimed evidence of something that, scientifically, was broadly unacceptable: people, it found, were capable of perceiving the future.
The problem wasn’t so much the study’s laughable conclusion as its serious execution: the research was performed over a period of 10 years, during which Cornell University psychology professor Daryl Bem ran nine experiments, eight of which produced statistically significant results. Daniel Engber of Slate aptly described the fallout: “The paper posed a very difficult dilemma,” he wrote. “It was both methodologically sound and logically insane.”
Unlike the future, the result was predictable: the study prompted a wholesale reconsideration of practices like relying on small samples, which are far likelier than large ones to produce conclusions from sheer coincidence. Best practices for effective randomization – meaning the best ways to keep biases or other outside factors from skewing results – are also getting a harder look in a scientific upheaval that, having begun less than a decade ago, is still very much a work in progress.
4 Psychology’s Replication Reckoning
Unless “Dr.” Peter Venkman from Ghostbusters was actually on to something, extrasensory perception is at best unproven, at worst disproven entirely. So when the aforementioned Daryl Bem used accepted experimental practices to “prove” the unacceptable, psychology was forced to come to grips with how its studies were conducted.
Emerging from this discipline-wide review was a collaborative 2015 paper published in the journal Science. The report detailed an overarching problem: when 270 psychologists tried to replicate 100 experiments published in leading journals, only about 40% replicated successfully. The rest either failed or produced inconclusive data, and, like some of the social science experiments referenced earlier, many studies that did pass replication showed weaker effects than the originals.
The report’s conclusion is about as strongly worded as a science journal gets. “After an intensive effort… how many of the effects have we established are true? Zero. And how many of the effects have we established as false? Zero.”
This was not intellectual finger-pointing but a declaration that science isn’t as clear-cut as previously imagined. “It is the reality of doing science,” the conclusion continues, “even if it is not appreciated in daily practice.”
Finally, the heart of the matter: “Humans desire certainty, and science infrequently provides it.” The report then suggests there is room for improvement in psychological studies – extra steps and considerations that may lead to more reproducible findings. The larger point, however, is that there is no magic-wand solution.
3 A Common-sense Correction?
Fortunately, the very practice of performing replications can give researchers across the sciences insight into which experiments are likely to replicate – and, by process of elimination, which are unlikely to and are therefore probably flawed in concept.
With the replication crisis now widely acknowledged, more discerning replication experiments can help sharpen scientists’ intuitions about which hypotheses are worth testing and which are not. In this fashion, the due diligence of replication studies can lead to a more pragmatic approach to new theories and experiments.
Here’s an example: a replication study led by psychologist Brian Nosek, Director of the Center for Open Science, included a prediction component. A group of scientists took bets on which studies they thought would replicate, and which wouldn’t.
Promisingly, the bets largely tracked the final results, suggesting that the scientists had a sort of professional bullshit detector. Experiments the scientists largely predicted wouldn’t replicate included a report that merely washing one’s hands alleviates “post-decisional dissonance”, the discomfort that follows a difficult choice, which people typically ease by convincing themselves, in hindsight, that they chose well.
The good news, then, is that solving a complex crisis will be aided mightily by a simple quality: common sense. If a study sounds too good to be true, it probably is.
2 Don’t Bring Me Down (Ego Depletion, Part 1)
But sometimes it’s the apparent sensibleness of a study that lets it spread despite a flawed premise. Here, showcasing the replication crisis’ intricacies and cascading effects requires two consecutive list items (part 2 follows).
One high-profile failed replication involves an influential experiment whose key finding has been cited more than 3,000 times: ego depletion. As a concept, it appears logical. Ego depletion posits – and its researchers seemingly proved – that exerting self-control on one task can have carryover effects on a broad range of ensuing tasks, including further self-control, responsible decision-making and problem-solving.
For the experiment, psychologists Roy Baumeister and Dianne Tice placed fresh-baked cookies (yum) beside a bowl of radishes (yuck). They then told one group of participants to eat only radishes and the other only cookies. Afterward, each volunteer tried to solve a puzzle that was intentionally designed to be impossible. The cookie-eating clan stuck with the puzzle for an average of 19 minutes, matching those in a control group who hadn’t snacked at all. The radish-eaters quit after an average of about eight minutes.
Baumeister and Tice surmised that this revealed a fundamental fact: humans have a limited supply of willpower, and it runs down with overuse. Eating a radish while surrounded by cookies is a draining feat of self-denial, one whose fatigue spills over into subsequent challenges.
The finding became a juggernaut, cited to explain a far-reaching range of cause-and-effect scenarios on the premise that our willpower reserve has a significant say in how well we execute any given task. In 2011, Baumeister published a best-selling self-help book titled “Willpower: Rediscovering the Greatest Human Strength”.
Settled science, right? Wrong…
1 Ego Deflation (Ego Depletion, Part 2)
And then, like its trademark cookies, ego depletion crumbled. A 2016 paper in the journal “Perspectives on Psychological Science” details a massive, 2,000-subject effort to reproduce ego depletion’s stated effect. Taking nothing for granted (the way sound science should), the study spanned two dozen labs on several continents.
Per the opening line of the study’s conclusion, what it found was exactly nothing: “Results from the current multilab registered replication of the ego-depletion effect provide evidence that, if there is any effect, it is close to zero.” Researchers could not find a discernible link between exerting self-control on one task and the ability to complete subsequent challenges. If you’re good at crossword puzzles, resisting a plate of cookies beforehand won’t keep you from getting 22 Across.
Just like that, a study cited 3,000 times – and one recreated hundreds of times in various ways – was at best suspect, at worst bogus. It’s hard to overstate how deeply ego depletion was ingrained in psychological theory; it was considered near-canon.
The issue – and a fundamental challenge of the replication crisis – is WHY ego depletion became so widely accepted before collapsing. Ego depletion seems reasonable on its face, and we still see it referenced: for example, when a soccer or baseball player in an offensive slump is accused of letting it carry over into his defensive play.
Such seemingly common-sense theories get the benefit of the doubt, a bias that researchers carry into ensuing experiments. The replications rest partly on assumptions, and instead of the house of cards tumbling, it merely gains another level, which in turn further props up the understandable yet incorrect original theory.