Science is a framework for understanding the world. We observe a phenomenon, ask questions about it, and build models to describe and predict it. Crucially, this process is iterative. We’re constantly refining our experiments, theories, and models, in an effort to improve our understanding of some phenomenon. Instead of accepting the results of a study as “fact”, we ask whether those results can be replicated reliably under similar and different conditions, what theories can be extrapolated from those results, and how we might extend those results to new domains or new questions.
Iteration is how science fosters dialogue. Without it, science isn’t really science; it’s just a bunch of people shouting over one another, with nobody listening. Or even worse, it’s like a game of telephone, in which nobody checks whether the last person got the message right.
Enter: the replicability crisis
If you follow the state of contemporary science, you’re probably familiar with the replicability crisis. Scientists found that the results of studies across a wide variety of disciplines – psychology, medicine, biology, chemistry, and more – either failed to replicate or replicated with much smaller effect sizes. According to a survey of more than 1,500 scientists published by Nature, 70% of respondents across many disciplines had failed to reproduce the findings of at least one other lab (Baker, 2016).
This means, effectively, that the findings of the original study could be wrong, and thus any conclusions drawn from those findings might also be wrong.
This is highly problematic when you consider:
- How widespread the replication failures were.
- The fact that many theories in many disciplines rest on untested assumptions drawn from previous findings.
It’s understandable – and even expected – that not every study will replicate. Spurious findings can arise from all sorts of factors, including the size and scope of the subject pool, the kinds of statistical tests that were run, how well the experiment was controlled, and more. This is precisely why it’s important that scientists make an effort to replicate both their own research and that of others – it’s a way to hold both yourself and others accountable.
But without this accountability, spurious findings proliferate throughout the literature. Study 1 discovers Spurious Finding 1, which motivates Study 2 and the discovery of Spurious Finding 2, and so on, all the way to the development of a Spurious Theory. In other words, accountability and replication are what – theoretically, at least – distinguish science from other frameworks of understanding the world.
So what causes the proliferation of bad science?
Bad science: causes and explanations
Understandably, scientists and science journalists alike were very alarmed by the replicability crisis. They began theorizing about what caused it, and what we can do about it. Some researchers, like John Ioannidis, have been talking about problems with our scientific process for over a decade; Ioannidis’s 2005 article, “Why most published research findings are false”, has been cited almost 5,000 times.
As with most systemic problems, this one slices across many different levels, and each sub-problem feeds into the others. I couldn’t possibly do justice to the size and scope of the problem, but I’ll try to summarize some of the main points below. Note that these points aggregate arguments made by other scientists and journalists (Begley & Ioannidis, 2015; Everett & Earp, 2015; Earp & Trafimow, 2015).
Publish or perish
To survive in the modern academic world, scientists need to publish their work. A key variable determining a scientist’s success (the granting of tenure, the awarding of grants, etc.) is their impact – how many times their work has been cited in the literature. This creates pressure for scientists to publish both frequently and in high-impact journals.
Unfortunately, good science takes time, and the chance of getting accepted in a high-impact journal (and being cited more frequently) is generally higher if your results are more surprising or novel. Thus, we arrive at the second problem…
Sloppy science

Science is rigorous. Rigor isn’t always fun; it means designing experiments to control for all the confounding variables, poring over your data multiple times, making sure all of your analyses are correct, and so on. But rigor is important if you want to be confident in your findings.
Rigor also takes time. And as I mentioned above, time is of the essence for scientists seeking tenure, grants, and the respect of their colleagues. This leads to corner-cutting and a general sloppiness at every stage of science – experimental design, data collection, analysis, and interpretation. One example of sloppy statistics is p-hacking, in which researchers run many different statistical tests, then report only the ones that came out significant and draw their conclusions from those. P-hacking isn’t always malicious, or even intentional; exploratory analyses are a natural part of science, but they should be distinguished from the analyses a scientist planned to run when designing the experiment.
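To see why p-hacking is so dangerous, here’s a minimal simulation (the numbers – 20 tests per study, a 0.05 significance threshold – are illustrative, not drawn from any real study). The key fact is that under a true null hypothesis, p-values are uniformly distributed, so a researcher who tries enough analyses will usually find “something” by chance alone:

```python
import random

random.seed(42)

ALPHA = 0.05          # conventional significance threshold
N_TESTS = 20          # hypothetical: analyses tried per study
N_SIMULATIONS = 100_000

# Under a true null hypothesis, p-values are uniform on [0, 1].
# A "p-hacked" study reports success if ANY of its tests is significant.
false_positives = sum(
    any(random.random() < ALPHA for _ in range(N_TESTS))
    for _ in range(N_SIMULATIONS)
)

simulated_rate = false_positives / N_SIMULATIONS
analytic_rate = 1 - (1 - ALPHA) ** N_TESTS  # family-wise error rate

print(f"Analytic chance of >=1 false positive:  {analytic_rate:.3f}")
print(f"Simulated chance of >=1 false positive: {simulated_rate:.3f}")
```

With 20 tests, the chance of at least one false positive is 1 − 0.95²⁰ ≈ 0.64 – roughly a two-in-three chance of a publishable-looking “discovery” even when every effect is zero. This is exactly why planned, pre-registered analyses need to be distinguished from exploratory ones.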
Still, sloppiness could be addressed by holding sloppy scientists accountable, e.g. through attempting to replicate their results. The only problem with this is…
Replications aren’t valued
Most scientists know replication is an essential part of science. But in general, replications aren’t valued very highly by the community. The problem is that replications have one of two outcomes:
- The results of the original study are replicated.
- The results of the original study are not replicated (e.g. a null result).
If the results replicate, the replication is appreciated, but not widely discussed (and probably not well-cited). That is, if a study has already discovered X, then confirming X is less exciting or novel to other scientists; it might be acknowledged as important, but it won’t gain the replicating scientist much recognition, and it might even be hard to publish. After all, many journals are highly selective, and there’s little information gain in publishing a replication – we already know X, so why publish X again?
It’s potentially more interesting if the results don’t replicate. But in this case, the authors of the original study – who obviously have a stake in their findings being valid – might accuse you of not replicating it correctly (e.g. you changed the stimuli, the analysis, the participant pool, etc.). And it’s still hard to publish, because readers generally aren’t as interested in a null result. The best-case scenario for publishing a failed replication is that it muddies the theoretical waters – important in a deeply fundamental way, but not nearly as exciting as making a new contribution.
So if replications aren’t particularly impactful, researchers interested in bolstering their own impact will have little incentive to do them – which means that sloppy science won’t be held accountable.
The Takeaway: what can be done?
The problem is systemic; it’s not the fault of any particular scientist or group of scientists. Systemic problems are hard to understand, and they’re even harder to address, because it’s so difficult to isolate the “root cause”. So does this mean the problem is hopeless?
Clearly, scientists are interested in addressing the problem. The replicability crisis has catalyzed considerable effort to incentivize replications in science. For example, the Open Science Framework (OSF) is a website for sharing research and collaborating with other scientists; the OSF can also be used to facilitate massive, multi-lab replication projects by sharing experimental stimuli, methods, and even data. It also allows scientists to pre-register the analyses they plan to run, which helps avoid sloppy statistics like p-hacking by keeping scientists accountable. Similarly, many journals, such as Nature, ask authors to include detailed supplementary information about the analyses they ran so that other scientists can verify the results.
These efforts won’t necessarily solve the problem, but hopefully the improved accountability will in turn improve the overall quality of science.
So what does this mean for non-scientists? One suggestion is that anyone – scientist or not – should practice informed skepticism. This means not treating all scientific findings as fact, and exercising caution in what we infer from any given finding. Besides problems with replicability, considerable nuance gets lost in translation between an academic paper, a press release, and a write-up in one’s news outlet of choice. Note that “informed skepticism” should not be a license for distrusting science altogether; it’s simply a reminder that science is conducted by humans, and humans are subject to bias.
References

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452-454.
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Begley, C. G., & Ioannidis, J. P. (2015). Reproducibility in science. Circulation Research, 116(1), 116-126.
Everett, J. A., & Earp, B. D. (2015). A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers. Frontiers in Psychology, 6.
Earp, B. D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology, 6.
 What constitutes a “phenomenon of interest” is, of course, a very difficult question to answer, and depends on the scientific paradigm you’re operating under. To ask questions about the world, you need to carve up the world in certain ways, which already makes a set of assumptions and constrains the space of questions you can even ask.
 This assertion is complicated by a few factors. First, it’s hard to know when you’ve accumulated enough evidence to be “certain” of something. Scientific findings are inherently probabilistic and carry some measure of uncertainty – but more practically, any finding can become the seed of a “paradigm” within which future scientists pose questions and interpret results, so it’s important to bear in mind how certain we are that the paradigm we’re operating under is a useful one. Second, vested interests – funding sources, egos, etc. – can interfere with the alleged “purity” of the scientific process. In Science Mart, Philip Mirowski argued that science has been largely co-opted and commoditized by corporate powers, transforming science from a process for understanding the world into a tool for increasing profit. Whenever money is involved, science runs the risk of succumbing to the bias of the funder. It’s no accident that scientists funded by tobacco companies argued that smoking and lung cancer were totally unrelated. This problem, of course, reinforces the need for frequent replication from multiple labs.