Randomised controlled trials (RCTs) are generally viewed as the foundational experimental method of the social and medical sciences. Economists depend on them, for certain questions, as their most valued method. Yet RCTs are not flawless. In my study, Why all randomised controlled trials produce biased results,* I argue that RCTs are not able to establish precise causal effects of an intervention.
Most of us have probably used a medication, owned a technology or supported a public policy that was tested in a trial. To assess how effective such interventions are before we adopt them as patients, consumers or voters, researchers often conduct RCTs by splitting a sample of people into a treatment group and a control group. Contrary to the common belief that RCTs yield unbiased results, I argue in my study that some degree of bias inevitably arises in any trial. This is because some share of recruited people refuse to participate (which leads to sample bias), some degree of partial blinding or unblinding of the people involved in a trial generally arises (which leads to selection bias), and participants generally take the treatment for different lengths of time and at different dosages (which leads to measurement bias), among other issues.
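To make the first of these mechanisms concrete, here is a minimal simulation sketch. It is not taken from the study: the function name `simulate_trial`, the 0-to-1 "need" score, and the assumptions that benefit grows with need and that people with higher need are more likely to agree to take part are all hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_trial(n_recruited=20000, seed=0):
    """Compare a trial's estimated effect with the average effect among all recruited people.

    Illustrative assumptions (not from the study):
    - each recruited person has a 'need' score between 0 and 1
    - the true treatment effect is 2 * need
    - the probability of agreeing to participate equals need
    """
    rng = random.Random(seed)
    effects_all_recruited = []
    treated, control = [], []

    for _ in range(n_recruited):
        need = rng.random()
        effect = 2.0 * need
        effects_all_recruited.append(effect)

        # People refuse with probability (1 - need), so participants
        # are not representative of everyone who was recruited.
        if rng.random() > need:
            continue

        baseline = rng.gauss(10.0, 1.0)  # outcome without treatment
        if rng.random() < 0.5:           # randomise participants only
            treated.append(baseline + effect)
        else:
            control.append(baseline)

    trial_estimate = statistics.mean(treated) - statistics.mean(control)
    average_effect = statistics.mean(effects_all_recruited)
    return trial_estimate, average_effect


if __name__ == "__main__":
    estimate, average = simulate_trial()
    print(f"trial estimate: {estimate:.2f}")
    print(f"average effect among all recruited: {average:.2f}")
```

Under these assumptions the average effect among everyone recruited is 1.0 by construction, while the average effect among those who agree to participate is 4/3, so the trial overstates the benefit even though randomisation within the trial is carried out correctly.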
The ten most-cited RCTs worldwide, which I assess in the study, suffer from such general issues. But they also suffer from other methodological issues that affect their estimated results: participants’ background characteristics (such as age, health status and level of need for the treatment) are often unevenly distributed across trial groups, participants at times switch between trial groups, and trials often neglect alternative factors contributing to their main reported outcome, among others. Some of these issues cannot be avoided in trials, but they affect the robustness and validity of their results and conclusions.
This matters because the validity of a trial’s causal claims is at times a life-or-death matter, for example in public health. The study is about the RCT method rather than any individual RCT, and its insights are relevant for researchers using RCTs in economics, psychology, agriculture and other fields (though the ten most-cited RCTs worldwide that are assessed happen to be medical trials).
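The point about background characteristics can be illustrated with a short sketch of how often randomisation alone leaves trial groups unbalanced. It is a hypothetical illustration rather than the study's analysis: the function name, the age distribution (mean 50, standard deviation 10) and the 3-year threshold are assumptions.

```python
import random


def share_of_imbalanced_trials(n_participants, n_trials=2000,
                               threshold_years=3.0, seed=1):
    """Return the share of simulated trials whose two groups differ in
    mean age by more than threshold_years after random allocation.

    Assumed for illustration: ages are drawn from a normal distribution
    with mean 50 and standard deviation 10.
    """
    rng = random.Random(seed)
    half = n_participants // 2
    imbalanced = 0

    for _ in range(n_trials):
        ages = [rng.gauss(50.0, 10.0) for _ in range(n_participants)]
        rng.shuffle(ages)  # random allocation: first half vs second half
        group_a, group_b = ages[:half], ages[half:]
        gap = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        if gap > threshold_years:
            imbalanced += 1

    return imbalanced / n_trials


if __name__ == "__main__":
    for size in (40, 400):
        share = share_of_imbalanced_trials(size)
        print(f"{size} participants: {share:.1%} of trials imbalanced")
```

With these assumed numbers, a sizeable share of the 40-person trials end up with groups whose mean ages differ by more than three years, while such gaps are rare in the 400-person trials. Randomisation balances characteristics only on average across many hypothetical trials, not within any single one.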
Assumptions and biases generally increase at each step when carrying out trials
This applies at every stage: from how we create our variables, select our initial sample and randomise participants into trial groups, to how we analyse the data for participants with different durations and amounts of treatment, and how we try to ensure that everyone involved is fully blinded before the trial begins and throughout its implementation, among many other steps before, between and after these.
I thus argue that the reproducibility crisis is, to a large extent, the result of the scientific process always being a complex human process that involves many actors (study designers, all participants, data collectors, implementing practitioners, study statisticians etc.) who must make many unique decisions at many different steps over time when designing, implementing and analysing any given study—and some degree of bias unavoidably arises during this process. Variation between study outcomes is thus the norm, and one-to-one replication is not possible.
Researchers should thus not assume that the RCT method inevitably produces valid causal results. That all trials face some degree of bias is simply the trade-off for studies actually being conducted in the real world. A number of things inevitably do not go as planned or designed, given the multiple complex processes involved in carrying out trials over time. Once a study is completed, some biases will have arisen, and nothing can be done about a number of them. At the same time, the study aims to improve how RCTs are carried out by outlining how researchers can reduce some of these biases.
Are biased results in trials still good enough to inform our decisions in public health and social policy?
In many cases they are. But that judgement generally depends on how useful the results are in practice and their level of robustness relative to other studies that use the same method or at times other methods. Yet no single study should be the sole and authoritative source used to inform policy and our decisions.
Some may respond, “are RCTs not still more credible than other methods even if they may have biases?” For most questions we are interested in, RCTs cannot be more credible because they cannot be applied at all. This holds for most complex phenomena we study, such as effective government institutions, long life expectancy, democracy, inequality, education systems and psychological states. Other methods (such as observational studies) are needed for the many questions not amenable to randomisation, but also at times to help design trials, interpret and validate their results, and provide further insight into the broader conditions under which treatments may work, among other reasons discussed in the study. Different methods are thus complements (not rivals) in improving understanding.
Taken together, researchers, practitioners and policymakers need to become more aware of the broader range of biases facing trials. As I illustrate in the study,* journals need to begin requiring researchers to outline in detail the assumptions, biases and limitations in their studies. If researchers do not report this crucial information, practitioners and citizens will have to rely solely on the information and warning labels provided by the policymakers, biopharmaceutical companies and others who implement the tested policies and sell the tested treatments.
* Krauss, Alexander. Why all randomised controlled trials produce biased results. Annals of Medicine, 50:4, 312-322 (2018).