Founder and President of the non-profit Preventive Medicine Research Institute
Large Randomized Controlled Trials

It is a commonly held but erroneous belief that a larger study is always more rigorous or definitive than a smaller one, and that a randomized controlled trial is always the gold standard. However, there is growing awareness that size does not always matter, and that a randomized controlled trial may introduce its own biases. We need more creative experimental designs.

In any scientific study, the question is: "What is the likelihood that the observed differences between the experimental group and the control group are due to the intervention rather than to chance?" By convention, if the probability that the results are due to chance is less than 5 percent, the finding is considered statistically significant, i.e., a real finding.
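This convention can be sketched with a two-proportion z-test. The event counts below are hypothetical, chosen only to illustrate the 5 percent threshold; they are not data from any study discussed here:

```python
import math

def norm_cdf(x):
    # Standard normal cumulative distribution via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for the difference between two event rates
    (normal approximation with a pooled standard error)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - norm_cdf(abs(z)))

# Hypothetical trial: 80/1000 events in the control arm vs 55/1000
# in the experimental arm
p = two_proportion_p_value(80, 1000, 55, 1000)
print(p < 0.05)  # significant at the conventional 5 percent level
```

If the two arms had nearly identical event counts (say, 80 vs 78 per 1000), the same test would return a p-value far above 0.05, and the difference would be attributed to chance.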

A randomized controlled trial (RCT) is based on the idea that if you randomly assign subjects either to an experimental group that receives an intervention or to a control group that does not, then any known or unknown differences between the groups that might bias the study are as likely to affect one group as the other.

While that sounds good in theory, in practice an RCT can introduce its own set of biases and thus undermine the validity of the findings.

For example, an RCT may be designed to determine whether dietary changes can prevent heart disease and cancer. Investigators identify patients who meet certain selection criteria, e.g., that they have heart disease. When they meet with prospective study participants, investigators describe the study in great detail and ask, "If you are randomly assigned to the experimental group, would you be willing to change your lifestyle?" To be eligible for the study, the patient must answer, "Yes."

However, if that patient is subsequently randomly assigned to the control group, he or she is likely to begin making lifestyle changes anyway, having already been told in detail what those changes are. If the investigators are studying a new drug that is available only to the experimental group, this is less of an issue. But in the case of behavioral interventions, those randomly assigned to the control group are likely to make at least some of these changes, because they reason that the investigators must think the changes are worth doing or they wouldn't be studying them.

Or they may be disappointed at being randomly assigned to the control group, and so be more likely to drop out of the study, creating selection bias.

Also, in a large-scale RCT it is often hard to give the experimental group enough support and resources to make lifestyle changes. As a result, adherence to these changes is often lower than the investigators predicted based on earlier pilot studies, in which smaller groups of patients were given more support.

The net effect of the above is to (a) reduce the likelihood that the experimental group will make the desired lifestyle changes, and (b) increase the likelihood that the control group will make similar changes. This narrows the difference between the groups and makes it harder to detect a statistically significant difference between them.

As a result, the conclusion that the intervention had no significant effect may be misleading. This is known as a "type 2 error": there was a real difference, but these design issues obscured the ability to detect it.
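A small Monte Carlo sketch makes this concrete. The event rates below are assumptions for illustration, not figures from any real trial: when poor adherence and control-group contamination narrow the gap between arms, most simulated trials fail to reach p < 0.05 even though the intervention's true effect is unchanged — type 2 errors:

```python
import math
import random

random.seed(0)  # reproducible illustration

def simulate_power(p_treat, p_control, n_per_arm, reps=500):
    """Fraction of simulated two-arm trials in which a two-proportion
    z-test reaches p < 0.05 -- a Monte Carlo estimate of power."""
    hits = 0
    for _ in range(reps):
        # Draw event counts for each arm
        a = sum(random.random() < p_treat for _ in range(n_per_arm))
        b = sum(random.random() < p_control for _ in range(n_per_arm))
        pooled = (a + b) / (2 * n_per_arm)
        se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
        if se > 0 and abs(a - b) / n_per_arm > 1.96 * se:
            hits += 1
    return hits / reps

# Assumed rates: the intervention, fully adopted, would cut events
# from 8% to 5%; contamination narrows the realized gap to 8% vs 7%.
print(simulate_power(0.05, 0.08, 2000))  # intended contrast: high power
print(simulate_power(0.07, 0.08, 2000))  # diluted contrast: low power
```

Under these assumed rates, the intended contrast is detected in the vast majority of simulated trials, while the diluted contrast is missed most of the time.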

That's just what happened in the Women's Health Initiative study, which followed nearly 49,000 middle-aged women for more than eight years. The women in the experimental group were asked to eat less fat and more fruits, vegetables, and whole grains each day to see if it could help prevent heart disease and cancer. The women in the control group were not asked to change their diets. 

However, the experimental-group participants did not reduce their dietary fat as recommended: fat still made up over 29 percent of their diet, well above the study's goal of less than 20 percent. Nor did they increase their consumption of fruits and vegetables very much. Meanwhile, the control group reduced its fat consumption almost as much and increased its consumption of fruits and vegetables, diluting the between-group differences to the point that they were not statistically significant. The investigators reported that these dietary changes did not protect against heart disease or cancer, when in fact the hypothesis was never really tested.

Paradoxically, a small study may be more likely than a large one to show significant differences between groups. The Women's Health Initiative study cost almost a billion dollars yet did not adequately test its hypotheses. A smaller study can devote more resources per patient to enhancing adherence, at far lower cost.
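The paradox can be sketched numerically under the usual normal approximation for statistical power. The event rates and sample sizes below are hypothetical: a small trial that fully realizes its intended contrast can have more power than a trial a hundred times larger whose contrast has been diluted by non-adherence and contamination:

```python
import math

def power(p1, p2, n_per_arm, z_crit=1.96):
    """Approximate power of a two-arm trial comparing event rates
    p1 vs p2 (normal approximation, two-sided 5% test)."""
    se = math.sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    z = abs(p1 - p2) / se
    # P(Z > z_crit - z) for a standard normal Z
    return 0.5 * (1 + math.erf((z - z_crit) / math.sqrt(2)))

# Hypothetical large trial: contamination shrinks the contrast
# to 8% vs 7.5% events, with 20,000 participants per arm
print(round(power(0.075, 0.08, 20000), 2))  # ~0.46
# Hypothetical small trial with intensive support and full adherence:
# 10% vs 3% events, only 200 participants per arm
print(round(power(0.03, 0.10, 200), 2))     # ~0.82
```

In this sketch the small, well-supported trial detects its effect about four times out of five, while the billion-dollar-scale diluted trial is no better than a coin flip.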

Also, the idea that an RCT changes only one independent variable (the intervention) and measures one dependent variable (the result) is often a myth. For example, suppose you're investigating the effects of exercise on preventing cancer. You devise a study in which you randomly assign one group to exercise and the other to no exercise. On paper, it appears you're working with only one independent variable.

In actual practice, however, when you place people on an exercise program, you're not just getting them to exercise; you're actually affecting other factors that may confound the interpretation of your results even if you're not aware of them. 

For example, people often exercise with other people, and there's increasing evidence that enhanced social support significantly reduces the risk of most chronic diseases. You're also enhancing a sense of meaning and purpose by participating in a study, and these also have therapeutic benefits. And when people exercise, they often begin to eat healthier foods. 

We need new, more thoughtful experimental designs and systems approaches that take these issues into account. Also, new genomic insights will make it possible to better understand individual variations in response to treatment, rather than hoping that this variability will be "averaged out" by randomly assigning patients.