Professor, Claremont McKenna College; Past-president, American Psychological Association; Author, Sex Differences in Cognitive Abilities
A Statistically Significant Difference in Understanding the Scientific Process

Statistically significant difference: it is a simple phrase that is essential to science and that has become common parlance among educated adults. These three words convey a basic understanding of the scientific process, random events, and the laws of probability. The term appears almost everywhere that research is discussed, including newspaper articles, advertisements for "miracle" diets, research publications, and student laboratory reports, to name just a few of the many contexts where it is used. It is a shorthand abstraction for a sequence of events that includes an experiment (or other research design), the specification of a null and an alternative hypothesis, (numerical) data collection, statistical analysis, and the probability of an unlikely outcome. That is a lot of science conveyed in a few words.

It would be difficult to understand the outcome from any research without at least a rudimentary understanding of what is meant by the conclusion that the researchers found or did not find evidence of a "statistically significant difference." Unfortunately, the old saying that "a little knowledge is a dangerous thing" applies to the partial understanding of this term. One problem is that "significant" has a different meaning when used in everyday speech than when used to report research findings.

Most of the time, the word "significant" means that something important happened. For example, if a physician told you that you would feel significantly better following surgery, you would correctly infer that your pain would be reduced by a meaningful amount; you would feel less pain. But when used in "statistically significant difference," the term "significant" means that results like those observed would be unlikely to arise by chance alone if the null hypothesis were true; the results may or may not be important. In addition, the conclusion will sometimes be wrong, because researchers can only assert their conclusions at some level of probability. "Statistically significant difference" is a core concept in research and statistics, but as anyone who has taught undergraduate statistics or research methods can tell you, it is not an intuitive idea.
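The two cautions in the paragraph above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than data from any study: the group means, the standard deviation, the sample sizes, and the helper names `two_sided_p`, `z_for_difference`, and `false_positive_rate` are invented for the example, and a z-test with a known standard deviation stands in for whatever test a real study would use. The first part shows a trivially small difference reaching statistical significance because the sample is huge; the second part simulates many studies in which the null hypothesis is true and counts how often they come out "significant" anyway.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_for_difference(mean_a, mean_b, sd, n_per_group):
    """z statistic for a difference in means, assuming a known common SD."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return (mean_a - mean_b) / standard_error

# Part 1: a tiny effect measured on a very large sample (invented numbers).
sd = 10.0
n_large = 200_000
z_tiny = z_for_difference(50.1, 50.0, sd, n_large)
p_tiny = two_sided_p(z_tiny)
effect_size = (50.1 - 50.0) / sd  # standardized difference: one hundredth of an SD

print(f"tiny effect: p = {p_tiny:.4f}, standardized effect = {effect_size:.2f}")

# Part 2: how often a true null hypothesis is rejected anyway.
def false_positive_rate(trials=2000, n_per_group=30, alpha=0.05, seed=1):
    """Fraction of simulated null studies that reach p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups come from the same distribution, so any
        # "significant" difference is a false alarm.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = z_for_difference(sum(a) / n_per_group, sum(b) / n_per_group,
                             1.0, n_per_group)
        if two_sided_p(z) < alpha:
            hits += 1
    return hits / trials

rate = false_positive_rate()
print(f"share of null studies called significant: {rate:.3f}")
```

Under these assumptions the first result is statistically significant although the difference is too small to matter in practice, and the second rate should land near the chosen alpha of .05, which is the sense in which a conclusion is only asserted at some level of probability.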

Despite the fact that "statistically significant difference" communicates a cluster of ideas that are essential to the scientific process, there are many pundits who would like to see it removed from our vocabulary because it is frequently misunderstood. Its use underscores the marriage of science and probability theory, and despite its popularity, or perhaps because of it, some experts have called for a divorce because the term implies something that it does not, and the public is often misled. In fact, experts are often misled as well. Consider this hypothetical example: In a well-done study that compares the effectiveness of two drugs relative to a placebo, it is possible that Drug X is statistically significantly different from a placebo and Drug Y is not, yet Drugs X and Y might not be statistically significantly different from each other. This could result when Drug X is statistically significantly different from a placebo at a probability level of p < .04, but Drug Y is statistically significantly different from a placebo only at a probability level of p < .06, which is higher than most a priori levels used to test for statistical significance. If just reading about this makes your head hurt, you are among the masses who believe they understand this critical shorthand phrase, which is at the heart of the scientific method, but who may actually have only a shallow level of understanding.
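The hypothetical two-drug comparison above can be reproduced with invented summary statistics. The sketch below is only an illustration: the means, the common standard deviation, the group size, the .05 threshold, and the helper `p_for_difference` are all assumptions chosen so that the p-values fall near the .04 and .06 levels mentioned in the text, and a z-test with a known standard deviation is used for simplicity.

```python
import math

def p_for_difference(mean_a, mean_b, sd, n_per_group):
    """Two-sided p-value for a difference in means (z-test, known common SD)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

# Invented summary statistics: 100 participants per group, SD of 10.
SD, N = 10.0, 100
placebo, drug_x, drug_y = 50.0, 53.0, 52.7

p_x = p_for_difference(drug_x, placebo, SD, N)   # Drug X vs. placebo
p_y = p_for_difference(drug_y, placebo, SD, N)   # Drug Y vs. placebo
p_xy = p_for_difference(drug_x, drug_y, SD, N)   # Drug X vs. Drug Y

alpha = 0.05  # an assumed a priori significance level
for label, p in [("X vs placebo", p_x), ("Y vs placebo", p_y), ("X vs Y", p_xy)]:
    verdict = "significant" if p < alpha else "not significant"
    print(f"{label}: p = {p:.3f} ({verdict})")
```

With these numbers, Drug X clears the threshold and Drug Y narrowly misses it, yet the direct comparison of X with Y is nowhere near significant. The difference between "significant" and "not significant" is not itself a statistically significant difference.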

There are many critically important ways that findings of "statistically significant difference" can be misleading. But, even though there are real problems with understanding this term, it is firmly entrenched in everyday discussions of research, and for the general public, it shows some knowledge of the process of science.

A better understanding of the pitfalls associated with this term would go a long way toward improving our "cognitive toolkits." If common knowledge of what this term means included the ideas that a) the findings may not be important and b) conclusions based on finding, or failing to find, statistically significant differences may be wrong, then we would have significantly advanced general knowledge. When people read or use the term "statistically significant difference," it is an affirmation of the scientific process, which, for all of its limitations and misunderstandings, is a significant advance over alternative ways of knowing about the world. If we could just add these two key concepts to the meaning of that phrase, we could improve how the general public thinks about science.