In his Edge interview, Sulloway gives the impression that self-report personality tests -- the kind where people answer questions about themselves -- are worthless and that the psychologists who construct them are naive enough to take the subjects' statements about themselves at face value. The truth is that personality tests are sophisticated devices that have been honed and improved over time. They are examined for internal consistency and checked against other sources of information; test items that don't work are eliminated. No single item on the test can do the job unaided; the scorers of these tests are looking for *patterns* of responses. The "Big Five personality dimensions" that Sulloway talks about are a product of the same tests that he dismisses when they produce results not to his liking.

Sulloway asks but one question of his historical subjects: Do they or don't they believe in evolutionary theory, or phrenology, or the Protestant Reformation? It's a test consisting of a single item. How well can we judge someone's personality by his answer to a single question?

But Sulloway has more than historical data: he has modern data from a variety of personality tests and measures. The data he uses for this purpose were all collected before 1981: they are from studies reviewed in a 1983 book by the Swiss psychologists Cecile Ernst and Jules Angst (that's right, Ernst and Angst -- I'm not making this up). Ernst & Angst concluded that most of the studies they reviewed were worthless because the researchers had failed to control for family size and/or socioeconomic class (variables that are themselves correlated). They threw out the worthless studies, looked closely at the ones that remained, and concluded that birth order was a crock. "This may signify," they said, "that most of our opinions in the field of dynamic psychology may have to be revised."

Sulloway reexamined the same studies that Ernst & Angst reviewed -- the ones that used the proper controls -- and came to different conclusions. There are a number of problems, however, with his reexamination; I discuss them in detail in Appendix 1 of The Nurture Assumption (due out in September). For example, how many studies did Sulloway include in his reanalysis? Five times in his book Born to Rebel, and three times in his Edge interview, he gives the number of properly controlled studies as 196, but I spent days combing through Ernst & Angst's book and found nowhere near that number. The explanation of this discrepancy is contained in a note underneath a table in Born to Rebel: "Each reported finding constitutes a 'study.'" Thus, if a researcher reported that the firstborns in a particular sample of subjects were more conventional, conscientious, assertive, and neurotic than the laterborns, Sulloway's definition allowed him to count these four findings as four "studies." Only by counting some studies more than once could Sulloway have obtained his total of 196. Although multiple findings generated from the same sample of subjects are not statistically independent, Sulloway nonetheless tested his data with a statistic based on the assumption that each outcome is independent.
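To see why the independence assumption matters, here is a minimal sketch in Python. It is not Sulloway's statistic and uses none of his data. It assumes, purely for illustration, 50 samples of subjects, no real birth order effect, and the extreme case in which all findings from one sample point the same way. The function name and all of the numbers are hypothetical. It computes how often a simple sign test would declare a "significant" result when every finding is treated as an independent observation.

```python
from math import comb, sqrt


def false_positive_rate(n_samples, findings_per_sample, z_crit=1.96):
    """Chance of a 'significant' sign test when no real effect exists.

    Illustrative assumptions: each sample is favorable to the theory with
    probability 0.5, and every finding drawn from a sample points the same
    way as that sample. The test itself wrongly treats each finding as an
    independent observation.
    """
    n_findings = n_samples * findings_per_sample
    rate = 0.0
    for favorable_samples in range(n_samples + 1):
        favorable_findings = favorable_samples * findings_per_sample
        # Normal-approximation sign test on the count of favorable findings.
        z = (favorable_findings - n_findings / 2) / sqrt(n_findings / 4)
        if abs(z) > z_crit:
            # Exact binomial probability of this many favorable samples.
            rate += comb(n_samples, favorable_samples) * 0.5 ** n_samples
    return rate


# Hypothetical: 50 samples, counted once each versus four findings each.
one_per_sample = false_positive_rate(50, 1)
four_per_sample = false_positive_rate(50, 4)
print(f"one finding per sample:   {one_per_sample:.3f}")
print(f"four findings per sample: {four_per_sample:.3f}")
```

Under these assumptions the test statistic is inflated by the square root of the number of findings per sample, so chance results cross the significance threshold far more often than the nominal 5 percent. Real findings from one sample are only partly correlated, so the actual inflation would be smaller than in this worst case.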

"Unfortunately," Sulloway says in his Edge interview, "most psychologists -- to this day -- do not appreciate the issue of statistical power." He is objecting to attempts to test his claims with samples of only 200 to 400 subjects. Well, if birth order effects were as big and important as Sulloway implies, 200 to 400 subjects should be plenty to demonstrate them. In any case, he has given the impression that bigger studies are more likely than smaller ones to turn up significant birth order effects, which is what you'd expect if birth order effects were real but small. Just the opposite is true, however. Of the research reviewed by Ernst & Angst, only 19 percent of the findings from the largest studies (more than 400 subjects) were favorable to Sulloway's theory, versus 38 percent of the findings from the smallest ones (fewer than 200). Sulloway calls his reanalysis a "meta-analysis," but that term is usually used to describe a procedure that takes into account the size of the included studies and the magnitude of their effects. Neither sample size nor effect size was taken into account in Sulloway's analysis.
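The difference between tallying favorable findings and weighting them by sample size and effect size can be sketched in a few lines of Python. The study sizes and correlations below are invented for illustration and are not taken from Ernst & Angst or from Sulloway. The pooling method shown, a fixed-effect average of Fisher z-transformed correlations weighted by n - 3, is one common textbook approach and is only an assumption about what a conventional meta-analysis might look like here.

```python
from math import atanh, tanh

# Hypothetical (sample size, correlation) pairs: small studies with
# sizable effects, large studies with effects near zero.
studies = [(60, 0.30), (80, 0.25), (120, 0.20), (500, 0.02), (900, -0.01)]

# Vote count: what fraction of results point in the favorable direction?
vote_fraction = sum(1 for _, r in studies if r > 0) / len(studies)

# Unweighted mean: uses effect size but ignores sample size.
unweighted_mean = sum(r for _, r in studies) / len(studies)

# Fixed-effect pooling: Fisher z-transform each correlation, weight by
# n - 3 (the inverse of the sampling variance of z), then back-transform.
total_weight = sum(n - 3 for n, _ in studies)
pooled_z = sum((n - 3) * atanh(r) for n, r in studies) / total_weight
pooled_r = tanh(pooled_z)

print(f"favorable results: {vote_fraction:.0%}")
print(f"unweighted mean r: {unweighted_mean:.3f}")
print(f"pooled r:          {pooled_r:.3f}")
```

With these made-up numbers most results are "favorable," yet the pooled estimate sits close to zero because the large studies dominate the weighting. A tally of findings cannot show that pattern; a procedure that uses sample size and effect size can.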