Professor of Behavioral Genetics, King's College London; Author, Blueprint
Polygenic Scores

Polygenic scores are beginning to deliver personal genomics from the front lines of the DNA revolution. They make it possible to predict genetic risk and resilience at the level of the individual rather than at the level of the family, which has far-reaching implications for science and society.

Polygenic means many genes. Classical genetic studies over the past century have consistently supported Ronald Fisher’s 1918 theory that the heritability of common disorders and complex traits is caused by many genes of small effect. What had not been realized until recently was just how many genes are involved and how small their effects are. Systematic gene-hunting studies, called genome-wide association (GWA) studies, began a decade ago, testing hundreds of thousands of DNA differences throughout the genome. The early goal was to break the 1% barrier, that is, to achieve the power to detect DNA associations that account for less than 1% of the variance of common disorders and complex traits. Samples in the tens of thousands were needed to detect such tiny effects, because testing hundreds of thousands of DNA differences in a single GWA study requires a massive correction for multiple testing. A great surprise was that GWA studies powered to detect associations accounting for 1% of the variance came up empty-handed.
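To make "massively correcting for multiple testing" concrete, here is a minimal sketch of a Bonferroni correction, which divides the desired overall false-positive rate by the number of tests. The test counts are illustrative assumptions, not figures from the essay; the conventional genome-wide significance threshold of 5 × 10⁻⁸ corresponds to roughly one million independent tests.

```python
# Illustrative sketch: Bonferroni correction for a GWA study.
# The numbers of tests below are assumptions chosen for illustration.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

for n_tests in (500_000, 1_000_000):
    print(f"{n_tests:>9,} tests -> p < {bonferroni_threshold(0.05, n_tests):.0e}")
```

Each DNA difference must clear a bar thousands of times stricter than the usual p < 0.05, which is why such large samples are needed to detect small effects.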

GWA studies needed to break a 0.1% barrier, not just a 1% barrier. This requires samples in the hundreds of thousands. As GWA studies break that barrier, they are scooping up many DNA differences that contribute to heritability. But what good are DNA associations that account for less than 0.1% of the variance? The answer is "not much" if you are a molecular biologist wanting to trace pathways from genes to brain to behavior, because it means there is a welter of minuscule pathways.

Individually, associations that account for less than 0.1% of the variance are also of little use for prediction. This is where polygenic scores come in. When psychologists create a composite score, like an IQ score or a score on a personality test, they aggregate many items. They don’t worry about the significance or reliability of each item because the goal is to create a reliable composite. In the same way, polygenic scores aggregate many DNA differences to create a composite that predicts genetic propensities for individuals.
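The composite-score idea can be illustrated with a small simulation. This is a sketch under simplified assumptions, not how any published score was built: 1,000 independent DNA differences jointly explain 50% of a trait's variance, so each one explains only about 0.05% on its own, and the score weights each person's allele counts by the true simulated effects (real scores must estimate these weights from a GWA study). All sizes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_variants, h2 = 5_000, 1_000, 0.5  # assumed toy values

# Allele frequencies and genotypes: 0, 1 or 2 copies of the effect allele.
freqs = rng.uniform(0.1, 0.9, n_variants)
genotypes = rng.binomial(2, freqs, size=(n_people, n_variants)).astype(float)
# Standardize each variant so each effect explains about h2 / n_variants of the variance.
geno_std = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

# Many tiny effects plus environmental noise.
effects = rng.normal(0.0, np.sqrt(h2 / n_variants), n_variants)
trait = geno_std @ effects + rng.normal(0.0, np.sqrt(1 - h2), n_people)

def polygenic_score(genotypes, weights):
    """Composite score: weighted sum of (standardized) allele counts."""
    return genotypes @ weights

def r_squared(x, y):
    """Proportion of variance in y accounted for by x."""
    return np.corrcoef(x, y)[0, 1] ** 2

single = [r_squared(geno_std[:, j], trait) for j in range(n_variants)]
score = polygenic_score(geno_std, effects)
print(f"median single-variant R^2: {np.median(single):.4%}")
print(f"polygenic score R^2:       {r_squared(score, trait):.1%}")
```

No single variant is a useful predictor, yet their weighted sum accounts for a large share of the variance, much as no single test item is reliable but the total score is.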

A new development in the last year is to go beyond aggregating only the few genome-wide significant "hits" from GWA studies. The predictive power of polygenic scores can be increased dramatically by also aggregating associations that fall short of genome-wide significance, as long as the resulting polygenic score accounts for more variance in an independent sample. Polygenic scores now often include tens of thousands of associations, underlining the extremely polygenic architecture of common disorders and complex traits.
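The essay does not name a specific method, but one common way to decide how many associations to include is p-value thresholding, a simplified form of the "clumping and thresholding" approach (clumping is skipped here because the simulated variants are independent). The sketch below is a toy simulation under assumed values: it runs a "GWA study" in a discovery sample, builds scores from variants passing progressively looser p-value thresholds, and checks how much variance each score accounts for in a separate, independent sample.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n_discovery, n_target, n_variants, h2 = 6_000, 2_000, 1_000, 0.5  # assumed toy values

def simulate(n, freqs, effects):
    """Standardized genotypes and a trait with heritability of about h2."""
    g = rng.binomial(2, freqs, size=(n, len(freqs))).astype(float)
    g = (g - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))  # standardize with population freqs
    y = g @ effects + rng.normal(0.0, np.sqrt(1 - h2), n)
    return g, y

freqs = rng.uniform(0.1, 0.9, n_variants)
effects = rng.normal(0.0, np.sqrt(h2 / n_variants), n_variants)
g_disc, y_disc = simulate(n_discovery, freqs, effects)
g_target, y_target = simulate(n_target, freqs, effects)  # independent sample

# Toy "GWA study" in the discovery sample: one marginal association per variant.
y_std = (y_disc - y_disc.mean()) / y_disc.std()
beta_hat = g_disc.T @ y_std / n_discovery
z = beta_hat * np.sqrt(n_discovery)
p_values = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])  # two-sided p

def target_r2(threshold):
    """Variance accounted for in the independent sample by variants with p < threshold."""
    keep = p_values < threshold
    if not keep.any():
        return 0.0
    score = g_target[:, keep] @ beta_hat[keep]
    return np.corrcoef(score, y_target)[0, 1] ** 2

results = {t: target_r2(t) for t in (5e-8, 1e-4, 1e-2, 0.1, 0.5, 1.0)}
for t, r2 in results.items():
    print(f"p < {t:<7g} R^2 = {r2:.3f}")
best = max(results, key=results.get)
```

In this toy setting, the genome-wide significant hits alone predict little, and looser thresholds predict far more. Choosing the threshold in the same sample used to report R² is somewhat optimistic; a further held-out sample or cross-validation avoids that.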

Polygenic scores derived from GWA studies with sample sizes in the hundreds of thousands can predict substantial amounts of variance. For example, polygenic scores can account for 20% of the variance in height and 10% of the variance in UK national exam scores at the end of compulsory education. This is still a long way from accounting for the entire heritability, about 90% for height and 60% for educational achievement. This gap is called missing heritability. Nonetheless, these predictions from an individual’s DNA alone are substantial. For the sake of comparison, the polygenic score for educational achievement is a more powerful predictor than the socioeconomic status of students’ families or the quality of their schools.
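The figures above can be put side by side with some simple arithmetic. Treating missing heritability as the difference between heritability ($h^2$) and the variance a polygenic score accounts for ($R^2_{\text{PGS}}$) is one straightforward reading of the gap; the correlation $r$ is the square root of the variance explained.

```latex
\begin{aligned}
\text{missing heritability} &= h^2 - R^2_{\text{PGS}} \\
\text{height:} \quad 0.90 - 0.20 &= 0.70, \qquad r = \sqrt{0.20} \approx 0.45 \\
\text{educational achievement:} \quad 0.60 - 0.10 &= 0.50, \qquad r = \sqrt{0.10} \approx 0.32
\end{aligned}
```

A correlation of about 0.3 to 0.45 between a DNA-based score and an outcome is modest for any single person but substantial by the standards of behavioral science.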

Moreover, predictions from polygenic scores have unique causal status. Usually correlations do not imply causation, but correlations involving polygenic scores imply causation in the sense that they are not subject to reverse causation: nothing in a person’s behavior or environment changes inherited DNA sequence variation. For the same reason, polygenic scores are just as predictive at birth or even prenatally as they are later in life.

Like all important findings, polygenic scores have potential for bad as well as for good. Polygenic scores deserve to be high on the list of scientific terms that ought to be more widely known so that discussion of their uses and misuses can begin.