Beautiful, Elegant, And … Absolutely Wrong
On April 25, 1953, in a short note to Nature entitled "Molecular structure of nucleic acids", James Watson and Francis Crick announced their deduction of the double-helix structure of DNA. Their remarkable article famously ends with one of the most laconic understatements in all of science: "It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material." Indeed, the two strands of DNA, each a mirror image of the other, explain how cells can replicate: the "secret of life", as Watson and Crick modestly announced to the startled clientele of The Eagle pub in Cambridge during a break in their labors. But the most remarkable thing about this beautiful and elegant discovery is that it is not remarkable enough to be the topic of this essay.
In fact, their structure held the key to something far more subtle and arguably of even greater fundamental importance than the mechanism for genetic replication. What Watson and Crick had uncovered was a window on the world's oldest fossil, one which sheds light not only on the static structure of modern molecular biology, but also on the dynamical processes leading to the emergence of life on Earth! The unraveling of this story is a fascinating episode in the history of science, because it involves not one, but two beautiful and elegant ideas, each of which is absolutely and unequivocally wrong! And tellingly, each beautiful but flawed idea was conceived by a physicist trying to understand biology as a jigsaw puzzle at the molecular scale. As we proceed into the second decade of this "century of biology", flooded by data but starved of fundamental insight, we will do well to resist the temptation to build biology on naïve, static principles of beauty and elegance that seemingly serve so well in the physical sciences.
Watson and Crick had shown how to pack together the four molecules making up DNA (the "nucleotides", known concisely by the symbols A, C, G, T) into a structure that fitted X-ray measurements of the atomic positions. The precise sequence of the symbols A, C, G, T somehow encoded the composition of proteins, known to be built up from a palette of twenty small molecules called amino acids. It was the physicist George Gamow who a year later pointed out that the "key-and-lock" relation between the nucleotides of the DNA structure exhibited diamond-shaped holes, and that for geometrical reasons, these diamonds were specified by the three surrounding nucleotides. Gamow enumerated all such holes, and discovered that there were twenty different types of hole: one for each amino acid! Thus was born the idea of a "genetic code": the stupendously complicated biochemical processes going on inside every cell to make proteins could be reduced to a simple code table, easily able to fit on a tee-shirt, that told you how to read DNA and translate its message into the proteins of living cells.
Gamow's ingenious idea for a genetic code was absolutely wrong, but serendipitously he did get right that triplets of nucleotides ("codons") code for the twenty amino acids of life. But how do they? If you write down all possible triplets of A, C, G, T, you'll find that there are 64 (4 × 4 × 4) possible sets of three letters, or codons. Evidently some of these possible codons must code for the same amino acid, or else the cell would use 64 amino acids. If the codons were doublets, then there would only be 16 (4 × 4) possible sets of such letters, and that would not be enough to specify each of the 20 amino acids used by life. So we are stuck with triplets, and evidently some of these triplets code for the same amino acid.
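The counting argument above is easy to check by brute force. What follows is only a minimal sketch in Python; the names NUCLEOTIDES, AMINO_ACIDS and count_words are invented here for illustration and are not part of anyone's published method.

```python
from itertools import product

NUCLEOTIDES = "ACGT"   # the four-letter alphabet of DNA
AMINO_ACIDS = 20       # number of amino acids quoted in the essay


def count_words(length: int) -> int:
    """Count every word of the given length over the four nucleotides."""
    return sum(1 for _ in product(NUCLEOTIDES, repeat=length))


doublets = count_words(2)
triplets = count_words(3)

# Doublets fall short of twenty; triplets overshoot, so some must be redundant.
print(doublets, doublets >= AMINO_ACIDS)   # 16 False
print(triplets, triplets >= AMINO_ACIDS)   # 64 True
```

Three is therefore the shortest codon length that leaves room for twenty amino acids, and the surplus of 44 triplets is what forces several codons to share an amino acid.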
And now there are two mysteries: not only what is the code, but also, why are only 20 amino acids used by life?
These questions were answered in a brilliant article published by Crick, Griffith and Orgel in 1957. Let's follow Crick and friends, as they try to hack the genome, reverse engineering the genetic code by appealing to a logic that one might mischievously call "intelligent design". Let's suppose that we have a section of the message six characters long, some small part of the three billion characters of the human genome:
… ACGGAC …
Knowing about triplet codons, you would parse this as
… ACG, GAC …
But, you say, how do we know that the first letter, the A, is actually where the message starts? Maybe the A is the last symbol in the previous codon (the … above), and the message should be parsed as
…A, CGG, AC …
with the last AC the start of the following codon. In other words, if you set out to transmit a code in this way, what would be the smart way to remove this sort of ambiguity? Crick et al. proposed that you construct the code such that it only makes sense when read with the correct starting point. In other words, you make a code in which ACG and GAC stand for meaningful symbols (in this case the amino acids threonine and aspartic acid), but CGG would be meaningless. Such a code can be read in only one way, so it requires no punctuation: it is what they called a "comma-less code". The beautiful discovery of Crick et al. is that the maximum number of symbols (i.e., amino acids) that can be encoded in this way with triplets is twenty. And so, they concluded, Nature had been forced to work only with twenty amino acids, and not a much larger set, in order to build proteins, because it was a mathematical impossibility to construct a comma-less genetic code with more amino acids.
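The number twenty can be recovered with a short calculation. A triplet such as AAA can never belong to a comma-less code, because AAAAAA reads as AAA from any starting point. The remaining 60 triplets fall into groups of three cyclic rotations (XYZ, YZX, ZXY), and at most one member of each group may be used, since XYZXYZ contains the other two out of frame. That leaves 60 / 3 = 20. The Python sketch below checks this bound and then tests one code that reaches it. The construction used (keep XYZ when X < Y ≥ Z under an arbitrary ordering of the letters) is a standard textbook example chosen here for convenience and is not claimed to be the code Crick et al. wrote down; the ordering A < C < G < T and all function names are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed ordering A < C < G < T, chosen arbitrarily


def is_comma_free(code) -> bool:
    """True if no out-of-frame triplet spanning two codewords is a codeword."""
    code = set(code)
    for first, second in product(code, repeat=2):
        joined = first + second
        # The two shifted reading frames inside a pair of adjacent codewords.
        if joined[1:4] in code or joined[2:5] in code:
            return False
    return True


def cyclic_classes():
    """Group the 64 triplets into sets of cyclic rotations."""
    classes = set()
    for letters in product(ALPHABET, repeat=3):
        word = "".join(letters)
        classes.add(frozenset({word, word[1:] + word[0], word[2:] + word[:2]}))
    return classes


def upper_bound() -> int:
    """Count rotation classes of size 3; AAA-type classes are unusable."""
    return sum(1 for cls in cyclic_classes() if len(cls) == 3)


def example_code():
    """Illustrative maximal code: keep XYZ whenever X < Y >= Z."""
    rank = {letter: i for i, letter in enumerate(ALPHABET)}
    return {
        x + y + z
        for x, y, z in product(ALPHABET, repeat=3)
        if rank[x] < rank[y] >= rank[z]
    }


code = example_code()
print(upper_bound())        # 20
print(len(code))            # 20
print(is_comma_free(code))  # True
```

One consequence of the strict definition is that two rotations of the same triplet (ACG and GAC, say) cannot both be codewords, since ACGACG contains GAC out of frame; the pair used in the text is best read as a local illustration of the parsing problem rather than as part of a complete comma-less code.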
From their reasoning, it is not difficult to enumerate the possible comma-less genetic codes. There are 288 of them. But the actual genetic code, conclusively determined during the 1960s, is not one of them!
Beautiful, elegant, and absolutely wrong: key-and-lock mechanisms have been important in understanding snapshots of biological structure, but they are misleading clues in the search to understand the dynamical processes of evolution that have led to the phenomenon of life. Far richer than Gamow and Crick conceived, the genetic code is now thought to have been shaped rapidly over evolutionary time through the cooperative dynamics of early organisms, thus ensuring a code that is remarkably robust to errors in translation and mutations of the genome sequence. Only with such a code could such complex organisms as you and me evolve. And that is the real secret of life.