Genetic code of life is a developing story

The near-universality of the DNA code is convincing evidence modern life traces back to a single ancestor

In 1953 James Watson and Francis Crick published the molecular structure of the biological hereditary material deoxyribonucleic acid (DNA), whose famous double-helical structure is now imprinted in the consciousness of all biologists.

This revolution in molecular genetics was completed in the 1960s with the unravelling of the molecular genetic code explaining how the hereditary information encoded in DNA is translated into the structure of proteins that carry out the work of biological cells. This work on DNA is a strong contender for recognition as the greatest-ever scientific and intellectual achievement in biology.

It was long believed that the genetic code unravelled in the 1960s is exactly the same across all species – as French biochemist Jacques Monod remarked in 1961 – “What is true for E. coli (a bacterium) is true for an elephant.” However, we now know that the code is not entirely universal. The latest developments in this area will be discussed at a European Molecular Biology Organisation (EMBO) workshop in Bantry in mid-May, organised by my colleague Prof Pasha Baranov of the school of biochemistry and cell biology, UCC. Another UCC colleague, Prof John Atkins, a founding father of this field, will receive a lifetime achievement award at this meeting.

The long DNA molecule is made of four kinds of chemicals called purine and pyrimidine bases – Adenine (A), Thymine (T), Guanine (G), Cytosine (C) – all strung together end-to-end. The genetic information in DNA is encoded in the sequence of these bases. This genetic language contains four letters A, T, G, C grouped together in threes to create the 64 words in which the language is written. Each three-letter word is called a codon.
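The arithmetic behind the 64 words can be checked in a few lines of Python. This is only an illustrative sketch: the names BASES and codons are chosen here for the example and come from no particular library.

```python
from itertools import product

BASES = "ATGC"  # the four DNA letters named in the text

# Every ordered three-letter word that can be spelled with four letters.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # 64, i.e. 4 x 4 x 4
print(codons[:4])   # ['AAA', 'AAT', 'AAG', 'AAC']
```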

Proteins carry out most activities in cells and the genetic information in DNA, stored in the cell’s central nucleus, specifies the types of proteins made in a cell. The information in DNA is first copied (transcribed) into long molecules of another type of nucleic acid called messenger ribonucleic acid (mRNA) in which the base Uracil (U) is substituted for Thymine (T). mRNA then moves from the cell’s nucleus into the cytoplasm where its genetic information is translated into protein.
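The copying step can be sketched as a simple change of spelling. The hypothetical function below assumes it is handed the coding strand of the gene, so that the mRNA reads the same apart from U replacing T; in the cell the copy is actually made against the complementary strand, a detail this sketch leaves out.

```python
def transcribe(dna: str) -> str:
    """Return the mRNA spelling of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")

print(transcribe("ATGCATTAA"))  # AUGCAUUAA
```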

A protein is basically a long string of amino acids strung together end to end. There are 20 different types of amino acid and the primary sequence of amino acids in a protein ultimately determines everything about that protein’s structure and activity. This sequence is specified by the sequence of codons on the mRNA chain that in turn is a copy of the sequence of codons in a stretch of nuclear DNA – the gene for the particular protein.

Sixty-four permutations of three letters (4 x 4 x 4) can be made from the four nucleotides A, U, G, C found in mRNA and so, 64 codons are available to code for the 20 different amino acids. This means that most amino acids can be specified by more than one codon eg CAU and CAC both code for amino acid Histidine. Also, a specific START codon AUG tells the cell where to start decoding the long mRNA string to begin running off the new protein. Three different codons UAA, UAG, UGA signal where to STOP decoding and release the finished protein.
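The decoding rules just described can be sketched in Python. The table below is the canonical code written with standard one-letter amino acid abbreviations (H for Histidine, M for Methionine, * for STOP) and laid out in the conventional U, C, A, G order. The function name translate and the test sequence are invented for illustration, and starting at the first AUG found is a simplification of how a real cell chooses its start site.

```python
from itertools import product

BASES = "UCAG"
# Canonical code, one letter per codon, in UUU, UUC, UUA, UUG, UCU ... order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str, table: dict = CODON_TABLE) -> str:
    """Decode from the first AUG until a STOP codon or the end of the string."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no START codon, so nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA: release the protein
            break
        protein.append(amino_acid)
    return "".join(protein)

print(CODON_TABLE["CAU"], CODON_TABLE["CAC"])  # H H
print(translate("GGAUGCAUCACUAAGG"))           # MHH
```

In the example the two letters before AUG are skipped, AUG, CAU and CAC are read as Methionine, Histidine and Histidine, and UAA ends the protein.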

Although this genetic code is nearly universal, exceptions to its rules have been observed in different species over the years. Variants of the canonical code almost always involve reassignment of STOP codons. In 1979 it was discovered that UGA, a STOP codon in the canonical code, codes for amino acid Tryptophan in human mitochondria, the energy-generating organelles in the cell.
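The effect of such a reassignment can be shown with a toy comparison. In this sketch the variant table changes only the single codon mentioned above, UGA, to Tryptophan (W); the real mitochondrial code differs from the canonical one in other ways that are not modelled here. The function decode and the short test message are hypothetical.

```python
from itertools import product

# Canonical code in U, C, A, G order; * marks a STOP codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CANONICAL = {
    "".join(codon): aa
    for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

# Assumption for illustration: only the UGA reassignment is applied.
MITOCHONDRIAL_SKETCH = {**CANONICAL, "UGA": "W"}

def decode(mrna: str, table: dict) -> str:
    """Read codons from the first letter until a STOP codon is met."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein

message = "AUGUGGUGAUGGUAA"  # AUG UGG UGA UGG UAA
print(decode(message, CANONICAL))             # MW
print(decode(message, MITOCHONDRIAL_SKETCH))  # MWWW
```

The same message yields a short protein under the canonical code, where UGA halts decoding, and a longer one when UGA is read as Tryptophan.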

Mitochondria are descended from bacteria-like organisms that fused with the ancestor of all eukaryotic organisms billions of years ago, forming a symbiotic relationship. Also, various single-celled ciliates (eg Paramecium) have variants of the canonical genetic code, eg using UAA and UAG to code for glutamine rather than STOP. And, unexpected dynamics of genetic decoding were recently uncovered in cancer cells in a study published in Nature by Baranov and Atkins.

The near-universality of the code is convincing evidence that modern life traces back to a single ancestor. If there had been several origins of life, all of whose descendants developed the DNA-to-protein system, it is vanishingly unlikely that almost all modern species would have the same code. Variants of the code in a relative handful of organisms arose through normal evolutionary processes after the common ancestor of all present life had evolved.

William Reville is an emeritus professor of biochemistry at UCC