The discoveries that DNA is the genetic material containing the instructions for constructing an organism, and that it is structured as a linear polymer of nucleotides, were important for the modern understanding of genetics. However, these discoveries alone did not make it possible to fully utilize genetic science. It was also necessary to discover how the information in the DNA sequence is read by organisms. The code was elucidated in the decade following the discovery of the structure of DNA, work that culminated in the 1968 award of the Nobel Prize in Physiology or Medicine to Har Gobind Khorana, Robert W. Holley and Marshall Nirenberg.
The way that the DNA sequence is read by organisms is called the genetic code. While organisms differ in how they store and process DNA, the genetic code is nearly universal across all life forms, with only minor variations known (for example, in mitochondria).
According to the central dogma of molecular biology, first stated by Francis Crick, DNA is transcribed to mRNA, which is then translated to protein. There are twenty amino acids commonly found in proteins, but only four nucleotides. Because there are fewer nucleotides than amino acids, groups of nucleotides are needed to encode each amino acid. Groups of two nucleotides would produce only sixteen possible combinations (4^2 = 16), whereas groups of three nucleotides produce sixty-four combinations (4^3 = 64), easily sufficient to encode every amino acid. Each group of three nucleotides is called a codon, and each codon encodes a specific amino acid or a signal to stop protein translation (Figure 1).
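The counting argument and the codon lookup can be illustrated with a minimal Python sketch. It assumes the standard genetic code written in the conventional T, C, A, G base order, and it works directly on the coding DNA strand (T in place of the U found in mRNA). The names `CODON_TABLE` and `translate`, and the short example sequence, are illustrative choices and do not come from any particular library.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons of the standard genetic code,
# listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate coding-strand DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Two-letter words cannot cover twenty amino acids; three-letter words can.
print(len(BASES) ** 2, len(BASES) ** 3)  # 16 64

# Made-up example: ATG (Met), GCC (Ala), TGG (Trp), TAA (stop).
print(translate("ATGGCCTGGTAA"))  # MAW
```

The table is built from a single string so that the whole mapping stays visible in a few lines; a real analysis would more likely rely on an established bioinformatics package.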
Because there are more possible codons than there are amino acids, there is a certain amount of redundancy in the genetic code, with multiple codons encoding the same amino acid. In addition to this redundancy, there is an imbalance in the number of codons that encode each amino acid. Leucine has six codons encoding it, while tryptophan has only one. This disparity broadly reflects the frequency of different amino acids: those that occur more frequently tend to have more codons, while those that occur less frequently tend to have fewer (Figure 2).
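The uneven distribution of codons can be checked by tallying the standard codon table. The sketch below assumes the standard genetic code in T, C, A, G order; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (or the stop signal).
degeneracy = Counter(CODON_TABLE.values())

# Print from most to least redundant, using one-letter amino acid codes.
for aa, count in degeneracy.most_common():
    print(aa, count)
```

In the standard code the counts range from six codons (leucine, serine and arginine) down to a single codon (methionine and tryptophan), with three codons acting as stop signals.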
Codon redundancy means that some mutations do not change the sequence of amino acids. Generally speaking, codons that encode the same amino acid share the same first two nucleotides, while the last nucleotide varies (Figure 1). Because of this, a mutation at the last nucleotide will often leave the amino acid unchanged. This provides some protection against mutation. Codon redundancy also gives rise to the phenomenon of codon usage, in which organisms appear to prefer particular codons for encoding certain amino acids. These usage patterns vary between organisms (Figure 3) and may play a role in translational efficiency or in regulating how the protein product folds.
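Both ideas can be made concrete with a short sketch, again assuming the standard genetic code in T, C, A, G order. The first function measures, for each codon position, what fraction of single-base substitutions leave the encoded amino acid (or stop signal) unchanged. The second tallies which synonymous codons a sequence actually uses. The function names and the example sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction(position):
    """Fraction of single-base substitutions at a codon position (0, 1 or 2)
    that leave the encoded amino acid or stop signal unchanged."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


def codon_usage(dna):
    """Count how often each codon is used, grouped by the amino acid it encodes."""
    usage = defaultdict(Counter)
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        usage[CODON_TABLE[codon]][codon] += 1
    return usage


for pos in range(3):
    print(f"codon position {pos + 1}: {synonymous_fraction(pos):.3f} synonymous")

# Hypothetical sequence of four leucine codons, three of them CTG.
example = "CTGCTGTTACTG"
print(dict(codon_usage(example)["L"]))  # {'CTG': 3, 'TTA': 1}
```

Substitutions at the third position are synonymous far more often than those at the first or second position, which is the protective effect described above. Real codon usage tables are computed over whole genomes or large gene sets rather than a single short sequence.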
The universality of the genetic code is what enables modern genetics. Scientists can compare the sequences of genes across different organisms to predict the function of those genes through homology. Genetic engineering using molecular biology techniques can also be used to move genes from one organism to another. This allows scientists to express human proteins (which would be hard to extract in large quantities) in bacteria, producing them for research into protein function. Scientists can also produce transgenic organisms that carry genes from other organisms.