Codon bias is the preferential use of particular synonymous codons to encode an amino acid, for example a preference for UUU over UUC when both code for phenylalanine. Codon bias affects protein expression, and it is a factor when choosing host cells or altering codon sequences.
In other words, multiple codons can encode the same amino acid during protein translation, but a host cell tends to favor one or a few of those codons for each amino acid. This preference varies between organisms and is an important issue during heterologous gene expression.
Here is a brief recap of the genetic code and how it is translated into proteins.
The genetic code is the correspondence between nucleic acid sequences and the polypeptides they encode. When an mRNA is translated, its nucleotide sequence is read three nucleotides at a time, and each consecutive triplet codes for one amino acid. These triplet nucleotide sequences that code for specific amino acids are called codons.
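As a minimal sketch of reading an mRNA in triplets, the snippet below splits a sequence into consecutive, non-overlapping codons. The function name and the toy sequence are hypothetical, chosen only for illustration, and the sketch assumes the reading frame starts at the first nucleotide.

```python
def split_into_codons(mrna):
    """Split an mRNA sequence into consecutive, non-overlapping triplets."""
    mrna = mrna.upper()
    # Ignore any trailing 1-2 nucleotides that do not form a complete codon.
    usable_length = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable_length, 3)]


# Toy sequence (hypothetical, not a real gene).
print(split_into_codons("AUGUUUCGCUAA"))  # ['AUG', 'UUU', 'CGC', 'UAA']
```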
The chart below makes this easier to visualize. Vertically, on the left, are the nucleotides U, C, A and G, representing uracil, cytosine, adenine and guanine in an mRNA sequence. This vertical set of nucleotides represents the first letter of the codon.
Running horizontally, in green, are the nucleotides representing the second letter of the codon. Again, these nucleotides are U, C, A and G.
Finally, running vertically on the right, in pink, are the nucleotides that represent the third letter of the codon. These are also U, C, A and G.
Figure 1. Amino acid codon chart.
The chart works like a matrix: to read a codon, combine the first nucleotide from the row on the left, the second nucleotide from the column in green, and the third nucleotide from the line on the right within that row.
For example, the entry at the top left is the codon UUU. It combines the first letter, U for uracil, from the vertical nucleotides on the left, with the first letter, U, of the nucleotides running horizontally in green, and the first letter, U, of the nucleotides running vertically on the right in pink.
This is just one of the codons that specify the amino acid phenylalanine.
Notice that there are 64 possible codons in total, because each of the three positions can be an adenine (A), guanine (G), cytosine (C) or uracil (U), which gives 4 x 4 x 4 = 64 combinations.
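The 4 x 4 x 4 count can be checked with a short sketch that enumerates every triplet. It uses Python's standard itertools.product; the variable names are illustrative only.

```python
from itertools import product

NUCLEOTIDES = "UCAG"

# Every ordered combination of three nucleotides, e.g. ('U', 'U', 'U') -> 'UUU'.
all_codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

print(len(all_codons))  # 64
print(all_codons[:4])   # ['UUU', 'UUC', 'UUA', 'UUG']
```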
Figure 2. Demonstrates the number of possible triplet codons.
This gives rise to an interesting situation. In biological systems, there are only 20 standard amino acids, but 64 different codons. Three of those codons are stop signals, which leaves 61 codons to encode the 20 amino acids. Because of this, most amino acids can be encoded by more than one codon (methionine and tryptophan, with one codon each, are the exceptions).
This is known as degeneracy of the genetic code. Codons that encode the same amino acid are called synonymous codons.
For example, both UUU and UUC code for phenylalanine, and therefore would be synonymous codons.
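To make synonymous codons concrete, here is a small sketch that builds the standard genetic code as a lookup table and groups codons by the amino acid they encode. The 64-letter string lists one-letter amino acid codes (with * for stop) in U, C, A, G order for each codon position; the variable names are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they specify.
synonymous = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    synonymous[aa].append(codon)

print(synonymous["F"])  # phenylalanine: ['UUU', 'UUC']
print(synonymous["R"])  # arginine: ['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG']
```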
These synonymous codons occur with different frequencies in the mRNAs of different organisms, and sometimes in different cell types of the same organism.
For example, there are six synonymous codons that encode the amino acid arginine: CGU, CGC, CGA, CGG, AGA and AGG. Among these, human mRNAs frequently use AGG and AGA to encode arginine. E. coli mRNAs, on the other hand, rarely contain these codons; they mostly use CGC and CGU to encode arginine.
This preference of an organism or cell type for a particular codon to encode a specific amino acid is the codon bias, or codon usage bias, we're talking about.
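Codon usage is usually expressed as the share of each synonymous codon among all codons for that amino acid. The sketch below computes this for arginine in a single sequence. The function name and the toy sequence are hypothetical, and real usage tables are built from many genes rather than one short string.

```python
from collections import Counter

ARGININE_CODONS = ("CGU", "CGC", "CGA", "CGG", "AGA", "AGG")


def arginine_codon_usage(mrna):
    """Return the fraction of arginine codons contributed by each synonymous codon."""
    usable_length = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable_length, 3)]
    counts = Counter(codon for codon in codons if codon in ARGININE_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {}  # no arginine codons in this sequence
    return {codon: counts[codon] / total for codon in ARGININE_CODONS}


# Toy sequence (hypothetical): AUG AGA AGG AGA CGC UAA
usage = arginine_codon_usage("AUGAGAAGGAGACGCUAA")
print(usage["AGA"], usage["AGG"], usage["CGC"])  # 0.5 0.25 0.25
```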
Here, we will take a look at how codon bias is a crucial consideration in gene cloning and expression procedures.
When expressing a transgene in a heterologous host, the cloned DNA sequence may need tweaking to make it compatible with the codon usage pattern of the host; otherwise, translation of the cloned gene might be hampered. This tweaking is called codon optimization, and it is important for expressing recombinant proteins.
To discuss why codon optimization might be required to ensure optimal translation of the cloned transgene, we need to recap a key step in the protein translation process.
The codons in the mRNA are recognized and bound by the corresponding tRNAs through base-pairing between the codon on the mRNA and the anti-codon on the corresponding tRNA.
Figure 3. Binding between the codon in the mRNA and the corresponding anti-codon in the aminoacyl tRNA.
As Figure 3 shows, the tRNA has a cloverleaf shape. Two features to note here are:
- The anti-codon – which binds to the corresponding codon on the mRNA due to complementarity.
- The amino acid bound to the tRNA at its 3’ end.
The sequence of the anti-codon is complementary to the mRNA sequence it binds to. At its 3’ end, the tRNA is joined to the specific amino acid that this codon encodes. A tRNA joined to its corresponding amino acid is called an aminoacyl tRNA.
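The complementarity between codon and anti-codon can be sketched as a reverse complement, since the two strands pair in opposite orientations. This is a simplification that assumes perfect Watson-Crick pairing at all three positions and ignores wobble pairing and modified bases; the function name is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the perfectly complementary anti-codon, written 5' to 3'."""
    # Reverse the codon first because the paired strands are antiparallel.
    return "".join(COMPLEMENT[base] for base in reversed(codon.upper()))


print(anticodon_for("CGU"))  # ACG
print(anticodon_for("AGA"))  # UCU
```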
In biological systems, there are 20 different types of aminoacyl tRNAs, one for each of the 20 amino acids: aminoacyl tRNASer (for serine), aminoacyl tRNALeu (for leucine), aminoacyl tRNAPro (for proline) and so on.
In turn, the aminoacyl tRNAs for a particular amino acid differ from one another in their anti-codon sequence.
For example, there are six variations of aminoacyl tRNAs for arginine corresponding to the six different codons encoding arginine.
Figure 4. Three examples of aminoacyl tRNA for arginine: aminoacyl tRNAArgCGU, aminoacyl tRNAArgCGC, aminoacyl tRNAArgCGA.
So, for arginine, we would have the six aminoacyl tRNAs listed in the table below.
Table 1. Six aminoacyl tRNAs of arginine.
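Aminoacyl tRNA | Binds to (specific arginine codon)
aminoacyl tRNAArgCGU | arginine-encoding codon CGU
aminoacyl tRNAArgCGC | arginine-encoding codon CGC
aminoacyl tRNAArgCGA | arginine-encoding codon CGA
aminoacyl tRNAArgCGG | arginine-encoding codon CGG
aminoacyl tRNAArgAGA | arginine-encoding codon AGA
aminoacyl tRNAArgAGG | arginine-encoding codon AGG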
Of these 6 types of aminoacyl tRNAs for arginine, E. coli cells, because of their typical codon usage pattern, will have a very high proportion of aminoacyl tRNAArgCGU and aminoacyl tRNAArgCGC.
Conversely, human cells will have a high proportion of aminoacyl tRNAArgAGA and aminoacyl tRNAArgAGG.
Keeping all this in mind, let us see what might happen if a human gene is cloned in an E. coli expression host.
In this case, the mRNA of the cloned human gene would have codons AGG and AGA encoding arginine. However, the corresponding aminoacyl tRNAs for arginine (tRNAArgAGA and tRNAArgAGG) that would be necessary to translate the genetic information from these codons would be rare or even absent in the E. coli host cell due to codon usage bias.
The net result of this incompatibility is that the cloned human gene would not be optimally translated in the expression host, and production of the recombinant protein would ultimately be hindered.
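One way to anticipate this problem is to scan a coding sequence for codons that the host rarely uses. The sketch below flags AGA and AGG, the two codons named in the example above. The rare-codon set is an assumption limited to that example and is not a complete list for E. coli; the function name and the toy sequence are hypothetical.

```python
# Assumption: only the two arginine codons from the article's example are listed.
RARE_IN_ECOLI = {"AGA", "AGG"}


def find_rare_codons(mrna, rare_codons=RARE_IN_ECOLI):
    """Return (codon position, codon) pairs for codons in the rare set.

    Positions are 1-based and count codons, not nucleotides.
    """
    hits = []
    usable_length = len(mrna) - len(mrna) % 3
    for index in range(0, usable_length, 3):
        codon = mrna[index:index + 3]
        if codon in rare_codons:
            hits.append((index // 3 + 1, codon))
    return hits


# Toy sequence (hypothetical): AUG AGG CUG AGA CGC UAA
print(find_rare_codons("AUGAGGCUGAGACGCUAA"))  # [(2, 'AGG'), (4, 'AGA')]
```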
This is why codon bias is such an important consideration during transgene cloning and expression.
The simplest way around this is to change the coding sequence of the cloned transgene so that the codon for each amino acid is replaced by the synonymous codon that the expression host prefers for that amino acid.
In our example, the way to do this would be to change the codons for arginine in the cloned human gene from sequences AGG and AGA to CGC or CGU when using E. coli as the expression host.
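The arginine swap in this example can be sketched as a simple codon-by-codon substitution, followed by a check that the encoded protein is unchanged. This is only an illustration under stated assumptions: the swap table covers arginine alone, the sequence is a hypothetical toy written in the mRNA alphabet (a cloned DNA sequence would use T in place of U), and real optimization tools weigh many more factors.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; * marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: replace the two arginine codons rare in E. coli with CGC.
ARG_SWAP = {"AGA": "CGC", "AGG": "CGC"}


def split_codons(mrna):
    """Split into complete codons; a trailing partial codon is dropped."""
    usable_length = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable_length, 3)]


def optimize_arginine_codons(mrna, swap=ARG_SWAP):
    """Replace codons found in the swap table; leave all others untouched."""
    return "".join(swap.get(codon, codon) for codon in split_codons(mrna))


def translate(mrna):
    """Translate codon by codon using the standard genetic code."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(mrna))


original = "AUGAGGCUGAGACGUUAA"  # toy sequence: AUG AGG CUG AGA CGU UAA
optimized = optimize_arginine_codons(original)
print(optimized)                                    # AUGCGCCUGCGCCGUUAA
print(translate(original) == translate(optimized))  # True
```

Because only synonymous codons are exchanged, the translated amino acid sequence stays the same even though the nucleotide sequence changes.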
However, the procedure is far more complicated than it sounds, and changing the sequence of the cloned transgene to make it compatible with the codon usage bias of the expression host might have other consequences. Here are some important points.
The native mRNA sequence of a gene might have functions other than merely coding for amino acids. An mRNA’s native sequence determines how that mRNA folds into secondary structures, which in turn affect its stability, translation efficiency and other properties.
So, altering the native sequence of the mRNA to account for codon bias might affect these intrinsic steps in the regulation of gene expression.
Also, if the translation efficiency of the cloned gene is greatly improved by codon optimization, the rate of translation might no longer match the rate of post-translational modification and recombinant protein folding. This mismatch might have negative consequences for the structure and function of the cloned product.
This point is especially relevant for recombinant proteins for therapeutic purposes. Improving translation by codon optimization might decrease safety or efficacy of the product.
Because of this, codon optimization is done very carefully, often by using sophisticated AI algorithms to ensure optimal protein translation while also maintaining the fidelity of its structure and function.
Indeed, due to the mismatch in codon usage pattern between the expression host (E. coli) and the organism (human) from which the foreign transgene is cloned, a lot of issues arise while mass producing a recombinant protein, especially for therapeutic purposes.
When these issues arise, the expression host often needs to be changed. In our human-to-E. coli example, scientists may consider transfecting the cloned human transgene into an animal or human cell line and using that as the expression host instead of E. coli.
As we saw in this article, codon bias has a big impact on gene cloning experiments, especially where the objective is to mass produce recombinant proteins in a heterologous host. With the advent of new cloning approaches that extensively employ artificial intelligence, codon optimization continues to become more refined and sophisticated.