The Human Genome Project took over a decade and required extensive institutional efforts.
Today, a single technician can sequence several human genomes in just three hours, thanks to advances in Sanger and Next Generation sequencing techniques (short-read and long-read sequencing).
The first widely adopted sequencing technique was Sanger sequencing. In 1977, Frederick Sanger and colleagues developed this approach, which is also called the chain termination method.
The technique mixes a DNA polymerase with standard nucleotides and a small amount of special nucleotides called dideoxynucleotides, which lack the 3' OH group. When the polymerase incorporates one of these special nucleotides into the growing DNA strand, the reaction ends. Afterward, the terminated molecules are run on a gel to decipher the order of the sequence.
Around 2006, Next-Generation Sequencing (NGS), or short-read sequencing, was launched. It was designed to run high-throughput reactions.
The chemistry of NGS is sequencing by synthesis. Here, each type of nucleotide carries a different fluorescent dye. The nucleotides are mixed with the DNA, and a camera simultaneously captures the fluorescence of a huge number of reactions, which a computer converts into millions of short sequences.
More recently, long-read sequencing techniques (LST) were developed to obtain long sequences from DNA templates - up to 15 kb.
Here, a polymerase is attached to the bottom of a well, and a powerful camera detects, in real time, the fluorescence emitted as nucleotides are incorporated opposite the DNA template. Although LST may overcome some challenges of NGS, this technology is still less accurate than the other techniques.
In this article, I will explain the principles of common sequencing techniques as well as detail their advantages, disadvantages, and applications. Enjoy it!
Sequencing technology identifies the order of the nucleotides in a DNA molecule. This order is unique to each organism and dictates what we are, how we react, and how we face a dynamic environment.
Therefore, unraveling a species’ individual DNA sequence means understanding life's molecular basis.
There are two main sequencing techniques: Sanger sequencing and Next-Generation Sequencing (NGS). NGS in turn has two subgroups, short-read sequencing and long-read sequencing.
However, with advances in computational power and technology, it is highly likely that new and more advanced techniques will appear soon.
Figure 1. Different types of sequencing techniques.
Sanger sequencing is also known as the chain termination method. It is based on using radiolabeled dideoxynucleotides as substrates for the DNA polymerase.
The dideoxynucleotides, compared to standard DNA nucleotides, have an H instead of an OH group at the 3' position of the sugar molecule (figure 1).
This is key in the Sanger sequencing technique: the polymerase needs the 3' OH group to attach the next nucleotide, so once it incorporates a dideoxynucleotide into the growing strand, it cannot extend the strand further, and the process ends. In other words, the chain terminates.
Differences between deoxynucleotide and dideoxynucleotide.
Sanger and colleagues used this basic principle to discover the order of nucleotides in a DNA template. Using his own technique, Sanger took about four years to decipher the roughly 5,000 bases of the bacteriophage phiX174!
Classical Sanger sequencing used to be done in three steps. The first step consisted of setting up four different reactions in different tubes. Each tube had a mixture of the DNA template, a primer, dNTPs (standard nucleotides), a DNA polymerase, and a low concentration of a specific radiolabeled dideoxynucleotide.
The dideoxynucleotide (ddNTP) could be adenine (ddATP), thymine (ddTTP), cytosine (ddCTP), or guanine (ddGTP).
Chain termination reaction in Sanger sequencing.
Let’s look inside one single tube, for instance the ddATP tube. Here you will have several DNA fragments of different lengths, all ending in the same radiolabeled ddATP. Because every fragment ends at an adenine, the length of each fragment reveals the position of one adenine in the sequence.
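To make the idea concrete, here is a minimal sketch of what happens in one tube. It assumes we already know the newly synthesized strand (written 5' to 3') and ignores the primer; the function name `termination_lengths` is only illustrative.

```python
def termination_lengths(new_strand: str, dd_base: str) -> list[int]:
    """Return the fragment lengths produced in the tube containing dd_base.

    Assumption for this sketch: a dideoxynucleotide can be incorporated at
    every position where the growing strand needs that base, so each such
    position yields one fragment whose length equals the position (1-based).
    """
    return [pos for pos, base in enumerate(new_strand, start=1) if base == dd_base]


# Example: the ddATP tube for the strand GATTACA
print(termination_lengths("GATTACA", "A"))  # [2, 5, 7]
```

Each number is the length of a fragment that ends in ddATP, so adenines sit at positions 2, 5, and 7 of this example strand.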
Classical Sanger sequencing methodology.
The second step used polyacrylamide gel electrophoresis and autoradiography, which showed a different set of bands for each tube. In the third and final step, the order of the nucleotides was identified by reading the bands across the four lanes, from the shortest fragment to the longest.
Sanger sequencing has since evolved into a modern technique. In modern sequencing, the four separate reactions are no longer needed: each type of dideoxynucleotide carries a different fluorescent dye, so the reaction can be done in one single tube. Then, instead of using slab gels, modern sequencing separates the fragments by capillary electrophoresis, and a detector and computer identify the dye, and therefore the nucleotide, at the end of each fragment.
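Reading the gel can be sketched in a few lines. This is a simplified illustration, assuming every fragment length appears in exactly one lane and that band positions have already been converted to fragment lengths; `read_gel` and the example data are hypothetical.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Reconstruct a sequence from four lanes of fragment lengths.

    lanes maps each base (A, C, G, T) to the lengths of the fragments seen
    in the tube that contained the matching dideoxynucleotide.
    """
    # Pair every band with its lane, then read from shortest to longest.
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
print(read_gel(lanes))  # GATTACA
```

The shortest fragment gives the first base after the primer, and each longer fragment adds the next base.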
To learn more about DNA Sanger sequencing we invite you to check the following video.
Overview Sanger sequencing.
NGS: Short-read Sequencing
Short-read sequencing (SRS), led by Illumina, is currently the most widely used sequencing technique due to its high throughput. Short-read sequencing is also known as second-generation sequencing.
SRS uses a chemistry called sequencing by synthesis. Although SRS also attaches a different fluorophore to each type of nucleotide (similar to modern Sanger sequencing), it generates far more data because many reactions occur in parallel after the clustering process.
In the clustering process, DNA molecules are attached to a flow cell (a slide with many tiny lanes) and amplified. They are then mixed with fluorescent nucleotides that act as reversible terminators, so only one base is added per cycle. A high-resolution camera captures the fluorescence of millions of reactions as they happen.
It is important to mention that, unlike Sanger sequencing, NGS produces millions of short sequences. So, the goal in NGS is to reassemble these tiny sequences based on their overlapping sections.
Just imagine you have to read a book, but its pages have been shredded and you have to rebuild the book correctly. How would you do it? As in NGS, overlapping words are used as clues to put the pieces back in the right order.
The downside is that, although powerful algorithms are used to automate the reassembly, they are not always accurate. In the coming years, it is likely that better algorithms and even artificial intelligence-based software will help us improve the reconstruction of genomes and transcriptomes.
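The overlap idea can be illustrated with a toy greedy assembler. This is only a sketch under strong assumptions (error-free reads, all from the same strand, no repeats, no read contained inside another); real assemblers use far more elaborate methods, and the function names here are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

A greedy strategy like this one is easy to follow, but it can join the wrong pieces when the same stretch of sequence occurs more than once, which is one reason reassembly is not always accurate.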
SRS using Illumina technology can be done in two general steps.
The first step, after the library preparation, is the clustering step. DNA molecules with attached adapters are amplified using an isothermal process (volume and pressure may change, but the temperature remains constant). This process allows amplification of the DNA templates being sequenced.
Let’s look at the clustering process in more detail.
Each lane in the flow cell contains thousands of oligos (short DNA sequences) that bind by complementarity to the adapters at each end of the DNA molecules. Some oligos bind to the adapter at the 3’ end, while other oligos bind to the adapter at the 5’ end. The oligos are used to amplify the DNA templates in a process called bridge clustering or bridge PCR.
In the first stage, one end of the DNA binds to the first type of oligo in the flow cell. The reaction contains DNA polymerases and nucleotides, which are used to copy the DNA template.
In the second stage, the DNA template folds over, and the second type of oligo hybridizes to the other end of the DNA molecule, forming a bridge.
Then, the polymerization happens again. In this way, DNA molecules anchored by either end are copied and amplified. After clustering, thousands of copies of each DNA molecule are attached to the flow cell.
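A back-of-the-envelope calculation shows why a few rounds of bridge amplification are enough to reach thousands of copies. The sketch below assumes an idealized model in which a fixed fraction of strands is copied in every cycle; the function name and the `efficiency` parameter are assumptions for illustration, not instrument settings.

```python
def copies_after(cycles: int, efficiency: float = 1.0, start: int = 1) -> float:
    """Estimate the number of strands in one cluster after some cycles.

    efficiency is the fraction of strands copied per cycle
    (1.0 means perfect doubling, an idealized assumption).
    """
    return start * (1 + efficiency) ** cycles


print(copies_after(10))  # 1024.0
```

With perfect doubling, ten cycles already turn one template into about a thousand copies; with lower efficiency, more cycles are needed to reach the same cluster size.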
Bridge PCR used in short-read sequencing.
In the second step, the sequencing by synthesis occurs. Here, the flow cell is flooded with DNA polymerases and fluorescent nucleotides, which are incorporated into the many DNA molecules one base per cycle.
In parallel, a camera captures the fluorescence emitted by the multiple reactions, and a computer converts these signals into millions of short sequences called reads.
Long-read DNA sequencing (LRS) is a modern method used to produce longer and more complete DNA sequences. It is also called third-generation sequencing.
Like short-read sequencing, it uses sequencing by synthesis chemistry, although bridge amplification is not performed.
Unlike short-read sequencing, long-read sequencing creates long sequences from a single DNA template. However, the error rate (the fraction of bases that are called incorrectly) is higher than in short-read sequencing, reaching up to 10%.
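Turning fluorescence into a read can be sketched as picking the brightest signal in each cycle. This is a simplified illustration that assumes one intensity value per base per cycle for a single cluster; the intensity values are made up, and real base callers also correct for signal decay and assign quality scores.

```python
def call_bases(cycles: list[dict[str, float]]) -> str:
    """Call one base per cycle by choosing the channel with the highest intensity."""
    return "".join(max(signal, key=signal.get) for signal in cycles)


# Hypothetical intensities for one cluster over three cycles
cycles = [
    {"A": 0.91, "C": 0.03, "G": 0.02, "T": 0.04},
    {"A": 0.05, "C": 0.88, "G": 0.04, "T": 0.03},
    {"A": 0.02, "C": 0.06, "G": 0.04, "T": 0.90},
]
print(call_bases(cycles))  # ACT
```

Repeating this for every cluster on the flow cell is what yields millions of reads from a single run.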
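To get a feeling for what such an error rate means, the expected number of wrong bases in a read is simply its length multiplied by the error rate. The sketch below uses the figures mentioned in this article (15 kb reads, 10% error) purely as an illustration, and assumes errors are spread evenly along the read.

```python
def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of miscalled bases in one read."""
    return read_length * error_rate


# A 15,000-base read at a 10% error rate: about 1,500 miscalled bases
print(round(expected_errors(15_000, 0.10)))  # 1500
```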
Although it has higher error in the base calling, long-read sequencing is ideal for identifying the sequence of complex DNA regions such as repeats (a region composed of many adjacent copies of the same sequence).
One example where repeat regions are prevalent is cereals, where up to 80% of a genome may be composed of repeats (Wicker et al., 2001).
Long-read sequencing is performed in two general steps. In the first, thousands of DNA templates are primed and bound to polymerases fixed at the bottom of tiny wells (only 100 nm in height). Each well contains a single DNA polymerase fixed to its bottom, with a miniature camera below.
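Why long reads help with repeats can be shown with a simple check: a read can only place a repeat unambiguously if it covers the whole repeat plus some unique sequence on both sides. The sketch below is a simplified illustration; the coordinates, the 150-base short read, and the `flank` value are assumptions chosen for the example.

```python
def spans_repeat(read_start: int, read_len: int,
                 repeat_start: int, repeat_len: int, flank: int = 50) -> bool:
    """Return True if the read covers the repeat plus `flank` unique bases on each side."""
    read_end = read_start + read_len
    repeat_end = repeat_start + repeat_len
    return read_start <= repeat_start - flank and read_end >= repeat_end + flank


# A 5,000-base repeat starting at position 1,000
print(spans_repeat(0, 150, 1_000, 5_000))     # False: a short read cannot bridge it
print(spans_repeat(0, 15_000, 1_000, 5_000))  # True: a long read covers it end to end
```

Short reads that fall entirely inside the repeat look identical to reads from every other copy, which is why they are hard to place during assembly.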
Figure 7. Priming process in long-read sequencing.
In the second step, fluorescent nucleotides are incorporated one by one opposite the DNA template, and the tiny camera captures the fluorescent signals. When the correct base is held by the polymerase at the DNA template, the signal's intensity increases, revealing the correct order of the nucleotides.
Process of hybridization in long-read sequencing.
Applications of Sanger sequencing
- Identification of single genetic variants and disease studies
- Validation of next-generation sequencing (NGS) results
- Genotyping of microsatellite markers
- Matching donors and patients in transplants (HLA typing)
- Sequencing plasmids, inserts, and mutations for verification in molecular cloning techniques
Applications of short-read sequencing
- Whole genome sequencing
- Studies of transcriptomes using RNA-Seq
- Studies of methylation using Methyl-Seq
- Studies of exomes using Exome-Seq
- Metagenomics studies
- Gene expression profiling
- Population genetics
- Variant detection
Applications of long-read sequencing
- Structural variant detection
- Whole genome sequencing of complex genomes such as cereals
- Sequencing of very repetitive regions or genomes
- Building chromosomes in telomere-to-telomere assemblies
- Assembly of diploid genomes