If you’re new to omics, or not familiar, omics is a recent scientific field in biology, where large amounts of data are generated from the sequencing of genomes, transcripts, proteins, and metabolites. Respectively, these are called genomics, transcriptomics, proteomics and metabolomics.
These scientific disciplines are developed to provide deep insights about each section of the biology dogma DNA>transcripts>proteins>metabolites.
Next-Generation Sequencing (NGS) and third generation sequencing technologies including Illumina, PacBio and Oxford Nanopore, are the methods to generate the large amount of data.
However, nowadays, an integrative approach called multi-omics (MO) allows correlating genes with transcripts, transcripts with metabolites, and proteins with metabolites, expanding the scenario into new and more profound questions inside plant research.
Each omics has its own principles and workflows, leading to multiple applications. This article will detail the basics and general pipelines for the main omics like genomics, transcriptomics, proteomics, and metabolomics. I will also explain how these omics can be integrated into a multi-omics approach and summarize the goals pursued by other recent omics like phenomics and epigenomics.
In this article:
Genomics studies the organization of genes, the genetic information within the genome (the set of all genes within an organism), the methods utilized for collecting and analyzing this information, and the effect of this organization on the biological functionality of the genes.
Plants genomes have particularities such as genome size, gene content, length of repetitive sequences, and polyploidy/duplication events.
These features can vary considerably from one plant species to another. For instance, while the plant Genlisea margaretae has 63 megabase pairs (Mbp), Fritillaria assyriaca has 125 gigabase pairs (Gbp). It is, the second species that is 1900-fold bigger than the first.
Although there is no direct correlation between the genome size and the complexity of the plant species, more complex genomes tend to be larger than less complex genomes.
Researchers usually perform a genetic map and/or a physical map in plant genomics. These are approaches highly used for developing plant breeding programs.
The key unit in a genetic map is the use of molecular markers. A molecular marker in genomics is a fragment of DNA that is associated with a certain location within the genome.
A genetic map (also known as a linkage map) presents the order of molecular markers throughout chromosomes as well as the genetic distances between neighboring molecular markers. The distances are expressed in centiMorgans (cM) which is the distance between genes on the chromosome and is used to measure the recombination frequency. A cM is not a true physical distance, because it will depend on the recombination events happening in a specific species. To know more about cM in plants please click here.
On the other hand, physical maps are defined as a set of relatively large pieces of partially overlapping DNA encompassing a given chromosome and are measured in Mbp (Ayding, 2016).
In genomics, researchers are interested in performing a genome annotation. Genome annotations identify genes and give them a function.
But the first step before annotation is the masking step. Because plant genomes contain a variable number of repeats (repetitive DNA sequences), these sequences must be masked to allow downstream analysis. The process of masking involves transforming every nucleotide identified as a repeat to an ‘N’ or to a lower-case a, t, g, or c. The masking process can be done using tools such as RepeatMasker.
After [KM1] annotation steps, researchers proceed with structural annotation to predict the structure for protein-coding genes, nonprotein coding genes (RNA genes), and regulatory regions in genomic DNA.
The structural annotation can be done using 1) ab initio methods 2) homology-based/similarity-based methods, and 3) integrated methods. These categories rely on statistical and computational methods (ab initio) or use external data like expressed sequence tags (ESTs) (homology method) to determine gene-specific features such as promoters, start and stop codons, exons, and introns, and GC content, among others (Aydin, 2016).
Once the structure of a gene is identified, researchers convert them into a protein sequence and assign them a putative function using predicted protein databases such as UniProt.
- Molecular breeding
- Identification of economically important genes
- Comparative evolutionary genomics
Transcriptomics studies gene expression. The total amount of transcripts (or RNAs) present in a given moment within a cell/tissue is known as a transcriptome.
Different plant tissues and organs exhibit temporal and spatial differences in gene expression. Therefore, understanding and comparing these differences is important to reveal the gene regulatory networks, how plants express secondary metabolites and resistance genes, and how they respond to the surroundings.
Next-generation sequencing (NGS) like RNA-Seq and third-generation sequencing technology methods like PacBio single-molecule real-time sequencing technology (SMRT) and Oxford Nanopore sequencing are tools used to obtain the plant transcriptomes.
In transcriptomics, if there is an existing reference genome, researchers can use it as a reference and perform a reference assembly. If the plant under study does not have a sequenced genome, researchers can conduct a de novo assembly.
Once the assembly is done, the gene function annotation proceeds. The gene function annotation uses bioinformatics tools to compare unknown gene sequences with databases of genes with known functions.
- Gene expression analyses
- Gene quantitative analyses
- Splicing alternative events
- SNP analysis
- Mutation function analysis
Metabolomics is the identification and study of all metabolites present in a tissue. And the set of all metabolites within a tissue is known as a metabolome.
Recently, metabolomics research focused on understanding metabolites related to quality (like fruit quality) and yield traits.
Unlike the other omics (e.g., genomics and transcriptomics), in metabolomics, there is no “reference” that guides the identification of the vast array of metabolites from a single extract. Thus, metabolomics analyses are more challenging.
Two approaches are used to perform metabolomics: nuclear magnetic resonance (NMR) and the mass spectrometry (MS).
NMR is a non-destructive method extensively used to identify smaller molecular weight (<50 kDa) metabolites based on the utilization of magnetic properties of atomic nuclei under a magnetic field.
The MS method is based on the intrinsic properties of fragmented metabolites, such as mass and charge. MS has greater sensitivity than NMR and led to identifying novel molecules and biomarkers, and the reconstruction of metabolic pathways and networks.
After data generation by NMR or MS, the files should be converted to a format that is easy to interpret by metabolomics software. Then, the step of data cleaning allows you to remove biased data (outliers).
Next, researchers proceed with sample alignment against databases like Metlin, Mass Bank of North America, or standard tools for statistical analysis, and compound identification.
- Metabolite diversity analysis
- Identification of chemical biomarkers
- Metabolite comparative analyses
- Metabolic pathway enrichment analyses
Proteomics is the study of the protein population in a tissue sample, cell, or subcellular compartment. A proteome is a set of proteins expressed in an organism by its genome (Shahzad et al., 2016).
At present, cell wall, cell membrane, chloroplast, mitochondrial, and nucleus proteomic studies have been performed using the model plant Arabidopsis. Here, researchers identified the main types of proteins associated with each cell part.
Separation of protein is based on the charge, size, and conformation. Researchers can use, electrophoresis, 2D gel electrophoresis, and mass spectrometry.
Protein identification using electrophoresis is based on the electric field. As proteins carry a negative charge, the electric field forces its movement toward the other end. Therefore, protein migration depends upon its charge, size, and shape.
When 2D gel electrophoresis is used, the protein is first fractionated based on the isoelectric field of the protein charge (first dimension), and then is separated based on their molecular weight.
Finally, the common method to separate proteins in mass spectrometry is matrix-assisted laser desorption/ionization time of flight (MALDI-TOF).
MALDI-TOF is based on the mass-to-charge ratio of the peptide fragments that are produced by enzymatic cleavage of the parent protein using the trypsin enzyme.
After the data is obtained from MALDI-TOF, a similar approach used in metabolomics is also performed in proteomics. Here, the differences rely on the use of dedicated peptide databases.
- Posttranslational modification analyses
- Protein profiling
- Protein expression analyses
The multi-omics approach is used to perform conducting systems-level analyses. Here, genomics, transcriptomics, metabolomics, and proteomics are used here to carry out joint research.
Multi-omics allow comprehensively assimilating, annotating, and modeling large data sets. However, plants are especially challenging due to large poorly annotated genomes, multi-organelles, and diverse secondary metabolites.
The workflow in a multi-omic approach depends on the type of study and the questions to answer. Also, no matter which combination of omics researchers want to perform, they should define the level of the multi-omic approach. We could say there are three levels to integrate omics. The first is at the element-based level where genes, proteins, transcripts, metabolites are connected. The second is the pathway-based level, where signaling pathways are connected, and the last level where researchers use models is called mathematical-based integration.
With the evolution of sequencing technologies, more and more data are generated, and different and deep questions are being answered.
For instance, epigenomics describes the changes in the gene regulation that acts independently of changes in gene sequences. In other words, epigenomics studies changes in the chromatin and histones which do not alter the DNA sequence.
The physical interactions between proteins and other cellular entities are studied in interactomics. These interactions can be protein–protein interactions, between same or different types of proteins, or protein–DNA interactomics.
Furthermore, other omics complementary to genomics are emerging, such as phenomics, which allows a better understanding of the genotype–phenotype relation, that is, of the pathways that connect genotypes to phenotypes.
Argueso, C. T., Assmann, S. M., Birnbaum, K. D., Chen, S., Dinneny, J. R., Doherty, C. J., Eveland, A. L., Friesner, J., Greenlee, V. R., Law, J. A., Marshall-Colón, A., Mason, G. A., O’Lexy, R., Peck, S. C., Schmitz, R. J., Song, L., Stern, D., Varagona, M. J., Walley, J. W., & Williams, C. M. (2019). Directions for research and training in plant omics: Big Questions and Big Data. Plant Direct, 3(4), 1–16.
Crandall, S. G., Gold, K. M., Jiménez-Gasco, M. del M., Camila, Filgueiras, C., & Willett, D. S. (2020). A multi-omics approach to solving problems in plant disease ecology. PLoS ONE, 15(9 September), 1–23.
Fukushima, A., Kusano, M., Redestig, H., Arita, M., & Saito, K. (2009). Integrated omics approaches in plant systems biology. Current Opinion in Chemical Biology, 13(5–6), 532–538.
Guo, J., Huang, Z., Sun, J., Cui, X., & Liu, Y. (2021). Research Progress and Future Development Trends in Medicinal Plant Transcriptomics. Frontiers in Plant Science, 12(July), 1–10.
Hakeem, K. R., Tombuloğlu, H., & Tombuloğlu, G. (2016). Plant omics: Trends and applications. In Plant Omics: Trends and Applications.
Houle, D., Govindaraju, D. R., & Omholt, S. (2010). Phenomics: The next challenge. Nature Reviews Genetics, 11(12), 855–866.
Jamil, I. N., Remali, J., Azizan, K. A., Nor Muhammad, N. A., Arita, M., Goh, H. H., & Aizat, W. M. (2020). Systematic Multi-Omics Integration (MOI) Approach in Plant Systems Biology. Frontiers in Plant Science, 11(June).
Kalavacharla, V. (Kal), Subramani, M., Ayyappan, V., Dworkin, M. C., & Hayford, R. K. (2017). Chapter 16 – Plant Epigenomics. Handbook of Epigenetics, 245–258.
Kumar, R., Bohra, A., Pandey, A. K., Pandey, M. K., & Kumar, A. (2017). Metabolomics for plant improvement: Status and prospects. Frontiers in Plant Science, 8(August), 1–27.
Turumtay, H., Sandallı, C., & Turumtay, E. A. (2016). Plant metabolomics and strategies. In Plant Omics: Trends and Applications.