Protocol Preview: Long-read sequencing, Nature Methods’ Method of the Year
Blog post by Jasmin Skinner
While the 2022 calendar year was filled with extensive scientific discovery and innovation, one method in particular made a substantial impact within the life sciences community. Long-read sequencing (LRS), also known as third-generation sequencing, has been named 2022’s Method of the Year by Nature Methods, and with very good reason (1). In this blog post, we’ll discuss what exactly long-read sequencing is, how it was developed, and some of its potential research uses and applications.
What is long-read sequencing?
Long-read sequencing is a recently developed method of genomic analysis, capable of sequencing strands of DNA that are 10,000-100,000 nucleotides in length (2). Unlike short-read sequencing, which requires the genome to first be broken into small fragments of less than 1000 base pairs, LRS is capable of producing long, continuous reads of DNA without the need for amplification (2). This has allowed researchers to sequence areas of the genome that were previously ‘dark’, such as the highly repetitive sequences found in telomeres (3). Previous technologies were unable to analyze these portions of the genome, making it near-impossible to study certain complex structures.
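To make the ‘dark’ region problem concrete, here is a minimal Python sketch of why read length matters in repetitive DNA. Everything in it is a toy assumption made for illustration: the flanking sequences, the six copies of the telomere-like TTAGGG repeat, and the helper name count_placements are all hypothetical, and real aligners are far more sophisticated than exact string matching.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position where the read matches the genome exactly."""
    return sum(
        1
        for start in range(len(genome) - len(read) + 1)
        if genome.startswith(read, start)
    )


# Hypothetical toy genome: unique flanks around a telomere-like TTAGGG repeat.
LEFT_FLANK = "ACGTTGCA"
RIGHT_FLANK = "GATCCTAG"
genome = LEFT_FLANK + "TTAGGG" * 6 + RIGHT_FLANK

# A short read that lies entirely inside the repeat.
short_read = "TTAGGGTTAGGG"
# A long read that spans the whole repeat plus part of each unique flank.
long_read = genome[4:48]

short_placements = count_placements(genome, short_read)
long_placements = count_placements(genome, long_read)

print(f"short read fits at {short_placements} positions")
print(f"long read fits at {long_placements} position(s)")
```

In this toy case the short read fits equally well at several positions inside the repeat, so its true origin is ambiguous, while the long read is anchored by the unique sequence on either side and fits in only one place.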
Why was this technique developed?
Next-generation sequencing (NGS), the predecessor of long-read sequencing, represented a significant methodological achievement when it was first introduced 20 years ago (1). It allowed scientists to accomplish in a single day what had previously required more than a decade of manual effort (4). However, despite its numerous benefits, NGS has a number of limitations. Due to the sheer size and complexity of many genomes, it was difficult to generate complete genomic sequences with NGS, and the resulting assemblies often contained gaps or errors (1). As the study of genomics fundamentally relies on identifying variants/mutations in the genome, any error introduced during sequencing could invalidate the results. This motivated the development of a sequencing technology that was capable of creating long, accurate sequences of DNA while also being able to detect some of the genomic complexity that was beyond the capabilities of previous methods (1).
How does it work?
The two long-read sequencing technologies that currently dominate the market are both capable of traversing the more repetitive regions of the genome, but their differing methods of sequence detection affect their read length, accuracy, and throughput (2). Pacific Biosciences’ Single Molecule Real-Time (SMRT) sequencing incorporates fluorescently labeled dNTPs into the nascent strand, and each one is detected following laser excitation (2). The fluorophore is then cleaved from the nucleotide before the next dNTP is added to the strand, so that only a single base is recorded at a time. This technique is capable of sequencing a DNA insert of anywhere from 1 kb to over 100 kb, greatly surpassing the reads of less than 300 base pairs produced by Illumina, an established short-read sequencing technology (2).
The other major sequencing technology is produced by Oxford Nanopore Technologies (ONT), and utilizes a rather innovative methodology. Linear DNA molecules are first attached to an adapter preloaded with a motor protein, and then loaded into a flow cell containing potentially thousands of nanopores embedded within a synthetic membrane (2). The motor protein unwinds the DNA and, coupled with an electric current, drives the DNA through the pore at a controlled rate. Depending on the nucleotide that passes through the pore, a specific disruption in the current occurs, which allows the sequence of the DNA to be determined in real time. ONT sequencing has been capable of producing reads of over 1 Mb in length, over ten times longer than what is possible with Pacific Biosciences’ SMRT sequencing (2).
While these two techniques both produce longer reads than were previously possible, the differences in their sequencing principles yield reads with varied lengths, error rates, and throughputs (1). Neither technology is without its respective drawbacks, but one technique may suit a particular research project better than the other.
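The idea of reading bases from disruptions in the current can be sketched in a few lines of Python. This is a deliberately simplified toy and not how ONT’s software works: the current levels in LEVELS_PA are invented numbers, the trace is assumed to contain exactly one clean measurement per base, and the function names are hypothetical. Real basecalling has to handle signal that depends on several neighboring bases at once, along with noise and variable translocation speed.

```python
# Hypothetical mean current levels (picoamps) per base, invented for illustration.
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def call_base(current_pa: float) -> str:
    """Return the base whose reference level is closest to the measured current."""
    return min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - current_pa))


def call_sequence(trace_pa: list[float]) -> str:
    """Call one base per measurement (assumes the trace is already segmented)."""
    return "".join(call_base(sample) for sample in trace_pa)


# A made-up, slightly noisy trace with one measurement per base.
trace = [81.2, 123.9, 96.4, 108.7, 79.5]
called = call_sequence(trace)
print(called)
```

Each measurement is simply matched to the nearest reference level, which is enough to show how a changing current can be translated into a sequence as the strand moves through the pore.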
What can you use long-read sequencing for?
Aside from its obvious use in sequencing a genome, long-read sequencing has seen extensive use within the fields of microbiology, transcriptomics, and epigenomics. For example, in microbial genomics, long-read sequencing has been very useful in creating high-quality ‘metagenome-assembled genomes’, and could pave the way towards development of a complete microbial phylogenetic tree (1).
The study of transcriptomes, which encompasses all coding and noncoding RNA, also benefits from the use of long-read sequencing. This technology has the potential to explain some of the hidden complexity and dynamism of the transcriptome, such as isoform structure and expression. As gene expression and intra- and intermolecular interactions are critical to isoform diversity, being able to fully characterize these isoforms will provide invaluable insight into their underlying mechanisms (1).
Lastly, long-read sequencing has also shown promising results within epigenomics and epitranscriptomics. This ever-growing area of research has been boosted by the ability of LRS to detect far more chemical modifications within DNA and RNA than previous detection methods. As so few DNA and RNA modifications have been comprehensively studied, this technology may facilitate a more comprehensive understanding of epigenetics and of its interactions with downstream gene expression (1).
What is the future of long-read sequencing?
While long-read sequencing does present many advantages and opportunities to further our understanding of the genome, it does have some limitations. For example, per-read accuracy can be much lower than that of short-read sequencing technology, likely due to the speed at which the sequence is read (5). However, through the tandem use of these sequencing techniques, as well as by utilizing other methodologies like circular consensus sequencing, read accuracies of approximately 99.8% can be achieved (5). As this method is developed further and incorporated with existing approaches, it represents an exciting and significant step towards a more comprehensive and complete understanding of genomics.
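The intuition behind circular consensus sequencing is that the same molecule is read several times and the passes are combined, so that errors in individual passes are outvoted. The Python sketch below illustrates that idea with a simple per-position majority vote. It is a toy under stated assumptions: the sequences are made up, the passes are assumed to be already aligned and of equal length, and the function name consensus is hypothetical. Production consensus tools rely on alignment and probabilistic models, not a bare vote.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Majority vote per position across equal-length passes of one molecule."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch assumes aligned, equal-length passes")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Made-up example: the true sequence and five noisy passes over it.
truth = "ACGTACGTAC"
passes = [
    "ACGTACGTAC",  # no errors
    "ACGAACGTAC",  # error at position 3
    "ACGTACCTAC",  # error at position 6
    "TCGTACGTAC",  # error at position 0
    "ACGTACGTAG",  # error at position 9
]

result = consensus(passes)
print(result == truth)
```

Four of the five passes contain an error, yet because the errors fall at different positions, every position is still called correctly in the consensus.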
About the Author
Jasmin Skinner is an undergraduate student at the University of Western Ontario completing a Specialization in Biology and a Minor in Chemistry, with focused interest in applying these concepts to environmental conservation. As a lover of the outdoors and the arts, much of her time is spent in nature and within the local London art community, creating and connecting with all walks of life. After graduating, she hopes to continue her passion of finding unconventional solutions to environmental issues by working with nature, not against it.
References
1. Method of the Year 2022: long-read sequencing. Nat Methods. 2023;20:1. DOI: 10.1038/s41592-022-01759-x.
2. Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet. 2020;21(10):597-614. DOI: 10.1038/s41576-020-0236-x.
3. Cechova M, Miga KH. Comprehensive variant discovery in the era of complete human reference genomes. Nat Methods. 2023;20:17-9. DOI: 10.1038/s41592-022-01740-8.
4. Behjati S, Tarpey PS. What is next generation sequencing? Arch Dis Child Educ Pract Ed. 2013;98(6):236-8. DOI: 10.1136/archdischild-2013-304340.
5. Mobley I. Long-read sequencing vs short-read sequencing [Internet]. Winchester, Hampshire (UK): Front Line Genomics; 2021 [cited 2023 Feb 10]. Available from: https://frontlinegenomics.com/long-read-sequencing-vs-short-read-sequencing/#:~:text=Short%2Dread%20technologies%20carry%20out,be%20modified%20with%20identifying%20tags