Sequencing accuracy and systematic errors of nanopore direct RNA sequencing.

Wang Liu-Wei, Wiep van der Toorn, Patrick Bohn, Martin Hölzer, Redmond P Smyth, Max von Kleist

BMC Genomics

Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany.

Published: May 2024

Background: Direct RNA sequencing (dRNA-seq) on the Oxford Nanopore Technologies (ONT) platforms can produce reads covering up to full-length gene transcripts, while containing decipherable information about RNA base modifications and poly-A tail lengths. Although many published studies have been expanding the potential of dRNA-seq, its sequencing accuracy and error patterns remain understudied.

Results: We present the first comprehensive evaluation of sequencing accuracy and characterisation of systematic errors in dRNA-seq data from diverse organisms and synthetic in vitro transcribed RNAs. We found that for sequencing kits SQK-RNA001 and SQK-RNA002, the median read accuracy ranged from 87% to 92% across species, and deletions significantly outnumbered mismatches and insertions. Due to their high abundance in the transcriptome, heteropolymers and short homopolymers were the major contributors to the overall sequencing errors. We also observed systematic biases across all species at the levels of single nucleotides and motifs. In general, cytosine/uracil-rich regions were more likely to be erroneous than guanines and adenines. By examining raw signal data, we identified the underlying signal-level features potentially associated with the error patterns and their dependency on sequence contexts. While read quality scores can be used to approximate error rates at base and read levels, failure to detect DNA adapters may be a source of errors and data loss. By comparing distinct basecallers, we reason that some sequencing errors are attributable to signal insufficiency rather than algorithmic (basecalling) artefacts. Lastly, we generated dRNA-seq data using the latest SQK-RNA004 sequencing kit released at the end of 2023 and found that although the overall read accuracy increased, the systematic errors remain largely identical compared to the previous kits.
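The sketch below shows roughly how per-read figures like those in the Results can be derived from an alignment. It rests on a few assumptions. Read accuracy is taken as matches / (matches + mismatches + insertions + deletions). Each aligned read is a SAM record with a CIGAR string and an NM tag (edit distance = mismatched + inserted + deleted bases, with intron skips `N` excluded, as spliced aligners such as minimap2 report it). Quality scores are standard Phred values, so P(error) = 10^(-Q/10). The helper names `error_counts`, `read_accuracy` and `expected_error_rate` are hypothetical. This is an illustration, not the authors' pipeline.

```python
import re

# One CIGAR operation: a length followed by an op code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def error_counts(cigar, nm):
    """Return (matches, mismatches, insertions, deletions) for one aligned read.

    Assumption: `nm` is the SAM NM tag, i.e. mismatches + inserted + deleted bases.
    """
    aligned = ins = dels = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":      # aligned columns (matches or mismatches)
            aligned += n
        elif op == "I":      # bases present in the read but not the reference
            ins += n
        elif op == "D":      # reference bases missing from the read
            dels += n
        # N (intron skip), S/H (clipping) and P (padding) are not counted
    mismatches = nm - ins - dels
    matches = aligned - mismatches
    return matches, mismatches, ins, dels


def read_accuracy(cigar, nm):
    """Assumed definition: matches / (matches + mismatches + insertions + deletions)."""
    m, x, i, d = error_counts(cigar, nm)
    return m / (m + x + i + d)


def expected_error_rate(qualities):
    """Mean per-base error probability implied by Phred scores, P = 10^(-Q/10).

    Averaging probabilities rather than Q values avoids underestimating the
    error rate, because the Phred scale is logarithmic.
    """
    probs = [10 ** (-q / 10) for q in qualities]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    # Hypothetical read: 2 inserted bases, 3 deleted bases and 2 mismatches (NM = 7).
    print(error_counts("10M2I5M3D20M", 7))   # (33, 2, 2, 3)
    print(read_accuracy("10M2I5M3D20M", 7))  # 0.825
```

In practice the CIGAR string and NM tag could be read from a BAM file with pysam (`AlignedSegment.cigarstring`, `AlignedSegment.get_tag("NM")`). Summing the four counts over all reads gives the kind of deletion-versus-mismatch-versus-insertion breakdown described above.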

Conclusions: As the first systematic investigation of dRNA-seq errors, this study offers a comprehensive overview of reproducible error patterns across diverse datasets, identifies potential signal-level insufficiency, and lays the foundation for error correction methods.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11134706
DOI: http://dx.doi.org/10.1186/s12864-024-10440-w