Insufficiently complex unique-molecular identifiers (UMIs) distort small RNA sequencing.

Klay Saunders, Andrew G Bert, B Kate Dredge, John Toubia, Philip A Gregory, Katherine A Pillman, Gregory J Goodall, Cameron P Bracken

Sci Rep

Centre for Cancer Biology, University of South Australia and SA Pathology, Adelaide, SA, Australia.

Published: September 2020

The attachment of unique molecular identifiers (UMIs) to RNA molecules prior to PCR amplification and sequencing makes it possible to amplify libraries to a level that is sufficient to identify rare molecules, whilst simultaneously eliminating PCR bias through the identification of duplicated reads. Accurate de-duplication depends on a pool of UMIs that is complex enough to allow unique labelling. In applications dealing with complex libraries, such as total RNA-seq, only a limited variety of UMIs is required, as the variation in molecules to be sequenced is enormous. However, when sequencing a less complex library, such as small RNAs, for which there is a more limited range of possible sequences, we find that increased variation in UMIs is required, even beyond that provided in a commercial kit specifically designed for the preparation of small RNA libraries for sequencing. We show that a pool of UMIs randomly varying across eight nucleotides is not of sufficient depth to uniquely tag the microRNAs to be sequenced. This results in over-de-duplication of reads and a marked under-estimation of the expression of the more abundant microRNAs. Whilst still arguing for the utility of UMIs, this work demonstrates the importance of their considered design to avoid errors in the estimation of gene expression in libraries derived from select regions of the transcriptome or small genomes.
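
The saturation argument in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes UMIs are drawn uniformly at random, that all tagged molecules share one identical sequence (for example a single abundant microRNA), and that there are no sequencing errors in the UMI. The function name `expected_unique_umis`, the molecule counts, and the 12-nucleotide comparison are illustrative assumptions only. Under these assumptions, the expected number of distinct UMIs seen among n molecules drawn from a pool of size N = 4^k is N(1 - (1 - 1/N)^n), which is also the largest count de-duplication can report.

```python
def expected_unique_umis(n_molecules: int, umi_length: int = 8) -> float:
    """Expected number of distinct UMIs among n identical molecules.

    Assumes (hypothetically) uniform random UMI usage and no UMI errors.
    """
    pool = 4 ** umi_length  # number of possible UMIs, e.g. 4**8 = 65,536
    # Probability a given UMI is never used is (1 - 1/pool)**n,
    # so the expected number of UMIs used at least once is:
    return pool * (1.0 - (1.0 - 1.0 / pool) ** n_molecules)


if __name__ == "__main__":
    # Illustrative molecule counts for one abundant small RNA (assumed values).
    for umi_length in (8, 12):
        print(f"UMI length {umi_length} nt, pool size {4 ** umi_length:,}")
        for n in (1_000, 10_000, 100_000, 1_000_000):
            kept = expected_unique_umis(n, umi_length)
            # Fraction of true molecules that survive de-duplication.
            print(f"  molecules={n:>9,}  counted after de-dup={kept:>12,.0f}"
                  f"  ({kept / n:.1%} of true)")
```

With an eight-nucleotide UMI the pool holds only 65,536 labels, so the de-duplicated count for any single sequence cannot exceed that number however many molecules were present. This is the mechanism by which the most abundant microRNAs are under-estimated, while rare sequences are largely unaffected.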

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471316
DOI: http://dx.doi.org/10.1038/s41598-020-71323-0

Similar Publications

Non-invasive fetal HPA genotyping by UMI-NGS: a robust method for antenatal diagnosis including 48 fetal DNA markers.

Blood Transfus

August 2025

EFS BloodCenter of Brittany, HLA-HPA Laboratory, Rennes, France.

Gerald Bertrand, Orlane Levallois, Cecilia Gonzalez Santesteban, Nuria Nogues, Virginie Renac

Background: Non-invasive fetal HPA typing is a valuable tool to identify the pregnancies at risk of fetal and neonatal alloimmune thrombocytopenia (FNAIT). Different approaches have been developed, mainly based on real-time PCR and droplet digital-PCR. Those methods have a limited ability to multiplex and require replicates due to the contamination risk.

Tranquillyzer: A Flexible Neural Network Framework for Structural Annotation and Demultiplexing of Long-Read Transcriptomes.

bioRxiv

July 2025

Department of Epigenetics, Van Andel Research Institute, Grand Rapids, MI, USA.

Ayush Semwal, Jacob Morrison, Ian Beddows, Theron Palmer, Mary F Majewski

Long-read single-cell RNA sequencing using platforms such as Oxford Nanopore Technologies (ONT) enables full-length transcriptome profiling at single-cell resolution. However, high sequencing error rates, diverse library architectures, and increasing dataset scale introduce major challenges for accurately identifying cell barcodes (CBCs) and unique molecular identifiers (UMIs) - key prerequisites for reliable demultiplexing and deduplication, respectively. Existing pipelines rely on hard-coded heuristics or local transition rules that cannot fully capture this broader structural context and often fail to robustly interpret reads with indel-induced shifts, truncated segments, or non-canonical element ordering.

High-resolution spatial transcriptomics in fixed tissue using a cost-effective PCL-seq workflow.

Genome Res

September 2025

Department of Orthopedics, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200092, China

Xue Dong, Mengzhu Hu, Xiaonan Cui, Wenjian Zhou, Jingtao Cai

The spatial heterogeneity of gene expression has driven the development of diverse spatial transcriptomics technologies. Here, we present photocleavage and ligation sequencing (PCL-seq), a spatial indexing method utilizing a light-controlled DNA labeling strategy applied to tissue sections. PCL-seq employs photocleavable oligonucleotides and ligation adapters to construct transcriptional profiles of specific regions of interest (ROIs) designated via microscopically controlled photo-illumination.

A duplex sequencing approach for high-sensitivity detection of genome-edited plants.

Food Chem (Oxf)

December 2025

IGA Technology Services S.R.l., via Jacopo Linussio 51, I-33100 Udine, Italy.

Laura Bonfini, Moreno Colaiacovo, Cristian Savini, Christoph von Holst, Matteo Maretti

In this paper, we have evaluated a targeted high-throughput massive parallel sequencing approach for detecting single nucleotide mutations or small genomic changes generated by new genomic techniques (NGT). We used unique molecular identifiers (UMIs) for the quantification of the mutant alleles and duplex sequencing to confirm a mutation on both strands to avoid polymerase chain reaction (PCR) artefacts or sequencing miss-calls. We tested the approach in blinded analyses on a set of mixed NGT-modified tomato lines and identified each single nucleotide mutation or small insert/deletion (InDel) down to a 0.

RNA-Based High-Throughput Sequencing of the Human Immunoglobulin Repertoire.

Methods Mol Biol

July 2025

Immunology Laboratory of Dupuytren Hospital University Center (CHU) of Limoges, Limoges, France.

Séléna Teillaud, Sébastien Bender, Virginie Pascal

Lymphocytes use somatic diversification processes to express a wide variability of antigen receptors, generating a highly diversified repertoire that is unique to each individual. The study of these repertoires is now possible with the advent of next-generation sequencing (NGS) techniques. Here we describe the "RACE Rep-Seq" methodology for high-throughput sequencing of immunoglobulin (Ig) repertoires using RNA templates.
