Article Abstract

Long-read sequencing (LRS) has revealed a far greater diversity of RNA isoforms than earlier technologies, increasing the critical need to determine which, and how many, isoforms per gene are biologically meaningful. To define the space of relevant isoforms from LRS, many existing analysis pipelines rely on arbitrary expression cutoffs, but a single threshold cannot accommodate the broad variability in isoform complexity across genes, cell types, and disease states captured by LRS. To address this, we propose using perplexity, an interpretable measure derived from entropy, which quantifies the effective number of isoforms per gene based on the full, unfiltered isoform ratio distribution. Calculating perplexity for 124 ENCODE4 PacBio LRS datasets spanning 55 human cell types, we show that it provides intuitive assessments of isoform diversity and captures uncertainty across genes with varying complexity. Perplexity can be calculated at multiple gene regulatory levels, from transcript to protein, to compare how isoform diversity is reduced across stages of gene expression. On average, genes have an ORF-level perplexity of 2.1, indicating production of two distinct protein isoforms. We extended this analysis to evaluate expression variation across tissues and identified 4,593 ORFs across 3,102 genes with moderate to extreme tissue-specificity. We propose perplexity as a consistent, quantitative metric for interpreting isoform diversity across genes, cell types, and disease states. All results are compiled into a community resource to enable cross-study comparisons of novel isoforms.
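As a rough illustration of the metric described in the abstract, the sketch below computes perplexity as the exponential of the Shannon entropy of a gene's isoform ratios. This is the standard information-theoretic definition; the abstract does not give the authors' exact implementation, so the function name `perplexity`, the use of raw read counts as input, and the handling of zero counts are all assumptions made for this example.

```python
import math


def perplexity(counts):
    """Effective number of isoforms for one gene (hypothetical helper).

    Assumes `counts` holds non-negative expression values (e.g. read
    counts), one per isoform. Perplexity is taken here as exp(H), where
    H is the Shannon entropy (natural log) of the isoform ratios.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("gene has no expressed isoforms")
    # Isoform ratios; zero-count isoforms contribute nothing to entropy.
    ratios = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in ratios)
    return math.exp(entropy)


if __name__ == "__main__":
    # Two equally expressed isoforms behave like two effective isoforms.
    print(perplexity([50, 50]))
    # One dominant isoform plus two rare ones stays close to one.
    print(perplexity([98, 1, 1]))
```

Under this definition a gene with n equally expressed isoforms has a perplexity of n, while a gene dominated by a single isoform has a perplexity close to 1, with no expression cutoff applied to the rare isoforms.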

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12236620
DOI: http://dx.doi.org/10.1101/2025.07.02.662769

Publication Analysis

Top Keywords

isoform diversity: 16
isoforms gene: 8
disease states: 8
cell types: 8
isoform: 6
isoforms: 6
perplexity: 5
diversity: 5
genes: 5
perplexity metric: 4

Similar Publications

Genetic variants of various cytochrome P450 (CYP) enzymes significantly impact pharmacokinetics. The highly polymorphic hepatic CYP2C9 metabolizes ~15% of clinically used drugs. This study aimed to characterize the ligand-binding properties of the understudied CYP2C9.

Dysfunction of several WD40 family proteins causes diverse endocrine diseases. Until recently, MEP50, a WD40 protein, was considered a Gene of Unknown Significance (GUS) because no inherited diseases had been linked to its function. However, genetic inactivation of MEP50 in mouse models or somatic mutations in humans drive oncogenesis in several endocrine-related cancers, including those of the prostate, breast, and uterus.

Recursive splice sites are rare motifs postulated to facilitate splicing across massive introns and shape isoform diversity, especially for long, brain-expressed genes. The necessity of this unique mechanism remains unsubstantiated, as does the role of recursive splicing (RS) in human disease. From analyses of rare copy number variants (CNVs) from almost one million individuals, we previously identified large, heterozygous deletions eliminating an RS site (RS1) in the first intron of that conferred substantial risk for attention deficit hyperactivity disorder (ADHD) and other neurobehavioral traits.

Introduction: Glucose transporter (GLUT) research in parasitic nematodes focuses on identifying and characterizing developmentally regulated isoforms, elucidating their regulatory and structural properties, and evaluating their potential as drug targets. While glucose transport mechanisms have been well characterized in the free-living nematode, data on parasitic species remain limited.

Transcript isoform diversity defines molecular subtypes and prognosis in acute myeloid leukemia through long-read sequencing.

Cell Rep

September 2025

Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Er Road, Shanghai 200025, China; School of Life Sciences and Biotechnology, Shang

Acute myeloid leukemia (AML) is a genetically complex and clinically heterogeneous hematopoietic malignancy. This study employs long-read transcriptome analysis using Oxford Nanopore Technologies sequencing on 60 primary AML bone marrow samples. This approach delivers comprehensive isoform-level resolution of splicing abnormalities and overcomes limitations of short-read sequencing.
