Long-read sequencing (LRS) has revealed a far greater diversity of RNA isoforms than earlier technologies, heightening the need to determine which, and how many, isoforms per gene are biologically meaningful. To define the space of relevant isoforms from LRS, many existing analysis pipelines rely on arbitrary expression cutoffs, but a single threshold cannot accommodate the broad variability in isoform complexity across genes, cell types, and disease states captured by LRS. To address this, we propose using perplexity, an interpretable measure derived from entropy that quantifies the effective number of isoforms per gene based on the full, unfiltered isoform ratio distribution. Calculating perplexity for 124 ENCODE4 PacBio LRS datasets spanning 55 human cell types, we show that it provides intuitive assessments of isoform diversity and captures uncertainty across genes with varying complexity. Perplexity can be calculated at multiple gene regulatory levels, from transcript to protein, to compare how isoform diversity is reduced across stages of gene expression. On average, genes have an ORF-level perplexity of 2.1, indicating production of roughly two distinct protein isoforms. We extended this analysis to evaluate expression variation across tissues and identified 4,593 ORFs across 3,102 genes with moderate to extreme tissue-specificity. We propose perplexity as a consistent, quantitative metric for interpreting isoform diversity across genes, cell types, and disease states. All results are compiled into a community resource to enable cross-study comparisons of novel isoforms.
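The abstract does not give the formula, so the following is only a minimal sketch. It assumes the standard information-theoretic definition, in which perplexity is 2 raised to the Shannon entropy (in bits) of a gene's isoform ratio distribution. The function name `isoform_perplexity` and the read counts are hypothetical illustrations, not the authors' code or data.

```python
import math


def isoform_perplexity(counts):
    """Effective number of isoforms for one gene.

    Assumption: perplexity = 2 ** H, where H is the Shannon entropy (bits)
    of the isoform ratios computed from the unfiltered counts.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("gene has no isoform expression")
    entropy = 0.0
    for count in counts:
        if count > 0:  # zero-count isoforms contribute nothing to entropy
            ratio = count / total
            entropy -= ratio * math.log2(ratio)
    return 2 ** entropy


# Hypothetical read counts for three genes.
balanced = isoform_perplexity([50, 50])        # two equally used isoforms
single = isoform_perplexity([100])             # one isoform only
skewed = isoform_perplexity([97, 1, 1, 1])     # one dominant, three rare
print(balanced, single, skewed)
```

Under this definition, a gene with two equally expressed isoforms scores exactly 2, while a gene dominated by one isoform scores close to 1 even when several rare isoforms are detected, which is the behavior a fixed expression cutoff cannot reproduce across genes.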
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12236620 | PMC |
http://dx.doi.org/10.1101/2025.07.02.662769 | DOI Listing |
FEBS Lett
September 2025
Laboratory of Molecular Diagnostics and Biotechnology, Institute of Bioorganic Chemistry of the National Academy of Sciences of Belarus, Minsk, Belarus.
Genetic variants of various cytochrome P450 (CYP) enzymes significantly impact pharmacokinetics. The highly polymorphic hepatic CYP2C9 metabolizes ~ 15% of clinically used drugs. This study aimed to characterize the ligand-binding properties of the understudied CYP2C9.
Endocr Connect
September 2025
Dysfunction of several WD40 family proteins causes diverse endocrine diseases. Until recently, MEP50, a WD40 protein, was considered a Gene of Unknown Significance (GUS) because no inherited diseases had been linked to its function. However, genetic inactivation of MEP50 in mouse models or somatic mutations in humans drive oncogenesis in several endocrine-related cancers, including those of the prostate, breast, and uterus.
Recursive splice sites are rare motifs postulated to facilitate splicing across massive introns and shape isoform diversity, especially for long, brain-expressed genes. The necessity of this unique mechanism remains unsubstantiated, as does the role of recursive splicing (RS) in human disease. From analyses of rare copy number variants (CNVs) from almost one million individuals, we previously identified large, heterozygous deletions eliminating an RS site (RS1) in the first intron of that conferred substantial risk for attention deficit hyperactivity disorder (ADHD) and other neurobehavioral traits.
Front Cell Infect Microbiol
September 2025
Department of Biochemistry, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland.
Introduction: Glucose transporter (GLUT) research in parasitic nematodes focuses on identifying and characterizing developmentally regulated isoforms, elucidating their regulatory and structural properties, and evaluating their potential as drug targets. While glucose transport mechanisms have been well characterized in the free-living nematode , data on parasitic species remain limited.
Cell Rep
September 2025
Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Er Road, Shanghai 200025, China; School of Life Sciences and Biotechnology, Shang
Acute myeloid leukemia (AML) is a genetically complex and clinically heterogeneous hematopoietic malignancy. This study employs long-read transcriptome analysis using Oxford Nanopore Technologies sequencing on 60 primary AML bone marrow samples. This approach delivers comprehensive isoform-level resolution of splicing abnormalities and overcomes limitations of short-read sequencing.