Interpretable prioritization of splice variants in diagnostic next-generation sequencing.

Am J Hum Genet

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA.

Published: September 2021



Article Abstract

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers, one for the donor site and one for the acceptor site, and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS surpasses previous state-of-the-art accuracy in classifying splice variants, as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize the predicted effects of variants on splicing, making splice variants easier to interpret in diagnostic settings.
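The two ideas at the core of the abstract, scoring a splice site by information content and combining two classifier outputs with a logistic function, can be illustrated with a minimal Python sketch. This is not the SQUIRLS implementation. The toy donor sites, the pseudocount, the function names, and the logistic coefficients are all assumptions made for illustration, and the weight matrix omits the small-sample correction that a real information-content model would typically apply.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0):
    """Build per-position weights 2 + log2(frequency) from aligned sites.

    `sites` is a list of equal-length DNA strings (hypothetical toy data).
    The pseudocount is an assumption that avoids log2(0) for unseen bases.
    """
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: 2.0 + math.log2(counts[b] / total) for b in BASES})
    return matrix


def information_content(seq, matrix):
    """Sum the per-position weights of `seq` (individual information, in bits)."""
    return sum(matrix[i][base] for i, base in enumerate(seq))


def delta_information(wild_type, variant, matrix):
    """Change in information content caused by the variant (negative = weaker site)."""
    return information_content(variant, matrix) - information_content(wild_type, matrix)


def combine_scores(donor_score, acceptor_score, intercept, w_donor, w_acceptor):
    """Logistic combination of two classifier outputs into one score in (0, 1).

    The coefficients are placeholders; in practice they would be fitted to data.
    """
    z = intercept + w_donor * donor_score + w_acceptor * acceptor_score
    return 1.0 / (1.0 + math.exp(-z))


# Toy donor sites: 3 exonic bases followed by 6 intronic bases (made-up examples).
TOY_DONOR_SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT", "CAGGTGAGC"]
matrix = build_weight_matrix(TOY_DONOR_SITES)

wild_type = "CAGGTAAGT"
variant = "CAGGTAACT"  # G>C at the +5 intronic position, outside the GT dinucleotide
delta = delta_information(wild_type, variant, matrix)

# Hypothetical classifier outputs and coefficients, for illustration only.
final_score = combine_scores(0.9, 0.1, intercept=-2.0, w_donor=4.0, w_acceptor=4.0)
```

In this sketch a negative `delta` means the variant weakens the site relative to the wild-type sequence, which is the kind of interpretable signal the abstract describes feeding into the classifiers.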


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8456162
DOI: http://dx.doi.org/10.1016/j.ajhg.2021.06.014

