Copy Number Variations (CNVs) play pivotal roles in the etiology of complex diseases and vary across diverse populations. Understanding the association between CNVs and disease susceptibility is of significant importance in disease genetics research and often requires analysis of large sample sizes. One of the most cost-effective and scalable methods for detecting CNVs is based on normalized signal intensity values, such as Log R Ratio (LRR) and B Allele Frequency (BAF), from Illumina genotyping arrays. In this study, we present CNV-Finder, a novel pipeline that applies deep learning, specifically a Long Short-Term Memory (LSTM) network, to array data to expedite the large-scale identification of CNVs within predefined genomic regions. This facilitates the efficient prioritization of samples for subsequent, costly analyses such as short-read and long-read whole genome sequencing. We focus on five genes that may be relevant to neurological diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), or related disorders such as essential tremor (ET): Parkin, Leucine Rich Repeat And Ig Domain Containing 2, Microtubule Associated Protein Tau, alpha-Synuclein, and Amyloid Beta Precursor Protein. By training our models on expert-annotated samples and validating them across diverse cohorts, including those from the Global Parkinson's Genetics Program (GP2) and additional dementia-specific databases, we demonstrate the efficacy of CNV-Finder in accurately detecting deletions and duplications. Our pipeline outputs app-compatible files for visualization within CNV-Finder's interactive web application. Through this interface, researchers can review predictions and filter displayed samples by model prediction value, LRR range, and variant count to explore or confirm results. The pipeline incorporates this human feedback to improve model performance and reduce false-positive rates.
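The review step described above, filtering displayed samples by model prediction value, LRR range, and variant count, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CNV-Finder's code: the `SampleResult` record, its fields, the `filter_for_review` helper, and the default thresholds are all hypothetical, and "LRR range" is assumed here to mean that a sample's mean LRR across the region must fall within a chosen interval.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SampleResult:
    """Per-sample summary for one region (hypothetical schema, not CNV-Finder's)."""
    sample_id: str
    prediction: float  # model output in [0, 1]; higher = more likely CNV carrier
    mean_lrr: float    # mean Log R Ratio over probes in the region
    n_variants: int    # number of probes/variants observed in the region


def filter_for_review(
    results: Iterable[SampleResult],
    min_prediction: float = 0.8,                   # illustrative threshold
    lrr_range: Tuple[float, float] = (-2.0, 2.0),  # illustrative bounds
    min_variants: int = 1,                         # illustrative minimum
) -> List[SampleResult]:
    """Keep samples that pass all three filters, highest prediction first."""
    lo, hi = lrr_range
    kept = [
        r for r in results
        if r.prediction >= min_prediction
        and lo <= r.mean_lrr <= hi
        and r.n_variants >= min_variants
    ]
    return sorted(kept, key=lambda r: r.prediction, reverse=True)


samples = [
    SampleResult("S1", 0.95, -0.45, 120),  # passes all filters
    SampleResult("S2", 0.40, 0.02, 150),   # prediction too low
    SampleResult("S3", 0.88, 0.30, 0),     # no variants in region
    SampleResult("S4", 0.91, 0.35, 98),    # passes all filters
]
print([r.sample_id for r in filter_for_review(samples)])  # ['S1', 'S4']
```

Sorting the survivors by prediction puts the most likely carriers first, which matches the stated goal of prioritizing samples for costly sequencing; in practice the thresholds would be tuned per region rather than fixed.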
Through a series of comprehensive analyses and validations using both short-read and long-read sequencing data, we demonstrate the robustness and adaptability of CNV-Finder in identifying CNVs in regions of varied sparsity, noise, and size. Our findings highlight the importance of contextual understanding and human expertise for precise CNV identification, particularly in complex genomic regions such as 17q21.31. The CNV-Finder pipeline is a scalable resource publicly available to the scientific community on GitHub (https://github.com/GP2code/CNV-Finder; DOI 10.5281/zenodo.14182563). CNV-Finder not only expedites accurate candidate identification but also substantially reduces researchers' manual workload, enabling targeted validation and downstream analyses in regions or phenotypes of interest.
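Because an LSTM consumes ordered sequences, and regions differ in probe density ("sparsity"), some step has to turn a variable number of array probes into model input. The Python sketch below shows one common way to do this, assuming per-probe `(position, LRR, BAF)` tuples and a fixed sequence length. The record layout, the `region_sequence` helper, and the pad-or-downsample strategy are hypothetical and are not taken from the CNV-Finder repository.

```python
from typing import List, Tuple

# Hypothetical probe record: (genomic position, LRR, BAF).
Probe = Tuple[int, float, float]


def region_sequence(
    probes: List[Probe], start: int, end: int, length: int
) -> Tuple[List[Tuple[float, float]], List[int]]:
    """Turn the probes in [start, end] into a fixed-length (LRR, BAF) sequence.

    Dense regions are downsampled at evenly spaced indices (first and last
    probe kept); sparse regions are zero-padded at the end. The returned mask
    is 1 for real probes and 0 for padding.
    """
    # Keep probes inside the region, ordered by position.
    in_region = sorted(p for p in probes if start <= p[0] <= end)
    n = len(in_region)
    if n > length:
        if length == 1:
            idx = [0]
        else:
            # Spacing exceeds 1 when n > length, so these indices are distinct.
            idx = [round(i * (n - 1) / (length - 1)) for i in range(length)]
        in_region = [in_region[i] for i in idx]
    seq = [(lrr, baf) for _, lrr, baf in in_region]
    mask = [1] * len(seq)
    pad = length - len(seq)
    seq += [(0.0, 0.0)] * pad
    mask += [0] * pad
    return seq, mask


probes = [(100, 0.0, 0.5), (250, -0.5, 0.0), (300, -0.6, 1.0), (900, 0.1, 0.5)]
seq, mask = region_sequence(probes, 200, 400, 4)
print(mask)  # [1, 1, 0, 0]
```

Zero padding is ambiguous on its own, since an LRR of 0 is a real, copy-neutral value; the mask lets a downstream model ignore padded steps. Interpolating onto a fixed positional grid is an alternative that preserves spacing information in sparse regions.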
| Source | Link type |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601614 | PMC |
| http://dx.doi.org/10.1101/2024.11.22.624040 | DOI Listing |
Front Genet
August 2025
Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, China.
Objective: The aim of this study was to determine the diagnostic value of prenatal chromosomal microarray analysis (CMA) for detecting chromosomal abnormalities in fetuses at high risk for various conditions.
Methods: In this study, 8,560 clinical samples were collected from pregnant women between February 2018 and June 2022, including 75 chorionic villus, 7,642 amniotic fluid, and 843 umbilical cord blood samples. All samples were screened for chromosomal abnormalities using both CMA and karyotyping.
J Med Case Rep
September 2025
Department of Anesthesiology, LMU University Hospital Munich LMU, Marchioninistrasse 15, 81377, Munich, Germany.
Background: The treatment of critically ill patients in intensive care units is becoming increasingly complex. For example, organ transplants are performed regularly, the recipients are seriously ill, and the postoperative course can be complicated. For this reason, organ replacement and hemadsorption procedures are becoming increasingly important.
Genome Biol
September 2025
Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 100101, Beijing, China.
Background: Centromeres are crucial for precise chromosome segregation and maintaining genome stability during cell division. However, their evolutionary dynamics, particularly in polyploid organisms with complex genomic architectures, remain largely enigmatic. Allopolyploid wheat, with its well-defined hierarchical ploidy series and recent polyploidization history, serves as an excellent model to explore centromere evolution.
Mol Genet Genomic Med
September 2025
Department of Maternal-Fetal Medicine, Augusta University, Augusta, Georgia, USA.
Introduction: Spinal muscular atrophy (SMA), caused by pathogenic variants in the survival motor neuron (SMN) gene, is the most common genetic cause of mortality in children under the age of two. Prior reports of obstetric sonograms performed in pregnancies with severe forms of fetal SMA have discrepant findings that may stem from a failure to account for the SMN2 copy number.
Methods: We present a neonate diagnosed with SMA type 0 postnatally (genotype: 0 copies of SMN1, 1 copy of SMN2).
JDS Commun
September 2025
Livestock Improvement Corporation Ltd., Newstead, Hamilton 3240, New Zealand.
SLICK1 is an allelic variant of the prolactin receptor gene that is found in Senepol beef cattle. A single copy of this allele produces a short hair coat and confers heat tolerance. We aimed to determine the effect of two copies of this allele on the milking performance of dairy cattle.