CNV-Finder: Streamlining Copy Number Variation Discovery.

Nicole Kuznetsov, Kensuke Daida, Mary B Makarious, Bashayer Al-Mubarak, Kajsa Atterling Brolin, Laksh Malik, Cedric Kouam, Breeana Baker, Miriam Ostrozovicova, Katherine M Andersh, Pin-Jui Kung, Yasser Mecheri, Yi-Wen Tay, Behloul Soundous Malek, Nada Al Tassan, Maria Teresa Periñan, Samantha Hong, Mathew Koretsky, Lana Sargeant, Kristin Levine

bioRxiv

Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.

Published: November 2024

Copy Number Variations (CNVs) play pivotal roles in the etiology of complex diseases and are variable across diverse populations. Understanding the association between CNVs and disease susceptibility is of significant importance in disease genetics research and often requires analysis of large sample sizes. One of the most cost-effective and scalable methods for detecting CNVs is based on normalized signal intensity values, such as Log R Ratio (LRR) and B Allele Frequency (BAF), from Illumina genotyping arrays. In this study, we present CNV-Finder, a novel pipeline integrating deep learning techniques on array data, specifically a Long Short-Term Memory (LSTM) network, to expedite the large-scale identification of CNVs within predefined genomic regions. This facilitates the efficient prioritization of samples for subsequent, costly analyses such as short-read and long-read whole genome sequencing. We focus on five genes: Parkin (PRKN), Leucine Rich Repeat And Ig Domain Containing 2 (LINGO2), Microtubule Associated Protein Tau (MAPT), alpha-Synuclein (SNCA), and Amyloid Beta Precursor Protein (APP), which may be relevant to neurological diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), or related disorders such as essential tremor (ET). By training our models on expert-annotated samples and validating them across diverse cohorts, including those from the Global Parkinson's Genetics Program (GP2) and additional dementia-specific databases, we demonstrate the efficacy of CNV-Finder in accurately detecting deletions and duplications. Our pipeline outputs app-compatible files for visualization within CNV-Finder's interactive web application. This interface enables researchers to review predictions and filter displayed samples by model prediction values, LRR range, and variant count in order to explore or confirm results. Our pipeline integrates this human feedback to enhance model performance and reduce false positive rates.
Through a series of comprehensive analyses and validations using both short-read and long-read sequencing data, we demonstrate the robustness and adaptability of CNV-Finder in identifying CNVs with regions of varied sparsity, noise, and size. Our findings highlight the significance of contextual understanding and human expertise in enhancing the precision of CNV identification, particularly in complex genomic regions like 17q21.31. The CNV-Finder pipeline is a scalable, publicly available resource for the scientific community, available on GitHub (https://github.com/GP2code/CNV-Finder; DOI 10.5281/zenodo.14182563). CNV-Finder not only expedites accurate candidate identification but also significantly reduces the manual workload for researchers, enabling future targeted validation and downstream analyses in regions or phenotypes of interest.
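The review step described in the abstract, filtering displayed samples by model prediction value, LRR range, and variant count, can be illustrated with a minimal sketch. This is not CNV-Finder's actual code or file format: the `SamplePrediction` fields, the `filter_candidates` helper, and every threshold below are hypothetical and chosen only for illustration (a negative mean LRR window is used here on the assumption that the reviewer is looking for deletion candidates).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SamplePrediction:
    """One sample's summary for a region of interest (hypothetical fields)."""
    sample_id: str
    prediction: float   # model output in [0, 1]; higher means more CNV-like
    mean_lrr: float     # mean Log R Ratio across the region
    variant_count: int  # number of array probes in the region for this sample


def filter_candidates(
    samples: List[SamplePrediction],
    min_prediction: float = 0.8,                    # illustrative threshold
    lrr_range: Tuple[float, float] = (-2.0, -0.2),  # illustrative deletion-like window
    min_variants: int = 10,                         # illustrative threshold
) -> List[SamplePrediction]:
    """Keep samples passing all three filters, sorted by prediction (descending)."""
    lo, hi = lrr_range
    kept = [
        s for s in samples
        if s.prediction >= min_prediction
        and lo <= s.mean_lrr <= hi
        and s.variant_count >= min_variants
    ]
    return sorted(kept, key=lambda s: s.prediction, reverse=True)


# Made-up example values, not data from the study.
samples = [
    SamplePrediction("S1", 0.95, -0.45, 42),
    SamplePrediction("S2", 0.91, 0.02, 40),   # fails the LRR window
    SamplePrediction("S3", 0.60, -0.50, 38),  # fails the prediction threshold
    SamplePrediction("S4", 0.88, -0.60, 4),   # fails the variant count
    SamplePrediction("S5", 0.99, -0.70, 25),
]
candidates = filter_candidates(samples)
print([s.sample_id for s in candidates])
```

Requiring all three conditions at once mirrors the idea of narrowing a large cohort to a short list for manual review; in practice the thresholds would be tuned per region and per CNV type.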

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601614
DOI: http://dx.doi.org/10.1101/2024.11.22.624040

