Article Abstract

Copy number variations (CNVs) play pivotal roles in the etiology of complex diseases and vary across diverse populations. Understanding the association between CNVs and disease susceptibility is important in disease genetics research and often requires analysis of large sample sizes. One of the most cost-effective and scalable methods for detecting CNVs is based on normalized signal intensity values, such as Log R Ratio (LRR) and B Allele Frequency (BAF), from Illumina genotyping arrays. In this study, we present CNV-Finder, a novel pipeline that applies deep learning to array data, specifically a Long Short-Term Memory (LSTM) network, to expedite the large-scale identification of CNVs within predefined genomic regions. This facilitates the efficient prioritization of samples for subsequent, costly analyses such as short-read and long-read whole genome sequencing. We focus on five genes that may be relevant to neurological diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), or related disorders such as essential tremor (ET): Parkin (PRKN), Leucine Rich Repeat And Ig Domain Containing 2 (LINGO2), Microtubule Associated Protein Tau (MAPT), alpha-Synuclein (SNCA), and Amyloid Beta Precursor Protein (APP). By training our models on expert-annotated samples and validating them across diverse cohorts, including those from the Global Parkinson's Genetics Program (GP2) and additional dementia-specific databases, we demonstrate the efficacy of CNV-Finder in accurately detecting deletions and duplications. Our pipeline outputs app-compatible files for visualization within CNV-Finder's interactive web application. This interface enables researchers to review predictions and to filter displayed samples by model prediction values, LRR range, and variant count in order to explore or confirm results. Our pipeline integrates this human feedback to enhance model performance and reduce false positive rates.

Through a series of comprehensive analyses and validations using both short-read and long-read sequencing data, we demonstrate the robustness and adaptability of CNV-Finder in identifying CNVs in regions of varied sparsity, noise, and size. Our findings highlight the significance of contextual understanding and human expertise in enhancing the precision of CNV identification, particularly in complex genomic regions like 17q21.31. The CNV-Finder pipeline is a scalable, publicly available resource for the scientific community, hosted on GitHub (https://github.com/GP2code/CNV-Finder; DOI 10.5281/zenodo.14182563). CNV-Finder not only expedites accurate candidate identification but also significantly reduces the manual workload for researchers, enabling future targeted validation and downstream analyses in regions or phenotypes of interest.
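The review step described in the abstract (filtering displayed samples by model prediction value, LRR range, and variant count) could look roughly like the sketch below. It is not taken from the CNV-Finder code base: the `SamplePrediction` schema, the `filter_samples` helper, and all default thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SamplePrediction:
    """Per-sample summary for one region of interest (hypothetical schema)."""
    sample_id: str
    prediction: float   # model output in [0, 1]; higher suggests a CNV
    mean_lrr: float     # mean Log R Ratio across the region's probes
    variant_count: int  # number of array variants observed in the region


def filter_samples(
    samples: List[SamplePrediction],
    min_prediction: float = 0.8,                   # illustrative threshold
    lrr_range: Tuple[float, float] = (-2.0, 2.0),  # illustrative bounds
    min_variants: int = 10,                        # illustrative minimum
) -> List[SamplePrediction]:
    """Keep samples passing all three filters, highest prediction first."""
    low, high = lrr_range
    kept = [
        s for s in samples
        if s.prediction >= min_prediction
        and low <= s.mean_lrr <= high
        and s.variant_count >= min_variants
    ]
    return sorted(kept, key=lambda s: s.prediction, reverse=True)
```

Sorting by prediction value puts the strongest candidates at the top of a manual review queue; a real workflow would tune the thresholds per gene region and per array type.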

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601614
DOI: http://dx.doi.org/10.1101/2024.11.22.624040

