Whole exome sequencing (WES)-based assays undergo rigorous validation before being implemented in diagnostic laboratories. This validation process generates experimental evidence that allows laboratories to predict the performance of the intended assay. The NA12878 Genome in a Bottle (GIAB) HapMap reference sample is commonly used for validation in diagnostic laboratories. We investigated what data points should be taken into consideration when validating WES-based assays using the GIAB reference in a diagnostic setting. We delineate specific factors that require special consideration and identify OMIM genes associated with diseases that may 'bypass' validation. Four replicates of the NA12878 sample were sequenced at the CHEO Genetics Diagnostic Laboratory on a NextSeq 500; the data were analyzed using the bcbio-nextgen v1.1.2 pipeline. The hap.py validation engine, Real Time Genomics vcfeval tool, and high-confidence (HC) variant calls in HC regions available for the GIAB sample were used to validate the obtained variant calls. The same validation process was then used to evaluate variant calls obtained for the same sample by two other clinical diagnostic laboratories. We showed that variant calls in NA12878 can be confidently measured only in the regions that intersect between the GIAB HC regions and the target regions of exome capture. Of the 4139 (as of October 2019) OMIM genes associated with a phenotype and having a known molecular basis of disease, 84 were fully outside of the GIAB HC regions and many of the remaining OMIM genes were only partially covered by the HC regions. A significant proportion of variants identified in the NA12878 sample outside of the HC regions have unknown (UNK) status due to the absence of HC reference alleles. Verification of such calls is possible either by an alternative truth set or by orthogonal testing.
Similarly, many variants outside of exome capture regions, if not accounted for, will be deemed false negatives due to insufficient probe coverage. Our results demonstrate the importance of the intersection between genomic regions of interest, capture regions, and the high-confidence regions. If not considered, false and ambiguous variant calls could have a negative impact on the diagnostic accuracy of the intended WES-based diagnostic assay and increase the need for confirmatory testing. To enable laboratories to identify 'problematic' regions and optimize validation efforts, we have made our VCF and BED files available in the UCSC Genome Browser: NA12878 WES Benchmark. Because relevant genes and genome annotations are evolving, we implemented a general-purpose algorithm to cross-reference OMIM genes with the genomic regions of interest that can be applied to capture genes/regions outside HC regions (see repository of data material section).
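The region logic described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or their published code; it is a minimal, assumed reconstruction of the idea that variant calls are only measurable where the HC regions and the capture targets overlap, and that a gene can be fully, partially, or not at all covered by that overlap. The coordinates and the gene names (`GENE_A`, `GENE_B`, `GENE_C`) are hypothetical, everything sits on a single chromosome for simplicity, and intervals are half-open as in BED files.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both interval sets."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def classify(gene: Interval, regions: List[Interval]) -> str:
    """Label a gene as 'full', 'partial', or 'outside' relative to regions."""
    covered = sum(end - start for start, end in intersect([gene], regions))
    if covered == 0:
        return "outside"
    if covered == gene[1] - gene[0]:
        return "full"
    return "partial"


# Toy inputs (hypothetical coordinates, one chromosome).
hc_regions = [(100, 500), (800, 1200)]
capture_regions = [(50, 300), (400, 900), (1500, 1600)]

# Only this intersection can be benchmarked against the truth set.
measurable = intersect(hc_regions, capture_regions)

genes: Dict[str, Interval] = {
    "GENE_A": (120, 280),    # inside both HC and capture regions
    "GENE_B": (250, 450),    # straddles a gap in the capture regions
    "GENE_C": (1000, 1100),  # in an HC region but not captured
}
status = {name: classify(span, measurable) for name, span in genes.items()}
print(measurable)
print(status)
```

In this toy example a gene such as `GENE_C` lies in an HC region but is missed by the capture design, so calls there would count as false negatives unless the comparison is restricted to the measurable intersection. Real BED files span many chromosomes and are usually handled with dedicated interval tools such as bedtools; the sketch only shows the set logic.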
DOI: http://dx.doi.org/10.1007/s00439-020-02201-y
mBio
September 2025
Department of Biology, Laboratory of Molecular Cell Biology, KU Leuven, Leuven, Flanders, Belgium.
Echinocandins, which target the fungal β-1,3-glucan synthase (Fks), are essential for treating invasive fungal infections, yet resistance is increasingly reported. While resistance typically arises through mutations in Fks hotspots, emerging evidence suggests a contributing role of changes in membrane sterol composition due to mutations. Here, we present a clinical case of () in which combined mutations in and , but not alone, appear to confer echinocandin resistance.
Brief Bioinform
August 2025
Department of Respiratory Medicine, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157, Xiwu Road, Xincheng District, Xi'an 710004, China.
Accurate tumor mutation burden (TMB) quantification is critical for immunotherapy stratification, yet remains challenging due to variability across sequencing platforms, tumor heterogeneity, and variant calling pipelines. Here, we introduce TMBquant, an explainable AI-powered caller designed to optimize TMB estimation through dynamic feature selection, ensemble learning, and automated strategy adaptation. Built upon the H2O AutoML framework, TMBquant integrates variant features, minimizes classification errors, and enhances both accuracy and stability across diverse datasets.
Microbiol Spectr
September 2025
Instituto de Microbiologia Paulo de Góes, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil.
is a commensal bacterium that colonizes the gut of humans and animals and is a major opportunistic pathogen, known for causing multidrug-resistant healthcare-associated infections (HAIs). Its ability to thrive in diverse environments and disseminate antimicrobial resistance genes (ARGs) across ecological niches highlights the importance of understanding its ecological, evolutionary, and epidemiological dynamics. The CRISPR2 locus has been used as a valuable marker for assessing clonality and phylogenetic relationships in .
NAR Genom Bioinform
September 2025
Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.
Advances in Oxford Nanopore Technologies (ONT) with the introduction of the r10.4.1 flow cell have reduced the sequencing error rates to <1%.
Mol Genet Genomics
September 2025
Human Phenome Institute, MOE Key Laboratory of Contemporary Anthropology, Zhangjiang Fudan International Innovation Center, Fudan University, 825 Zhangheng Road, Shanghai, 201203, China.
Accurate variant calling is essential for next-generation sequencing (NGS)-based diagnosis of rare diseases, yet most benchmarking studies have focused on standard cell lines or trio-based samples, with limited relevance to sporadic cases. Here, we systematically compared the performance of DeepVariant and GATK HaplotypeCaller in two Chinese cohorts of patients with sporadic epilepsy (EP) and autism spectrum disorder (ASD). DeepVariant exhibited higher precision and sensitivity in detecting single nucleotide variants (SNVs), while GATK showed a distinct advantage in identifying rare variants, which are often key to understanding the genetic basis of rare diseases.