Citations

20

Article Abstract

Motivation: Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost.

Results: We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study.

Conclusions: We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4580299
DOI: http://dx.doi.org/10.1186/s12859-015-0736-4

Publication Analysis

Top Keywords

variant calling: 16
human genomes: 16
ngs technologies: 12
group-based variant: 8
calling variants: 8
calling: 6
calling leveraging: 4
leveraging next-generation: 4
next-generation supercomputing: 4
supercomputing large-scale: 4
