VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.

John C Mu, Marghoob Mohiyuddin, Jian Li, Narges Bani Asadi, Mark B Gerstein, Alexej Abyzov, Wing H Wong, Hugo Y K Lam

Bioinformatics

Department of Electrical Engineering, Stanford University, Stanford, CA 94035, USA; Department of Bioinformatics, Bina Technologies, Redwood City, CA 94065, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Mayo Clinics, Department of Health Science

Published: May 2015

Summary: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing using either simulated or real data. Rather than simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next-generation sequencing.
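The idea of comparing variants binned in size ranges can be illustrated with a minimal sketch. This is not VarSim's implementation: the bin edges, the `(chrom, pos, ref, alt)` tuple representation, the exact-match comparison rule and the function names `variant_length`, `size_bin` and `compare_by_size` are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical lower bin edges in base pairs (an assumption, not VarSim's bins).
BIN_EDGES = [1, 2, 5, 10, 50, 100, 1000, 10000]


def variant_length(ref, alt):
    """Size of a variant: 1 for an SNV, otherwise the ref/alt length difference."""
    return max(1, abs(len(alt) - len(ref)))


def size_bin(length):
    """Return a label for the size range that contains `length`."""
    if length < BIN_EDGES[0]:
        raise ValueError("length must be at least %d" % BIN_EDGES[0])
    i = bisect_right(BIN_EDGES, length)
    lo = BIN_EDGES[i - 1]
    if i == len(BIN_EDGES):
        return ">=%d" % lo          # open-ended last bin
    hi = BIN_EDGES[i] - 1
    return str(lo) if lo == hi else "%d-%d" % (lo, hi)


def compare_by_size(truth, called):
    """Count true positives, false positives and false negatives per size bin.

    `truth` and `called` are sets of (chrom, pos, ref, alt) tuples. Two variants
    match only if the tuples are identical (a simplifying assumption).
    """
    stats = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for chrom, pos, ref, alt in truth:
        key = "tp" if (chrom, pos, ref, alt) in called else "fn"
        stats[size_bin(variant_length(ref, alt))][key] += 1
    for chrom, pos, ref, alt in called - truth:
        stats[size_bin(variant_length(ref, alt))]["fp"] += 1
    return dict(stats)
```

Reporting counts per size bin keeps the numerous small variants from masking poor accuracy on the much rarer large structural variants, which a single genome-wide figure would hide.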

Availability And Implementation: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim.

Contact: rd@bina.com

Supplementary Information: Supplementary data are available at Bioinformatics online.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410653	PMC
http://dx.doi.org/10.1093/bioinformatics/btu828	DOI Listing
