Summary: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing, using either simulated or real data. Rather than simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model draws on information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework that supports parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges, and a lightweight, interactive graphical report to visualize validation results with detailed statistics. To date, it is the most comprehensive validation tool for secondary analysis in next-generation sequencing.
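To make the idea of "comparing variants binned in size ranges" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not VarSim's implementation or API: the names `Variant`, `SIZE_BINS`, `bin_label`, `binned_stats` and `precision_recall` are hypothetical, the bin boundaries are arbitrary, and truth and calls are matched exactly on (chrom, pos, ref, alt), which a real validator would relax to account for equivalent indel representations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Minimal variant record (chromosome, 1-based position, REF, ALT)."""
    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def size(self) -> int:
        # SNVs and same-length substitutions count as size 1;
        # indels are sized by their net length change.
        return max(1, abs(len(self.alt) - len(self.ref)))


# Hypothetical size bins as inclusive bp ranges; VarSim's own bins may differ.
SIZE_BINS: List[Tuple[int, int]] = [(1, 1), (2, 10), (11, 50), (51, 300), (301, 10**9)]


def bin_label(size: int) -> str:
    """Return the label of the size bin containing `size`."""
    for lo, hi in SIZE_BINS:
        if lo <= size <= hi:
            return f"{lo}-{hi}"
    raise ValueError(f"size {size} is outside all bins")


def binned_stats(truth: Iterable[Variant], calls: Iterable[Variant]) -> Dict[str, Dict[str, int]]:
    """Count TP/FN per truth-variant bin and FP per called-variant bin.

    Matching is exact on (chrom, pos, ref, alt), a simplifying assumption.
    """
    truth_set, call_set = set(truth), set(calls)
    stats: Dict[str, Dict[str, int]] = defaultdict(lambda: {"TP": 0, "FN": 0, "FP": 0})
    for v in truth_set:
        stats[bin_label(v.size)]["TP" if v in call_set else "FN"] += 1
    for v in call_set - truth_set:
        stats[bin_label(v.size)]["FP"] += 1
    return dict(stats)


def precision_recall(stats: Dict[str, Dict[str, int]]) -> Dict[str, Tuple[float, float]]:
    """Per-bin (precision, recall); NaN where the denominator is zero."""
    out = {}
    for label, s in stats.items():
        tp, fp, fn = s["TP"], s["FP"], s["FN"]
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        out[label] = (precision, recall)
    return out


# Toy example data (invented for illustration only).
truth = [
    Variant("chr1", 100, "A", "G"),                 # SNV
    Variant("chr1", 200, "A", "ATTT"),              # 3 bp insertion
    Variant("chr1", 300, "ACGTACGTACGTACG", "A"),   # 14 bp deletion
]
calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "A", "ATTT"),
    Variant("chr1", 500, "C", "T"),                 # false-positive SNV
]

stats = binned_stats(truth, calls)
for label, (p, r) in precision_recall(stats).items():
    print(f"{label:>14} bp  precision={p:.2f}  recall={r:.2f}")
```

Reporting per bin rather than as a single aggregate keeps a caller's weakness on, say, mid-sized deletions from being masked by its strong performance on the far more numerous SNVs.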
Availability And Implementation: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim.
Contact: rd@bina.com
Supplementary Information: Supplementary data are available at Bioinformatics online.
| Source | Link |
|---|---|
| PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410653 |
| DOI | http://dx.doi.org/10.1093/bioinformatics/btu828 |
Genes Genomics
September 2025
Department of Clinical Laboratory, The First Affiliated Hospital of Guilin Medical University, Le Qun Road 15, Guilin, 541001, Guangxi, China.
Background: Lung cancer (LC) is the leading cause of cancer-related deaths globally. Genetic variants in mismatch repair (MMR) genes, such as MutS homolog 2 (MSH2), MutS homolog 6 (MSH6) and MutL homolog 1 (MLH1), may influence individual susceptibility and clinical outcomes in LC.
Objective: This study investigated the associations of genetic polymorphisms in MSH2, MSH6, and MLH1 with susceptibility and survival outcomes in lung cancer patients in the Guangxi Zhuang population.
J Virol
September 2025
Genome Regulation and Cell Signaling, Ellen and Ronald Caplan Cancer Center, The Wistar Institute, Philadelphia, Pennsylvania, USA.
Adenoviruses are double-stranded DNA viruses widely used as platforms for vaccines, oncolytics, and gene delivery. However, tools for studying adenoviral gene expression in real time during infection remain limited. Here, we describe a set of fluorescent and bioluminescent reporter viruses built using the modular AdenoBuilder reverse genetics system and informed by high-resolution maps of Ad5 transcription.
mSystems
September 2025
Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
A significant challenge in the field of microbiology is the functional annotation of novel genes from microbiomes. The increasing pace of sequencing technology development has made solving this challenge in a high-throughput manner even more important. Functional metagenomics offers a sequence-naive and cultivation-independent solution.
Cancer Med
September 2025
Division of Clinical & Translational Cancer Research, Medical Sciences Campus, University of Puerto Rico Comprehensive Cancer Center, San Juan, Puerto Rico.
Background: Gastric cancer (GC) is the fourth leading cause of cancer-related death globally. Tumor profiling has revealed actionable gene alterations that guide treatment strategies and enhance survival. Among Hispanics living in Puerto Rico (PRH), GC ranks among the top 10 causes of cancer-related death.
Genome Biol
September 2025
Department of Clinical Pharmacy, Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, 90089, USA.
Background: Recent advances in high-throughput sequencing technologies have enabled the collection and sharing of massive amounts of omics data, along with their associated metadata, that is, descriptive information that contextualizes the data, such as phenotypic traits and experimental design. Enhancing metadata availability is critical to ensure data reusability and reproducibility and to facilitate novel biomedical discoveries through effective data reuse. Yet incomplete metadata accompanying public omics data may hinder reproducibility and reusability and limit secondary analyses.