Simulation of realistic omics data is a key input for benchmarking studies that help users identify optimal computational pipelines. Omics data involve large numbers of features measured on each sample, and these measurements are generally correlated with each other. However, simulations too often ignore these correlations, perhaps because of the computational and statistical hurdles involved. To address this, we describe three approaches for generating omics-scale data with correlated measurements that mimic real datasets. All three are based on a Gaussian copula whose covariance matrix decomposes into a diagonal part and a low-rank part. This decomposition allows extremely efficient simulation, overcoming a hurdle to the adoption of past methods. We use these approaches to demonstrate the importance of including correlation in two benchmarking applications. First, we show that the variance of results from the popular DESeq2 method increases when dependence is included. Second, we demonstrate that CYCLOPS, a method for inferring the circadian time of sample collection from transcriptomics data, improves in performance in some circumstances when given gene-gene dependencies. We provide an R package, dependentsimr, with efficient implementations of these methods; it can generate dependent data with arbitrary marginal distributions, including discrete (binary, ordered categorical, Poisson, negative binomial), continuous (normal), or empirical distributions.
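The efficiency claim rests on a standard identity: if w ~ N(0, I_k) and e ~ N(0, I_p) are independent, then Z = L w + D^(1/2) e has covariance Σ = L Lᵀ + D, so each sample costs O(pk) work and the p × p matrix is never formed or factored (a dense Cholesky would cost O(p³) once and O(p²) per sample). The Gaussian copula then maps each standardized coordinate through the normal CDF to Uniform(0, 1) and applies the target marginal's inverse CDF. What follows is a minimal Python sketch of that idea under stated assumptions, not the dependentsimr implementation (an R package whose API is not reproduced here): the loadings L and noise variances D are hypothetical rather than estimated from a real dataset, only Poisson marginals are shown, and the names `simulate_copula_poisson`, `poisson_inv_cdf`, and `pearson` are illustrative.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def poisson_inv_cdf(u: float, lam: float) -> int:
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam).

    Simple recursive sum of the pmf; adequate for modest means only.
    """
    u = min(u, 1.0 - 1e-12)  # keep u strictly below 1 so the loop terminates
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
        if pmf == 0.0 and k > lam:  # pmf underflowed in the tail; stop
            break
    return k


def simulate_copula_poisson(n, loadings, noise_var, means, rng):
    """Draw n samples whose latent Gaussian has covariance L L^T + D.

    loadings:  p x k factor matrix L (list of rows), hypothetical here
    noise_var: length-p diagonal of D (all entries > 0)
    means:     Poisson mean for each of the p features
    """
    p, k = len(loadings), len(loadings[0])
    # Latent marginal sd: sqrt(Sigma_jj) = sqrt(sum_r L_jr^2 + D_jj)
    sds = [math.sqrt(sum(x * x for x in loadings[j]) + noise_var[j])
           for j in range(p)]
    noise_sd = [math.sqrt(v) for v in noise_var]
    samples = []
    for _ in range(n):
        # k shared factors per sample: O(p*k) work, no p x p matrix built
        w = [rng.gauss(0.0, 1.0) for _ in range(k)]
        row = []
        for j in range(p):
            z = sum(loadings[j][r] * w[r] for r in range(k))
            z += noise_sd[j] * rng.gauss(0.0, 1.0)
            # Gaussian copula: standardize, map to Uniform(0, 1), then
            # apply the target marginal's inverse CDF
            u = STD_NORMAL.cdf(z / sds[j])
            row.append(poisson_inv_cdf(u, means[j]))
        samples.append(row)
    return samples


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def column(data, j):
    """Extract feature j across all samples."""
    return [row[j] for row in data]


# Hypothetical example: 200 features in two blocks of 100, each block driven
# by its own factor; latent within-block correlation is 0.64 / (0.64 + 0.36).
rng = random.Random(0)
p = 200
loadings = [[0.8, 0.0] if j < 100 else [0.0, 0.8] for j in range(p)]
noise_var = [0.36] * p
means = [5.0] * p
data = simulate_copula_poisson(500, loadings, noise_var, means, rng)
print(round(pearson(column(data, 0), column(data, 1)), 2))    # same block
print(round(pearson(column(data, 0), column(data, 150)), 2))  # across blocks
```

Because the dependence lives entirely in the latent Gaussian, swapping `poisson_inv_cdf` for another inverse CDF (for example a negative binomial or an empirical quantile function) changes the marginals without changing the correlation structure of the latent layer.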
DOI: http://dx.doi.org/10.1371/journal.pcbi.1013392
PLoS Comput Biol
September 2025
Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, United States of America.
Biology has been transformed by the rapid development of computing and the concurrent rise of data-rich approaches such as omics or high-resolution imaging. However, there is a persistent computational skills gap in the biomedical research workforce. Inherent limitations of classroom teaching and institutional core support highlight the need for accessible ways for researchers to explore developments in computational biology.
Bioinformatics
September 2025
Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States.
Summary: Causal mediation analysis investigates the role of mediators in the relationship between exposure and outcome. In the analysis of omics or imaging data, mediators are often high-dimensional, presenting challenges such as multicollinearity and interpretability. Existing methods either compromise interpretability or fail to effectively prioritize mediators.
Bioinformatics
September 2025
Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
Motivation: RNA velocity has become a powerful tool for uncovering transcriptional dynamics in snapshot single-cell data. However, current RNA velocity approaches often assume constant transcriptional rates and treat genes independently with gene-specific times, which may introduce biases and deviate from biological realities. Here, we present InterVelo, a novel deep learning framework that simultaneously learns cellular pseudotime and RNA velocity.
Mol Omics
September 2025
Laboratory of Structural Bioinformatics and Computational Biology, Federal University of Rio Grande do Sul, Av. Bento Gonçalves, 9500, Porto Alegre 91501-970, RS, Brazil.
The integration of multimodal single-cell omics data is a state-of-the-art strategy for deciphering cellular heterogeneity and gene regulatory mechanisms. Recent advances in single-cell technologies have enabled the comprehensive characterization of cellular states and their interactions. However, integrating these high-dimensional and heterogeneous datasets poses significant computational challenges, including batch effects, sparsity, and modality alignment.
IEEE Trans Comput Biol Bioinform
September 2025
The rapid advancement of single-cell sequencing technology has generated vast amounts of multi-omics data, presenting unprecedented opportunities for single-cell multi-omics clustering analysis. However, existing single-cell clustering algorithms focus on extracting shared representations, overlooking the interactions and correlations among cells. This oversight inevitably leads to biased or confounded cell clustering results.