INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis.

Kai Zhao, Sen Huang, Cuichan Lin, Pak Chung Sham, Hon-Cheong So, Zhixiang Lin

PLoS Genet

Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.

Published: March 2024

RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Few, if any, existing methods can handle multiple biological variables (e.g., tissues/phenotypes) and their interactions simultaneously while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. In particular, it introduces the elastic net penalty to induce sparsity while accounting for the grouping effects of genes. It can achieve DR of high-dimensional data (≥ 3 dimensions), whereas conventional methods (e.g., PCA/NMF) generally handle only 2D data (e.g., sample × expression). In addition, it enables computing 'adjusted' expression profiles for specific biological variables while controlling for variation from other variables. INSIDER is computationally efficient and accommodates missing data. In simulations, INSIDER performed similarly to or better than a closely related competing method, SDA, and it can handle complex missing-data patterns in RNA-Seq data. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness through real data analysis, including clustering donors for disease subtyping, revealing neurodevelopmental trajectories using the BrainSpan data, and uncovering biological processes that contribute to variables of interest (e.g., disease status and tissue) and their interactions.
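To make the idea of the abstract concrete, here is a minimal sketch of a sparse factorization in which indicator variables for phenotype, tissue, and their interaction share one low-rank latent space, with an elastic net penalty on the gene loadings. It is an illustration under stated assumptions, not the code from the INSIDER repository: the function names (`build_design`, `fit_sketch`), the ridge penalty on the latent representations, the block proximal-gradient updates, and all parameter values are hypothetical choices, and the sketch does not handle missing data.

```python
import numpy as np

def one_hot(ids, n_levels):
    """Indicator matrix: one row per sample, one column per level."""
    return np.eye(n_levels)[ids]

def build_design(phenotype, tissue, n_pheno, n_tissue):
    """Stack indicators for phenotype, tissue, and their interaction."""
    inter = phenotype * n_tissue + tissue  # one level per (phenotype, tissue) pair
    return np.hstack([one_hot(phenotype, n_pheno),
                      one_hot(tissue, n_tissue),
                      one_hot(inter, n_pheno * n_tissue)])

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise shrinkage toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(Y, X, W, V, lam, alpha, ridge):
    """Squared error + ridge on W + elastic net on V (assumed form)."""
    resid = Y - X @ W @ V
    enet = alpha * np.abs(V).sum() + 0.5 * (1 - alpha) * np.sum(V ** 2)
    return 0.5 * np.sum(resid ** 2) + 0.5 * ridge * np.sum(W ** 2) + lam * enet

def fit_sketch(Y, X, rank=3, lam=0.5, alpha=0.5, ridge=1.0, n_iter=100, seed=0):
    """Alternate a gradient step on W with a proximal gradient step on V.

    W (levels x rank) holds one latent vector per level of each variable and
    interaction; V (rank x genes) holds the shared sparse gene loadings.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    V = 0.1 * rng.standard_normal((rank, Y.shape[1]))
    xtx_norm = np.linalg.norm(X.T @ X, 2)  # spectral norm, fixed across iterations
    history = [objective(Y, X, W, V, lam, alpha, ridge)]
    for _ in range(n_iter):
        # W-step: gradient descent with step 1/L_w (L_w bounds the curvature in W).
        L_w = xtx_norm * np.linalg.norm(V @ V.T, 2) + ridge
        grad_w = X.T @ (X @ W @ V - Y) @ V.T + ridge * W
        W = W - grad_w / L_w
        # V-step: gradient step on the smooth part, then L1 shrinkage.
        Z = X @ W  # per-sample latent representation (sum of level vectors)
        L_v = np.linalg.norm(Z.T @ Z, 2) + lam * (1 - alpha)
        grad_v = Z.T @ (Z @ V - Y) + lam * (1 - alpha) * V
        V = soft_threshold(V - grad_v / L_v, lam * alpha / L_v)
        history.append(objective(Y, X, W, V, lam, alpha, ridge))
    return W, V, history
```

In this sketch, each sample's latent representation is the sum of the latent vectors for its phenotype, its tissue, and their interaction, so every source of variation is expressed in the same `rank`-dimensional space. Because the design is a matrix of indicators rather than a tensor, unbalanced or incomplete combinations of levels do not require any special structure.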

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10965063
DOI: http://dx.doi.org/10.1371/journal.pgen.1011189
