sciLaMA: A Single-Cell Representation Learning Framework to Leverage Prior Knowledge from Large Language Models.

Hongru Hu, Shuwen Zhang, Yongin Choi, Venkat S Malladi, Gerald Quon

bioRxiv

Department of Molecular and Cellular Biology, University of California, Davis, CA USA.

Published: May 2025

Single-cell RNA sequencing (scRNA-seq) enables high-resolution exploration of cellular diversity and gene regulation, yet analyzing such data remains challenging due to technical and methodological limitations. Existing task-specific deep generative models, such as the variational autoencoder (VAE) and its variants, struggle to incorporate external biological knowledge, while transformer-based foundation large language models (LLMs, or large LaMs) are limited by their computational cost and their limited applicability to tabular gene expression data. Here, we introduce sciLaMA (single-cell interpretable Language Model Adapter), a representation learning framework that bridges these gaps by integrating static gene embeddings from multimodal LLMs with tabular scRNA-seq data through a paired-VAE architecture. Our approach generates context-aware representations of both cells and genes and outperforms state-of-the-art methods in key downstream single-cell tasks, including batch effect correction, cell clustering, and identification of cell-state-specific gene markers and modules, while remaining computationally efficient. sciLaMA thus offers a unified framework for comprehensive single-cell data analysis and biologically interpretable gene module discovery. Source code is available at https://github.com/microsoft/sciLaMA.
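The abstract names the architecture but does not specify it. The toy NumPy forward pass below only illustrates the general idea of a paired VAE: one arm encodes cells from their expression profiles, the other encodes genes from static LLM gene embeddings, and both arms share a latent dimension. The linear Gaussian encoders, the inner-product decoder that couples the two latent spaces, the toy data, and every function name (`init_gaussian_encoder`, `encode`, `kl_standard_normal`, `paired_vae_step`) are hypothetical choices made for illustration. They are not taken from sciLaMA; the repository cited above holds the actual implementation.

```python
import numpy as np


def init_gaussian_encoder(n_in: int, latent_dim: int, rng: np.random.Generator):
    """Hypothetical linear encoder: one weight/bias pair for the mean, one for the log-variance."""
    def dense():
        return rng.normal(0.0, 0.1, size=(n_in, latent_dim)), np.zeros(latent_dim)
    return dense(), dense()


def encode(x: np.ndarray, params, rng: np.random.Generator):
    """Map each row of x to a diagonal Gaussian and draw one sample (reparameterization trick)."""
    (w_mu, b_mu), (w_lv, b_lv) = params
    mu = x @ w_mu + b_mu
    logvar = x @ w_lv + b_lv
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    return z, mu, logvar


def kl_standard_normal(mu: np.ndarray, logvar: np.ndarray) -> float:
    """KL(q(z|x) || N(0, I)), summed over latent dims and averaged over rows."""
    per_row = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(per_row))


def paired_vae_step(expr, gene_emb, cell_params, gene_params, rng):
    """One forward pass of a toy paired VAE (assumed design, not sciLaMA's).

    expr:     (n_cells, n_genes) normalized expression matrix
    gene_emb: (n_genes, emb_dim) static gene embeddings (stand-in for LLM output)
    """
    z_cell, mu_c, lv_c = encode(expr, cell_params, rng)      # cell arm: one latent per cell
    z_gene, mu_g, lv_g = encode(gene_emb, gene_params, rng)  # gene arm: one latent per gene
    recon = z_cell @ z_gene.T                                # assumed inner-product decoder
    mse = float(np.mean((recon - expr) ** 2))
    loss = mse + kl_standard_normal(mu_c, lv_c) + kl_standard_normal(mu_g, lv_g)
    return loss, recon, z_cell, z_gene


rng = np.random.default_rng(0)
n_cells, n_genes, emb_dim, latent_dim = 5, 8, 16, 3
expr = np.log1p(rng.poisson(2.0, size=(n_cells, n_genes)).astype(float))  # toy counts, log1p
gene_emb = rng.normal(size=(n_genes, emb_dim))                            # random stand-in embeddings
cell_params = init_gaussian_encoder(n_genes, latent_dim, rng)
gene_params = init_gaussian_encoder(emb_dim, latent_dim, rng)

loss, recon, z_cell, z_gene = paired_vae_step(expr, gene_emb, cell_params, gene_params, rng)
print(recon.shape, z_cell.shape, z_gene.shape)  # (5, 8) (5, 3) (8, 3)
```

Because both arms project into a latent space of the same size, each gene's latent vector can be compared directly with cell latents. This is one plausible way gene representations could be tied to cellular context. Training would wrap this loss in an optimizer, for example in a framework with automatic differentiation such as PyTorch or JAX. The sketch covers only the forward pass.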

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12154950
DOI: http://dx.doi.org/10.1101/2025.01.28.635153
