SIMBA: single-cell embedding along with features.

Nat Methods

Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA.

Published: June 2024



Article Abstract

Most current single-cell analysis pipelines are limited to cell embeddings and rely heavily on clustering, while lacking the ability to explicitly model interactions between different feature types. Furthermore, these methods are tailored to specific tasks, as distinct single-cell problems are formulated differently. To address these shortcomings, here we present SIMBA, a graph embedding method that jointly embeds single cells and their defining features, such as genes, chromatin-accessible regions and DNA sequences, into a common latent space. By leveraging the co-embedding of cells and features, SIMBA allows for the study of cellular heterogeneity, clustering-free marker discovery, gene regulation inference, batch effect removal and omics data integration. We show that SIMBA provides a single framework that allows diverse single-cell problems to be formulated in a unified way and thus simplifies the development of new analyses and extension to new single-cell modalities. SIMBA is implemented as a comprehensive Python library (https://simba-bio.readthedocs.io).
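The idea of placing cells and features in one latent space can be illustrated with a toy sketch. This is not SIMBA's API or its training procedure: as a stand-in, it applies a truncated SVD to a binarized cell-by-gene matrix, treated as the biadjacency matrix of a bipartite cell-gene graph. The function names (`co_embed`, `top_genes_for_cell`), the gene labels, and the count matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np

def co_embed(counts, dim=2):
    """Embed cells (rows) and genes (columns) in one shared space.

    Assumption: truncated SVD is used here as a simple stand-in for a
    graph embedding method; it is not how SIMBA itself is trained.
    """
    # A cell-gene edge exists wherever the gene is detected in the cell.
    edges = (counts > 0).astype(float)
    u, s, vt = np.linalg.svd(edges, full_matrices=False)
    scale = np.sqrt(s[:dim])
    cell_emb = u[:, :dim] * scale      # one row per cell
    gene_emb = vt[:dim, :].T * scale   # one row per gene
    return cell_emb, gene_emb

def top_genes_for_cell(cell_vec, gene_emb, gene_names, k=2):
    """Rank genes by dot product with a cell vector; no clustering involved."""
    scores = gene_emb @ cell_vec
    order = np.argsort(-scores)[:k]
    return [gene_names[i] for i in order]

# Hypothetical toy data: cells 0-1 express geneA/geneB, cells 2-3 express geneC/geneD.
counts = np.array([
    [5, 3, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 7, 2],
    [0, 0, 3, 8],
])
gene_names = ["geneA", "geneB", "geneC", "geneD"]

cell_emb, gene_emb = co_embed(counts, dim=2)
markers_cell0 = top_genes_for_cell(cell_emb[0], gene_emb, gene_names)
markers_cell3 = top_genes_for_cell(cell_emb[3], gene_emb, gene_names)
```

Because cells and genes share coordinates, the genes most associated with a cell can be read off directly from the embedding, which is the intuition behind clustering-free marker discovery in this simplified setting.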


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11166568
DOI: http://dx.doi.org/10.1038/s41592-023-01899-8

