Article Abstract

We introduce Cell2Sentence (C2S), a novel method to directly adapt large language models to a biological context, specifically single-cell transcriptomics. By transforming gene expression data into "cell sentences," C2S bridges the gap between natural language processing and biology. We demonstrate that cell sentences enable the fine-tuning of language models for diverse tasks in biology, including cell generation, complex cell type annotation, and direct data-driven text generation. Our experiments reveal that GPT-2, when fine-tuned with C2S, can generate biologically valid cells based on cell type inputs and accurately predict cell types from cell sentences. This illustrates that language models, through C2S fine-tuning, can acquire a significant understanding of single-cell biology while maintaining robust text generation capabilities. C2S offers a flexible, accessible framework to integrate natural language processing with transcriptomics, utilizing existing models and libraries for a wide range of biological applications.
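
The abstract does not spell out how expression data becomes a "cell sentence." As a minimal sketch, assume a cell sentence is the list of a cell's expressed gene names written in descending order of expression. That ordering rule, the function name `cell_to_sentence`, the `top_k` cutoff, and the toy expression values are illustrative assumptions, not details taken from the abstract.

```python
from typing import Mapping, Optional


def cell_to_sentence(expression: Mapping[str, float],
                     top_k: Optional[int] = None) -> str:
    """Turn one cell's gene-expression profile into a space-separated string.

    Assumption: genes are ordered from highest to lowest expression and
    unexpressed genes (value <= 0) are dropped. Ties are broken
    alphabetically so the output is deterministic.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    ranked = sorted(expressed, key=lambda gv: (-gv[1], gv[0]))
    genes = [gene for gene, _ in ranked]
    if top_k is not None:
        genes = genes[:top_k]  # keep only the top_k highest-expressed genes
    return " ".join(genes)


# Toy expression values for a single hypothetical cell.
cell = {"CD3E": 12.0, "MALAT1": 85.0, "ACTB": 40.0, "CD19": 0.0, "IL7R": 7.0}
print(cell_to_sentence(cell, top_k=4))  # prints "MALAT1 ACTB CD3E IL7R"
```

A string of this form can be passed to any standard text tokenizer, which is one way to read the abstract's claim that existing language models and libraries can be reused for fine-tuning.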

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11565894
DOI: http://dx.doi.org/10.1101/2023.09.11.557287

Publication Analysis

Top Keywords

language models: 16
large language: 8
natural language: 8
language processing: 8
cell sentences: 8
text generation: 8
language: 7
models: 5
c2s: 5
cell: 5

Similar Publications

Large language models (LLMs) have been successfully used for data extraction from free-text radiology reports. Most current studies were conducted with LLMs accessed via an application programming interface (API). We evaluated the feasibility of using open-source LLMs, deployed on limited local hardware resources for data extraction from free-text mammography reports, using a common data element (CDE)-based structure.

To develop and validate a deep-learning-based algorithm for automatic identification of anatomical landmarks and calculating femoral and tibial version angles (FTT angles) on lower-extremity CT scans. In this IRB-approved, retrospective study, lower-extremity CT scans from 270 adult patients (median age, 69 years; female to male ratio, 235:35) were analyzed. CT data were preprocessed using contrast-limited adaptive histogram equalization and RGB superposition to enhance tissue boundary distinction.

The fruit fly Anastrepha fraterculus (Wiedemann) (Diptera: Tephritidae) is one of the main pests in apple orchards. Artificial neural networks (ANNs) are tools with good ability to predict phenomena such as the seasonal dynamics of pest populations. Thus, the objective of this work was to determine a prediction model for the seasonal dynamics of A.

Harmonizing mouse anatomy terminology: a common language?

Mamm Genome

September 2025

Department of Animal Health and Anatomy, Center for Animal Biotechnology and Gene Therapy, Universitat Autònoma de Barcelona, Travessera Dels Turons, 08193, Cerdanyola del Vallès, Barcelona, Spain.

The mouse remains the principal animal model for investigating human diseases due, among other reasons, to its anatomical similarities to humans. Despite its widespread use, the assumption that mouse anatomy is a fully established field with standardized and universally accepted terminology is misleading. Many phenotypic anatomical annotations do not refer to the authority or origin of the terminology used, while others inappropriately adopt outdated or human-centric nomenclature.