An integrated strain-level analytic pipeline utilizing longitudinal metagenomic data.

Boyan Zhou, Chan Wang, Gregory Putzel, Jiyuan Hu, Menghan Liu, Fen Wu, Yu Chen, Alejandro Pironti, Huilin Li

Microbiol Spectr

Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA.

Published: November 2024

Abstract: With the development of sequencing technology and analytic tools, studying within-species variations enhances the understanding of microbial biological processes. Nevertheless, most existing methods for strain-level analysis cannot concurrently assess both strain proportions and genome-wide single nucleotide variants (SNVs) across longitudinal metagenomic samples. In this study, we introduce LongStrain, an integrated pipeline for the analysis of large-scale metagenomic data from individuals with longitudinal or repeated samples. In LongStrain, we first use two efficient tools, Kraken2 and Bowtie2, for the taxonomic classification and alignment of sequencing reads, respectively. We then jointly model strain proportions and shared haplotypes across samples within individuals. This approach specifically targets tracking a primary strain and a secondary strain for each subject, providing their respective proportions and SNVs as output. In extensive simulation studies of a microbial community and of single species, LongStrain outperforms two genotyping methods and two deconvolution methods in most scenarios. Furthermore, we illustrate the potential applications of LongStrain in real data analyses of The Environmental Determinants of Diabetes in the Young (TEDDY) study and a gastric intestinal metaplasia microbiome study. In summary, the proposed analytic pipeline is markedly more statistically efficient than comparable methods and has great potential for understanding genomic variants and dynamic changes at the strain level. LongStrain and its tutorial are freely available online at https://github.com/BoyanZhou/LongStrain.
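The idea of jointly modeling strain proportions and shared haplotypes can be illustrated with a toy example. The sketch below is not LongStrain's algorithm or API; it is a minimal illustration under simplifying assumptions that are ours, not the paper's: exactly two strains, biallelic sites, a binomial read-count model, a fixed error rate `ERR`, and a simple coordinate-ascent search. The function name `fit_two_strains` and all parameters are hypothetical.

```python
import math
from itertools import product

ERR = 0.01  # assumed per-base error rate (illustrative value, not from the paper)

def alt_freq(p, h1, h2):
    """Expected alternate-allele frequency when the primary strain has proportion p.

    h1 and h2 are 0/1 flags: does the primary / secondary strain carry the alt allele?
    The result is clipped to [ERR, 1 - ERR] so that logarithms stay finite.
    """
    f = p * h1 + (1.0 - p) * h2
    return min(max(f, ERR), 1.0 - ERR)

def log_lik(alt, depth, f):
    """Binomial log-likelihood of `alt` alt reads out of `depth`, up to a constant."""
    return alt * math.log(f) + (depth - alt) * math.log(1.0 - f)

def fit_two_strains(alt, depth, n_iter=20):
    """Toy joint fit. alt[t][s] and depth[t][s] are read counts for sample t, site s.

    Returns (props, haps): one primary-strain proportion per sample and one
    (primary, secondary) haplotype pair per site, shared across all samples.
    """
    n_samples, n_sites = len(alt), len(alt[0])
    grid = [i / 100 for i in range(101)]   # candidate proportions 0.00 .. 1.00
    props = [0.7] * n_samples              # start by assuming the primary strain dominates
    haps = [(0, 0)] * n_sites
    for _ in range(n_iter):
        # Step 1: per site, pick the haplotype pair that best explains ALL samples.
        haps = [
            max(product((0, 1), repeat=2),
                key=lambda h: sum(log_lik(alt[t][s], depth[t][s],
                                          alt_freq(props[t], *h))
                                  for t in range(n_samples)))
            for s in range(n_sites)
        ]
        # Step 2: per sample, pick the proportion that best explains ALL sites.
        new_props = [
            max(grid,
                key=lambda p: sum(log_lik(alt[t][s], depth[t][s],
                                          alt_freq(p, *haps[s]))
                                  for s in range(n_sites)))
            for t in range(n_samples)
        ]
        if new_props == props:  # no change: coordinate ascent has converged
            break
        props = new_props
    return props, haps

# Noise-free synthetic data: three longitudinal samples, six sites, depth 100.
true_props = [0.9, 0.7, 0.6]
true_haps = [(1, 0), (0, 1), (1, 1), (1, 0), (0, 1), (0, 0)]
depth = [[100] * len(true_haps) for _ in true_props]
alt = [[round(100 * (p * h1 + (1 - p) * h2)) for h1, h2 in true_haps]
       for p in true_props]

est_props, est_haps = fit_two_strains(alt, depth)
print(est_props, est_haps)
```

Because haplotypes are shared across a subject's samples, sites where only one strain carries the alternate allele tie the per-sample proportions together, which is the intuition behind pooling longitudinal samples. This toy version does not handle label switching between the two strains, sequencing noise beyond the fixed error rate, or more than two strains.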

Importance: Advances in DNA-sequencing technology have enabled the high-resolution identification of microorganisms in microbial communities. Since different microbial strains within a species may show extreme phenotypic variability (e.g., nutrition metabolism, antibiotic resistance, and pathogen virulence), investigating within-species variations holds great scientific promise for understanding the mechanisms underlying microbial biological processes. To fully utilize the genomic variants shared across longitudinal metagenomic samples collected in microbiome studies, we develop an integrated analytic pipeline (LongStrain) for longitudinal metagenomic data. It concurrently leverages the proportions of mapped reads for individual strains and genome-wide SNVs to enhance the efficiency and accuracy of strain identification. Our method helps to understand strains' dynamic changes and their association with genome-wide variants. Given the fast-growing number of longitudinal studies of microbial communities, LongStrain, which streamlines analyses of large-scale raw sequencing data, should be of great value to the microbiome research community.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542597
DOI: http://dx.doi.org/10.1128/spectrum.01431-24

Top Keywords: analytic pipeline; longitudinal metagenomic; metagenomic data; within-species variations; microbial biological; biological processes; strain proportions; studies microbial; genomic variants; dynamic changes

Similar Publications

OmnibusX: A unified platform for accessible multi-omics analysis.

PLoS Comput Biol

September 2025

OmnibusXLab, OmnibusX Company Limited, Ho Chi Minh City, Vietnam.

Linh Truong, Thao Truong, Huy Nguyen

OmnibusX is an integrated, privacy-centric platform that enables code-free multi-omics data analysis by bridging computational methodologies with user-friendly interfaces. Designed to overcome challenges posed by fragmented analytical tools and high computational barriers, OmnibusX consolidates workflows for diverse technologies - including bulk RNA-seq, single-cell RNA-seq, single-cell ATAC-seq, and spatial transcriptomics - into a single, cohesive application. The application integrates established open-source tools such as Scanpy, DESeq2, SciPy, and scikit-learn into transparent, reproducible pipelines, offering users control over analytical parameters.


Large-scale comparative analysis reveals top graph signal processing features for subject identification.

bioRxiv

August 2025

Thomas A W Bolton, Mikkel Schöttner, Jagruti Patel, Hugo Fluhr, Yasser Alemán-Gómez

In magnetic resonance imaging, graph signal processing (GSP) is an analytical framework that allows regional functional activity time courses to be expressed in terms of the underlying structural connectivity backbone. To this end, several parameters must be set during the processing of structural and functional data, and a variety of output features have been proposed. While emerging applications of the GSP framework have shown clear merits in revealing the neural underpinnings of brain disorders, behavioural facets or individuality, the optimal parameter choices and feature types for an outcome of interest remain unknown at present.


Decoding Xylem Development in Flowering Plants: Insights From Single-Cell Transcriptomics.

Plant Cell Environ

September 2025

Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, Taiwan.

Jhong-He Yu, Jo-Wei Allison Hsieh, Zhifeng Wang, Jia Wei, Quanzi Li

Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative tool for decoding plant development, particularly in elucidating xylem differentiation. By capturing transcriptomic changes at single-cell resolution, scRNA-seq enables reconstruction of developmental trajectories across diverse plant tissues. In this review, we summarize recent advances in the application of scRNA-seq to study both primary and secondary xylem development in monocots and eudicots.


Reproducible single-cell annotation of programs underlying T cell subsets, activation states and functions.

Nat Methods

September 2025

Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.

Dylan Kotliar, Michelle Curtis, Ryan Agnew, Kathryn Weinand, Aparna Nathan

T cells recognize antigens and induce specialized gene expression programs (GEPs), enabling functions like proliferation, cytotoxicity and cytokine production. Traditionally, different T cell classes are thought to exhibit mutually exclusive responses, including Th1, Th2 and Th17 programs. However, single-cell RNA sequencing has revealed a continuum of T cell states without clearly distinct subsets, necessitating new analytical frameworks.


Glucose360: An Open-Source Python Platform with Event-Based Integration for Continuous Glucose Monitoring Data Analysis.

Diabetes Technol Ther

September 2025

Department of Genetics, Stanford University, Stanford, California, USA.

Ben Ehlert, Dhruv Aron, Dalia Perelman, Yue Wu, Michael P Snyder

Continuous glucose monitoring (CGM) devices provide real-time actionable data on blood glucose levels, making them essential tools for effective glucose management. Integrating blood glucose data with food log data is crucial for understanding how dietary choices impact glucose levels. Despite their utility, many CGM applications lack integration with other external services, such as food trackers, and do not generate useful glycemic variability (GV) metrics or advanced visualizations.
