SCARAP: scalable cross-species comparative genomics of prokaryotes.

Stijn Wittouck, Tom Eilers, Vera van Noort, Sarah Lebeer

Bioinformatics

Lab of Applied Microbiology and Biotechnology, Department of Bioscience Engineering, University of Antwerp, Antwerpen 2020, Belgium.

Published: December 2024


Motivation: Much of prokaryotic comparative genomics currently relies on two critical computational tasks: pangenome inference and core genome inference. Pangenome inference involves clustering genes from a set of genomes into gene families, enabling genome-wide association studies and evolutionary history analysis. The core genome represents gene families present in nearly all genomes and is required to infer a high-quality phylogeny. For species-level datasets, fast pangenome inference tools have been developed. However, tools applicable to more diverse datasets are currently slow and scale poorly.
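The relationship between the two tasks can be illustrated with a minimal sketch: once genes have been clustered into families (the pangenome), a core genome can be read off as the families found in nearly all genomes. The function name `core_families`, the tuple layout of the input, and the 95% threshold are assumptions made for illustration; they are not taken from SCARAP.

```python
from collections import defaultdict


def core_families(gene_table, min_fraction=0.95):
    """Return families present in at least `min_fraction` of genomes.

    gene_table: iterable of (genome, gene, family) tuples, i.e. a
    pangenome expressed as one family assignment per gene.
    The 0.95 default is an illustrative threshold, not a SCARAP setting.
    """
    genomes = set()
    genomes_per_family = defaultdict(set)
    for genome, _gene, family in gene_table:
        genomes.add(genome)
        genomes_per_family[family].add(genome)

    n_genomes = len(genomes)
    return sorted(
        family
        for family, members in genomes_per_family.items()
        if len(members) / n_genomes >= min_fraction
    )


# Toy pangenome: famA occurs in all three genomes, famB in two, famC in one.
table = [
    ("g1", "g1_001", "famA"), ("g2", "g2_001", "famA"), ("g3", "g3_001", "famA"),
    ("g1", "g1_002", "famB"), ("g2", "g2_002", "famB"),
    ("g3", "g3_002", "famC"),
]
print(core_families(table))
```

Counting genomes rather than genes per family means that paralogs within one genome do not inflate a family's prevalence.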

Results: Here, we introduce SCARAP, a program containing three modules for comparative genomics analyses: a fast and scalable pangenome inference module, a direct core genome inference module, and a module for subsampling representative genomes. When benchmarked against existing tools, the SCARAP pan module proved up to an order of magnitude faster with comparable accuracy. The core module was validated by comparing its result against a core genome extracted from a full pangenome. The sample module demonstrated the rapid sampling of genomes with decreasing novelty. Applied to a dataset of over 31 000 Lactobacillales genomes, SCARAP showcased its ability to derive a representative pangenome. Finally, we applied the novel concept of gene fixation frequency to this pangenome, showing that Lactobacillales genes that are prevalent but rarely fixate in species often encode bacteriophage functions.
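The abstract does not define gene fixation frequency, so the sketch below rests on one plausible reading, stated here as an assumption: a family is "fixed" in a species when it occurs in nearly all genomes of that species, and its fixation frequency is the fraction of species carrying the family in which it is fixed. The function name, the input layout, and the 90% threshold are hypothetical and are not SCARAP's implementation.

```python
from collections import defaultdict


def fixation_frequency(records, species_of, fixed_fraction=0.9):
    """Fraction of carrying species in which each family is fixed.

    records: iterable of (genome, family) pairs.
    species_of: dict mapping every genome to its species.
    fixed_fraction: assumed cutoff for calling a family fixed in a species.
    """
    genomes_per_species = defaultdict(set)
    for genome, species in species_of.items():
        genomes_per_species[species].add(genome)

    # family -> species -> genomes of that species carrying the family
    carriers = defaultdict(lambda: defaultdict(set))
    for genome, family in records:
        carriers[family][species_of[genome]].add(genome)

    result = {}
    for family, per_species in carriers.items():
        n_fixed = sum(
            1
            for species, members in per_species.items()
            if len(members) / len(genomes_per_species[species]) >= fixed_fraction
        )
        result[family] = n_fixed / len(per_species)
    return result


species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
records = [
    ("a1", "housekeeping"), ("a2", "housekeeping"),
    ("b1", "housekeeping"), ("b2", "housekeeping"),
    ("a1", "phage_like"), ("b1", "phage_like"),
]
freqs = fixation_frequency(records, species_of)
print(freqs)
```

In this toy input both families occur in both species, but only `housekeeping` is present in every genome of each species, which is the "prevalent but rarely fixed" contrast the abstract describes.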

Availability and implementation: The SCARAP toolkit is publicly available at https://github.com/swittouck/scarap.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11681940
DOI: http://dx.doi.org/10.1093/bioinformatics/btae735


Similar Publications

Gene co-occurrence and its association with phage infectivity in bacterial pangenomes.

Philos Trans R Soc Lond B Biol Sci

September 2025

Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Baden-Württemberg, Germany.

Anne Kupczok, Athina Gavriilidou, Emilian Paulitz, Lucía Guerrero-García, Franz Baumdicker

Phages infect bacteria and have recently re-emerged as a promising strategy to combat bacterial infections. However, there is a lack of methods to predict whether and why a particular phage can or cannot infect a bacterial strain based on their genome sequences. Understanding the complex interactions between phages and their bacterial hosts is thus of considerable interest.


SANS ambages: phylogenomics with abundance-filter, multi-threading, and bootstrapping on amino-acid or genomic sequences.

BMC Bioinformatics

September 2025

Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, 33615, Bielefeld, Germany.

Fabian Kolesch, Marco Sohn, Andreas Rempel, Pia Hippel, Roland Wittler

Background: The increasing amount of available genome sequence data enables large-scale comparative studies. A common task is the inference of phylogenies, which is challenging if close reference sequences are not available, genome sequences are incompletely assembled, or the high number of genomes precludes multiple sequence alignment in reasonable time. SANS is an alignment-free, whole-genome-based approach for phylogeny estimation.


Beyond White-Nose Syndrome: Mitochondrial and Functional Genomics of Pseudogymnoascus destructans.

J Fungi (Basel)

July 2025

Faculty "Bioengineering and Veterinary Medicine", Don State Technical University, 344000 Rostov-on-Don, Russia.

Ilia V Popov, Svetoslav D Todorov, Michael L Chikindas, Koen Venema, Alexey M Ermakov

White-Nose Syndrome (WNS) has devastated insectivorous bat populations, particularly in North America, leading to severe ecological and economic consequences. Despite extensive research, many aspects of the evolutionary history, mitochondrial genome organization, and metabolic adaptations of its etiological agent, Pseudogymnoascus destructans, remain unexplored. Here, we present a multi-scale genomic analysis integrating pangenome reconstruction, phylogenetic inference, Bayesian divergence dating, comparative mitochondrial genomics, and refined functional annotation.


Pangenome-based genome inference using integer programming.

Genome Res

August 2025

Indian Institute of Science.

Ghanshyam Chandra, Md Helal Hossen, Stephan Scholz, Alexander T Dilthey, Daniel Gibney

Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance.


Pangenome-scale annotation of mycobacteriophages for dissecting phage-host interactions based on a sequence clustering and structural homology analysis strategy.

mSystems

August 2025

College of Life Science and Technology, Guangxi University, Nanning, China.

Xiao Guo, Zheng-Guo He

With the increasing severity of bacterial drug resistance, there is a growing need for phages with well-defined genetic backgrounds to combat drug-resistant infections. Mycobacteriophages constitute the largest genome-sequenced phage group; however, the vast majority of these phage proteins have not yet been effectively annotated. In this study, we employed a structure-based similarity search approach to improve protein annotation.
