Motivation: Much of prokaryotic comparative genomics currently relies on two critical computational tasks: pangenome inference and core genome inference. Pangenome inference involves clustering genes from a set of genomes into gene families, enabling genome-wide association studies and evolutionary history analysis. The core genome represents gene families present in nearly all genomes and is required to infer a high-quality phylogeny. For species-level datasets, fast pangenome inference tools have been developed. However, tools applicable to more diverse datasets are currently slow and scale poorly.
Results: Here, we introduce SCARAP, a program containing three modules for comparative genomics analyses: a fast and scalable pangenome inference module, a direct core genome inference module, and a module for subsampling representative genomes. When benchmarked against existing tools, the SCARAP pan module proved up to an order of magnitude faster with comparable accuracy. The core module was validated by comparing its result against a core genome extracted from a full pangenome. The sample module demonstrated the rapid sampling of genomes with decreasing novelty. Applied to a dataset of over 31 000 Lactobacillales genomes, SCARAP showcased its ability to derive a representative pangenome. Finally, we applied the novel concept of gene fixation frequency to this pangenome, showing that Lactobacillales genes that are prevalent but rarely fixate in species often encode bacteriophage functions.
Availability and implementation: The SCARAP toolkit is publicly available at https://github.com/swittouck/scarap.
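As a rough illustration of the definition above (core gene families are those present in nearly all genomes), the following minimal sketch extracts a core genome from an already-inferred pangenome. It is not SCARAP's algorithm or API: the function name `core_families`, the two input dictionaries, and the 95% presence threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def core_families(gene_to_family, gene_to_genome, min_fraction=0.95):
    """Return the gene families found in at least min_fraction of genomes.

    gene_to_family: hypothetical mapping of gene ID -> gene family ID
    gene_to_genome: hypothetical mapping of gene ID -> genome ID
    min_fraction:   assumed presence threshold for "nearly all genomes"
    """
    genomes = set(gene_to_genome.values())
    if not genomes:
        return []

    # Collect the set of genomes in which each family occurs at least once.
    family_genomes = defaultdict(set)
    for gene, family in gene_to_family.items():
        family_genomes[family].add(gene_to_genome[gene])

    # Keep families whose genome prevalence meets the threshold.
    return sorted(
        family
        for family, present in family_genomes.items()
        if len(present) / len(genomes) >= min_fraction
    )


# Toy pangenome: famA occurs in 4/4 genomes, famB in 3/4, famC in 1/4.
toy_gene_to_genome = {
    "a1": "g1", "a2": "g2", "a3": "g3", "a4": "g4",
    "b1": "g1", "b2": "g2", "b3": "g3",
    "c1": "g4",
}
toy_gene_to_family = {
    "a1": "famA", "a2": "famA", "a3": "famA", "a4": "famA",
    "b1": "famB", "b2": "famB", "b3": "famB",
    "c1": "famC",
}

strict_core = core_families(toy_gene_to_family, toy_gene_to_genome)
relaxed_core = core_families(toy_gene_to_family, toy_gene_to_genome, 0.75)
```

Counting genomes rather than gene copies means a family with several paralogs in one genome still counts once for that genome. This sketch needs the full pangenome first, whereas the abstract describes the SCARAP core module as inferring the core genome directly.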
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11681940 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btae735 | DOI Listing |
Philos Trans R Soc Lond B Biol Sci
September 2025
Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Baden-Württemberg, Germany.
Phages infect bacteria and have recently re-emerged as a promising strategy to combat bacterial infections. However, there is a lack of methods to predict whether and why a particular phage can or cannot infect a bacterial strain based on their genome sequences. Understanding the complex interactions between phages and their bacterial hosts is thus of considerable interest.
BMC Bioinformatics
September 2025
Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, 33615, Bielefeld, Germany.
Background: The increasing amount of available genome sequence data enables large-scale comparative studies. A common task is the inference of phylogenies, which is challenging if close reference sequences are not available, genome sequences are incompletely assembled, or the high number of genomes precludes multiple sequence alignment in reasonable time. SANS is an alignment-free, whole-genome-based approach for phylogeny estimation.
J Fungi (Basel)
July 2025
Faculty "Bioengineering and Veterinary Medicine", Don State Technical University, 344000 Rostov-on-Don, Russia.
White-Nose Syndrome (WNS) has devastated insectivorous bat populations, particularly in North America, leading to severe ecological and economic consequences. Despite extensive research, many aspects of the evolutionary history, mitochondrial genome organization, and metabolic adaptations of its etiological agent, Pseudogymnoascus destructans, remain unexplored. Here, we present a multi-scale genomic analysis integrating pangenome reconstruction, phylogenetic inference, Bayesian divergence dating, comparative mitochondrial genomics, and refined functional annotation.
Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance.
mSystems
August 2025
College of Life Science and Technology, Guangxi University, Nanning, China.
With the increasing severity of bacterial drug resistance, there is a growing need for phages with well-defined genetic backgrounds to combat drug-resistant infections. Mycobacteriophages constitute the largest genome-sequenced phage group; however, the vast majority of these phage proteins have not yet been effectively annotated. In this study, we employed a structure-based similarity search approach to improve protein annotation.