PfaSTer: a machine learning-powered serotype caller for Streptococcus pneumoniae genomes.

Microb Genom

Vaccine Research & Development, Pfizer Inc., 401 N. Middletown Rd, Pearl River, NY 10965, USA.

Published: June 2023


Article Abstract

Streptococcus pneumoniae (pneumococcus) is a leading cause of morbidity and mortality worldwide. Although multi-valent pneumococcal vaccines have curbed the incidence of disease, their introduction has shifted serotype distributions, which must be monitored. Whole genome sequence (WGS) data provide a powerful surveillance tool for tracking isolate serotypes, which can be determined from the nucleotide sequence of the capsular polysaccharide biosynthetic operon (cps). Although software exists to predict serotypes from WGS data, most tools are constrained by requiring high-coverage next-generation sequencing reads. This can present a challenge for accessibility and data sharing. Here we present PfaSTer, a machine learning-based method to identify 65 prevalent serotypes from assembled genome sequences. PfaSTer combines dimensionality reduction from k-mer analysis with a Random Forest classifier for rapid serotype prediction. By leveraging the model's built-in statistical framework, PfaSTer determines confidence in its predictions without the need for coverage-based assessments. We then demonstrate the robustness of this method, which returns >97% concordance when compared to biochemical results and other serotyping tools. PfaSTer is open source and available at: https://github.com/pfizer-opensource/pfaster.
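
As a rough illustration of two ideas named in the abstract, the sketch below counts k-mers in an assembled sequence and derives a confidence value from the fraction of ensemble votes behind a call. It is a minimal sketch under stated assumptions, not PfaSTer's implementation: the function names, the use of canonical k-mers, the `min_fraction` threshold, and the example serotype labels are all hypothetical, and no Random Forest is trained here (the per-tree votes are supplied by hand).

```python
from collections import Counter

# Hypothetical helpers for illustration only; not taken from the PfaSTer code base.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count canonical k-mers in a sequence, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def call_with_confidence(tree_votes, min_fraction=0.5):
    """Majority-vote call plus the fraction of votes supporting it.

    `tree_votes` stands in for the per-tree predictions of an ensemble.
    `min_fraction` is an assumed cut-off, below which the call is reported
    as "uncertain".
    """
    if not tree_votes:
        raise ValueError("tree_votes must not be empty")
    label, n_votes = Counter(tree_votes).most_common(1)[0]
    fraction = n_votes / len(tree_votes)
    return (label if fraction >= min_fraction else "uncertain"), fraction


if __name__ == "__main__":
    print(kmer_counts("ACGTN", 2))
    print(call_with_confidence(["19A"] * 8 + ["19F"] * 2))
```

The point of the vote fraction is that it depends only on the ensemble's agreement, so it can be computed from an assembly alone, without read-coverage information.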


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327508
DOI: http://dx.doi.org/10.1099/mgen.0.001033

