It has been shown that integrating peptide property predictions such as fragment intensity into the scoring of peptide-spectrum matches can greatly increase the number of confidently identified peptides compared to traditional scoring methods. Here, we introduce Prosit-XL, a robust and accurate fragment intensity predictor covering both cleavable (DSSO/DSBU) and non-cleavable (DSS/BS3) cross-linkers, achieving high accuracy on various holdout sets with consistent performance on external datasets without fine-tuning. Due to the complex nature of false positives in XL-MS, an approach to data-driven rescoring was developed that benefits from Prosit-XL's predictions while limiting the overestimation of the false discovery rate (FDR).
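For readers unfamiliar with FDR estimation in rescoring, the following is a minimal sketch of generic target-decoy q-value computation for ordinary (linear) peptide-spectrum matches. It is not the Prosit-XL procedure, which the abstract describes as handling the more complex false positives of XL-MS; the function name and the `(score, is_decoy)` input layout are assumptions made for illustration.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]],
) -> List[Tuple[float, bool, float]]:
    """Assign q-values to PSMs given as (score, is_decoy) pairs.

    Hypothetical helper: higher scores are assumed to be better, and
    the FDR at a threshold is estimated as #decoys / #targets above it.
    """
    # Rank PSMs from best to worst score.
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each rank.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


example = target_decoy_qvalues(
    [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
)
```

Rescoring changes the scores that feed into such a ranking; the FDR estimate itself is only as reliable as the assumption that decoys model false targets well.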
Motivation: In mass spectrometry-based proteomics, the availability of peptide prior knowledge has improved our ability to assign fragmentation spectra to specific peptide sequences. However, some peptides exhibit similar analytical values and fragmentation patterns, which makes them nearly indistinguishable with current data analysis tools.
Results: Here we developed the Mass Spectrometry Content Information (MSCI) Python package to tackle the challenges of peptide identification in mass spectrometry-based proteomics, particularly regarding indistinguishable peptides.
We built and characterised a mass spectrometer capable of performing CID (both beam type and resonant type), UVPD, EID and ECD in an automated fashion during an LCMS type experiment. We exploited this ability to generate large datasets through multienzyme deep proteomics experiments for characterisation of these activation techniques. As a further step, motivated by the complexity generated by these dissociation techniques, we developed a single Prosit deep learning model for fragment ion intensity prediction covering all of these techniques.
Mass spectrometry-based metaproteomics, the identification and quantification of thousands of proteins expressed by complex microbial communities, has become pivotal for unraveling functional interactions within microbiomes. However, metaproteomics data analysis encounters many challenges, including the search of tandem mass spectra against a protein sequence database using proteomics database search algorithms. We used a ground-truth dataset to assess a spectral library searching method against established database searching approaches.
A fundamental challenge in mass spectrometry-based proteomics is determining which peptide generated a given MS2 spectrum. Peptide sequencing typically relies on matching spectra against a known sequence database, which in some applications is not available. Deep learning-based de novo sequencing can address this limitation by directly predicting peptide sequences from MS2 data.
Identifying detectable peptides, known as flyers, is key in mass spectrometry-based proteomics. Peptide detectability is strongly related to peptide sequences and their resulting physicochemical properties. Moreover, the high variability in MS data challenges the development of a generic model for detectability prediction, underlining the need for customizable tools.
Proteomic workflows generate vastly complex peptide mixtures that are analyzed by liquid chromatography-tandem mass spectrometry, creating thousands of spectra, most of which are chimeric and contain fragment ions from more than one peptide. Because of differences in data acquisition strategies such as data-dependent, data-independent or parallel reaction monitoring, separate software packages employing different analysis concepts are used for peptide identification and quantification, even though the underlying information is principally the same. Here, we introduce CHIMERYS, a spectrum-centric search algorithm designed for the deconvolution of chimeric spectra that unifies proteomic data analysis.
This review explores state-of-the-art machine learning and deep learning models for peptide property prediction in mass spectrometry-based proteomics, including, but not limited to, models for predicting digestibility, retention time, charge state distribution, collisional cross section, fragment ion intensities, and detectability. The combination of these models not only enables the in silico generation of spectral libraries but also finds many additional use cases in the design of targeted assays and in data-driven rescoring. This review serves as both an introduction for newcomers and an update for experienced researchers aiming to develop accessible and reproducible models for peptide property predictions.
J Proteome Res
April 2025
Multibatch isobaric labeling experiments are frequently applied for clinical and pharmaceutical studies of large sample cohorts. To tackle the critical issue of missing values in such studies, we introduce the ProSIMSIt pipeline. It combines the advantages of tandem mass spectrum clustering via SIMSI-Transfer and data-driven rescoring via Prosit and Oktoberfest.
Mass spectrometry (MS)-based proteomics relies heavily on MS/MS (MS2) data, which do not fully exploit the available MS1 information. Traditional peptide identity propagation (PIP) methods, such as match-between-runs (MBR), are limited to similar runs, particularly with the same liquid chromatography (LC) gradients, thus potentially underutilizing available proteomics libraries. We introduce SWAPS, a novel and modular MS1-centric framework incorporating advances in peptide property prediction, extensive proteomics libraries, and deep-learning-based postprocessing to enable and explore PIP across more diverse experimental conditions and LC gradients.
Nat Commun
March 2025
Integration of multi-omics data can provide information on biomolecules from different layers to systematically illustrate complex biology. Here, we build a multi-omics atlas containing 132,570 transcripts, 44,473 proteins, 19,970 phosphoproteins, and 12,427 acetylproteins across wheat vegetative and reproductive phases. Using this atlas, we elucidate the transcriptional regulation network, the contributions of post-translational modification (PTM) and transcript level to protein abundance, and biased homoeolog expression and PTM in wheat.
Citrullination is a critical yet understudied post-translational modification (PTM) implicated in various biological processes. Exploring its role in health and disease requires a comprehensive understanding of the prevalence of this PTM at a proteome-wide scale. Although mass spectrometry has enabled the identification of citrullination sites in complex biological samples, it faces significant challenges, including limited enrichment tools and a high rate of false positives due to the identical mass with deamidation (+0.
Mol Cell Proteomics
March 2025
Mass spectrometry-based proteomics has revolutionized bacterial identification and elucidated many molecular mechanisms underlying bacterial growth, community formation, and drug resistance. However, most research has been focused on a few model bacteria, overlooking bacterial diversity. In this study, we present the most extensive bacterial proteomic resource to date, covering 303 species, 119 genera, and five phyla with over 636,000 unique expressed proteins, confirming the existence of over 38,700 hypothetical proteins.
Post-translational modifications (PTMs) play pivotal roles in regulating cellular signaling, fine-tuning protein function, and orchestrating complex biological processes. Despite their importance, the lack of comprehensive tools for studying PTMs from a pathway-centric perspective has limited our ability to understand how PTMs modulate cellular pathways on a molecular level. Here, we present PTMNavigator, a tool integrated into the ProteomicsDB platform that offers an interactive interface for researchers to overlay experimental PTM data with pathway diagrams.
The human body contains trillions of cells, classified into specific cell types, with diverse morphologies and functions. In addition, cells of the same type can assume different states within an individual's body during their lifetime. Understanding the complexities of the proteome in the context of a human organism and its many potential states is a prerequisite for understanding human biology, but these complexities can neither be predicted from the genome, nor have they been systematically measurable with available technologies.
Mass-spectrometry-based proteomics has advanced with the integration of experimental and predicted spectral libraries, which have significantly improved peptide identification in complex search spaces. However, challenges persist in distinguishing some peptides with close retention times and nearly identical fragmentation patterns. In this study, we conducted a theoretical assessment to quantify the prevalence of indistinguishable peptides within the human canonical proteome and immunopeptidome using state-of-the-art retention time and spectrum prediction models.
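As an illustration of what "nearly identical fragmentation patterns" can mean numerically, the sketch below computes a normalized spectral angle between two fragment intensity vectors, a similarity measure commonly used to compare predicted and observed spectra. It is not taken from the study above: the function name is hypothetical, and it assumes both vectors list intensities for the same fragment ions in the same order.

```python
import math
from typing import Sequence


def spectral_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized spectral angle in [0, 1] for non-negative intensities.

    1.0 means identical relative intensities, 0.0 means no shared signal.
    Assumes a and b are aligned on the same fragment ions (hypothetical
    input layout chosen for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("intensity vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing

    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


same = spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
disjoint = spectral_angle([1.0, 0.0], [0.0, 1.0])
```

Two peptides whose predicted spectra score close to 1.0 against each other, and whose predicted retention times also nearly coincide, would be hard to tell apart on these properties alone.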
Nucleic Acids Res
September 2024
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs.
Backstroke has been thoroughly investigated in the context of sports science. However, we have no knowledge about the nationalities of the fastest age group backstroke swimmers. Therefore, the present study intended to investigate the nationalities of the fastest backstroke swimmers.
J Am Soc Mass Spectrom
November 2024
Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, with many protein isoforms remaining uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), has demonstrated multiplexed, sensitive detection of predefined peptides of interest.
J Proteomics
August 2024
The 2023 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers Meeting was held from January 15th to January 20th, 2023, at Congressi Stefano Franscini at Monte Verità in Ticino, Switzerland. The participants were scientists and developers working in computational mass spectrometry (MS), metabolomics, and proteomics. The 5-day program was split between introductory keynote lectures and parallel hackathon sessions focusing on "Artificial Intelligence in proteomics" to stimulate future directions in the MS-driven omics areas.
Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. As the number of sequenced genomes grows and the accuracy and contiguity of genome assemblies improve, structural annotation of plant genomes continues to be a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species.
Recent developments in machine learning (ML) and deep learning (DL) have immense potential for applications in proteomics, such as generating spectral libraries, improving peptide identification, and optimizing targeted acquisition modes. Although new ML/DL models for various applications and peptide properties are frequently published, the rate at which these models are adopted by the community is slow, mostly due to technical challenges. We believe that, for the community to make better use of state-of-the-art models, more attention should be paid to making models easy to use and accessible.