Publications by authors named "Michael R Shortreed"

Motivation: Studying protein isoforms is an essential step in biomedical research. At present, the main approach for analyzing proteins is bottom-up mass spectrometry proteomics, which returns peptide identifications that are then used indirectly to infer the presence of protein isoforms. However, the detection and quantification processes are noisy: peptides may be erroneously detected, and most peptides, known as shared peptides, are associated with multiple protein isoforms. As a consequence, studying individual protein isoforms is challenging, and inferred protein results are often abstracted to the gene level or to groups of protein isoforms.

The goal of proteomics is to identify and quantify peptides and proteins within a biological sample. Almost all algorithms for the identification of peptides in LC-MS/MS data employ two steps: peptide/spectrum matching and peptide-identity-propagation (PIP), also known as match-between-runs. PIP can routinely account for up to 40% of all results, with that proportion rising as high as 75% in single-cell proteomics.
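
The propagation step can be illustrated with a deliberately simplified sketch: a peptide identified by spectrum matching in one run is transferred to another run when an MS1 feature there has the same charge, lies within a mass tolerance, and elutes within a retention-time window. The class and function names, the 10 ppm tolerance, the 0.5 min window, and the example values are illustrative assumptions, not the settings or code of any published tool; real implementations also align retention times between runs and control the error rate of transferred identifications.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    mz: float    # observed precursor m/z in the acceptor run
    rt: float    # retention time (minutes)
    charge: int


@dataclass
class Identification:
    sequence: str
    mz: float    # precursor m/z of the spectrum-matched peptide in the donor run
    rt: float
    charge: int


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def propagate(ids, features, ppm_tol=10.0, rt_window=0.5):
    """Hypothetical PIP sketch: map each donor identification to the acceptor
    feature with the same charge that lies within the m/z and RT tolerances,
    preferring the feature closest in retention time."""
    transferred = {}
    for ident in ids:
        candidates = [
            f for f in features
            if f.charge == ident.charge
            and abs(ppm_error(f.mz, ident.mz)) <= ppm_tol
            and abs(f.rt - ident.rt) <= rt_window
        ]
        if candidates:
            transferred[ident.sequence] = min(candidates, key=lambda f: abs(f.rt - ident.rt))
    return transferred


# Hypothetical donor identification and acceptor-run features
donor = [Identification("PEPTIDEK", 464.7347, 30.2, 2)]
acceptor = [Feature(464.7366, 30.4, 2), Feature(464.7348, 35.0, 2)]
print(propagate(donor, acceptor))
```

The second acceptor feature matches in mass but elutes too far away, so only the first is transferred.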

Electrospray ionization (ESI) mass spectrometry is an essential technique for chemical analysis in a range of fields. In ESI, analytes can produce multiple charge states, which must be correctly assigned for identification. Existing approaches to charge state assignment can suffer from limited accuracy or poor speed.
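
For the charge-state problem described above, a textbook heuristic (not the method of the work listed here) reads the charge from isotope spacing: adjacent isotopic peaks differ by about 1.00336 Da (the 13C-12C mass difference), so at charge z they are spaced about 1.00336/z apart in m/z. The neutral monoisotopic mass then follows as M = z(m/z - 1.00728), with 1.00728 Da the proton mass. The function names, the maximum charge of 60, and the example peaks are assumptions for illustration; the heuristic is fragile when envelopes are noisy or overlapping.

```python
PROTON_MASS = 1.007276   # Da
C13_C12_DIFF = 1.003355  # Da, mass difference between 13C and 12C


def charge_from_isotope_spacing(mz_peaks, max_charge=60):
    """Estimate charge from the median spacing of adjacent isotopic peaks.
    Requires at least two peaks; uses the upper median for even counts."""
    peaks = sorted(mz_peaks)
    spacings = sorted(b - a for a, b in zip(peaks, peaks[1:]))
    median_spacing = spacings[len(spacings) // 2]
    z = round(C13_C12_DIFF / median_spacing)
    return max(1, min(z, max_charge))


def neutral_mass(mz, z):
    """Neutral mass of an [M + zH]z+ ion observed at the given m/z."""
    return z * (mz - PROTON_MASS)


# Hypothetical isotopic envelope of a 3+ ion (monoisotopic peak first)
envelope = [500.2670, 500.6015, 500.9359, 501.2704]
z = charge_from_isotope_spacing(envelope)
print(z, round(neutral_mass(envelope[0], z), 4))
```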

Scientific discovery relies on innovative software as much as experimental methods, especially in proteomics, where computational tools are essential for mass spectrometer setup, data analysis, and interpretation. Since the introduction of SEQUEST, proteomics software has grown into a complex ecosystem of algorithms, predictive models, and workflows, but the field faces challenges, including the increasing complexity of mass spectrometry data, limited reproducibility due to proprietary software, and difficulties integrating with other omics disciplines. Closed-source, platform-specific tools exacerbate these issues by restricting innovation, creating inefficiencies, and imposing hidden costs on the community.

Alzheimer's disease (AD) is characterized by the accumulation of protein aggregates, which are thought to be influenced by posttranslational modifications (PTMs). Dehydroamino acids (DHAAs) are rarely observed PTMs that contain an electrophilic alkene capable of forming protein-protein crosslinks, which may lead to protein aggregation. We report here the discovery of DHAAs in the protein aggregates from AD, constituting a previously unsuspected source of extensive proteomic complexity.

The goal of proteomics is to identify and quantify peptides and proteins within a biological sample. Almost all algorithms for the identification of peptides in LC-MS/MS data employ two steps: peptide/spectrum matching and peptide-identity-propagation (PIP), also known as match-between-runs. PIP was originally envisioned as a backup method to overcome measurement stochasticity.

Article Synopsis
  • Studying protein isoforms is crucial for biomedical research, but current methods using bottom-up mass spectrometry often face challenges like noisy detection and shared peptides, making it hard to analyze individual isoforms.
  • A new statistical method is introduced to enhance protein isoform analysis by combining mass spectrometry and transcriptomics data in a Bayesian framework, addressing uncertainties in peptide detection and abundance allocation.
  • The method shows strong performance in simulations and real datasets, accurately inferring protein isoform presence, estimating their abundance, and detecting differences between protein and transcript levels; it is available as a free Bioconductor R package with usage examples.

Identification of O-glycopeptides from tandem mass spectrometry data is complicated by the near-complete dissociation of O-glycans from the peptide during collisional activation and by the combinatorial explosion of possible glycoforms when glycans are retained intact in electron-based activation. The recent O-Pair search method provides an elegant solution to these problems, using a collisional activation scan to identify the peptide sequence and total glycan mass, and a follow-up electron-based activation scan to localize the glycosite(s) using a graph-based algorithm in a reduced search space. Our previous O-glycoproteomics methods with MSFragger-Glyco allowed for extremely fast and sensitive identification of O-glycopeptides from collisional activation data but had limited support for site localization of glycans and quantification of glycopeptides.

The identification of proteoforms by top-down proteomics requires both high-quality fragmentation spectra and the neutral mass of the proteoform from which the fragments derive. Intact proteoform spectra can be highly complex and may include multiple overlapping proteoforms, as well as many isotopic peaks and charge states. The resulting lower signal-to-noise ratios for intact proteins complicate downstream analyses such as deconvolution.

The rapid and accurate quantification of peptides is a critical element of modern proteomics that has become increasingly challenging as proteomic data sets grow in size and complexity. We present here FlashLFQ, a computer program for high-speed label-free quantification of peptides and proteins following a search of bottom-up mass spectrometry data. FlashLFQ is approximately an order of magnitude faster than established label-free quantification methods and can quantify data-dependent acquisition (DDA) search results from any proteomics search program.

MetaMorpheus is a free and open-source software program dedicated to the comprehensive analysis of proteomic data. In bottom-up proteomics, protein samples are digested into peptides prior to chromatographic separation and tandem mass spectrometric analysis. The resulting fragmentation spectra are subsequently analyzed with search software programs to obtain peptide identifications and infer the presence of proteins in the samples.

Tandem mass spectrometry (MS/MS) is widely employed for the analysis of complex proteomic samples. While protein sequence database searching and spectral library searching are both well-established peptide identification methods, each has shortcomings. Protein sequence databases lack fragment peak intensity information, which can result in poor discrimination between correct and incorrect spectrum assignments.
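
The role of fragment intensities can be illustrated with a normalized dot product (cosine similarity), a common way to score a query spectrum against a library spectrum. In this sketch, peaks are summed into m/z bins and the two binned vectors are compared; the 1.0005 bin width, the function names, and the example peaks are assumptions. Two spectra with the same fragment m/z values but different intensity patterns receive different scores, whereas theoretical spectra generated from a sequence database carry no expected intensities to compare against.

```python
import math


def binned_vector(peaks, bin_width=1.0005):
    """peaks: list of (mz, intensity). Sum intensities into fixed-width m/z bins."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine_similarity(spec_a, spec_b, bin_width=1.0005):
    """Normalized dot product of two binned spectra (1.0 = identical pattern)."""
    a = binned_vector(spec_a, bin_width)
    b = binned_vector(spec_b, bin_width)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical library spectrum and two queries with identical fragment m/z values
library = [(175.119, 30.0), (262.151, 100.0), (375.235, 60.0)]
same_pattern = [(175.12, 15.0), (262.15, 50.0), (375.24, 30.0)]
different_pattern = [(175.12, 100.0), (262.15, 30.0), (375.24, 60.0)]
print(round(cosine_similarity(library, same_pattern), 3))
print(round(cosine_similarity(library, different_pattern), 3))
```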

The SARS-CoV-2 omicron variant presented significant challenges to the global effort to counter the pandemic. SARS-CoV-2 is predicted to remain prevalent for the foreseeable future, making the ability to identify SARS-CoV-2 variants imperative in understanding and controlling the pandemic. The predominant variant discovery method, genome sequencing, is time-consuming, insensitive, and expensive.

Proteoform Suite is an interactive software program for the identification and quantification of intact proteoforms from mass spectrometry data. Proteoform Suite identifies proteoforms observed by intact-mass (MS1) analysis. In intact-mass analysis, unfragmented experimental proteoforms are compared to a database of known proteoform sequences and to one another, searching for mass differences corresponding to well-known post-translational modifications or amino acids.
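
The mass-difference comparison described above can be sketched in a few lines: differences between pairs of experimental proteoform masses are compared against a table of known mass shifts within a tolerance. The shift table is a small illustrative subset of standard monoisotopic values, and the 0.02 Da tolerance, function name, and example masses are hypothetical. This is not Proteoform Suite's code, which also compares experimental masses against a database of known proteoform sequences.

```python
from itertools import combinations

# Monoisotopic mass shifts (Da); a small illustrative subset of PTMs and residues.
KNOWN_DELTAS = {
    "methylation": 14.015650,
    "oxidation": 15.994915,
    "acetylation": 42.010565,
    "glycine residue": 57.021464,
    "alanine residue": 71.037114,
    "phosphorylation": 79.966331,
}


def annotate_mass_differences(masses, tol_da=0.02):
    """Return (lighter, heavier, label) for every pair of experimental masses
    whose difference matches a known shift within tol_da."""
    hits = []
    for lighter, heavier in combinations(sorted(masses), 2):
        delta = heavier - lighter
        for label, shift in KNOWN_DELTAS.items():
            if abs(delta - shift) <= tol_da:
                hits.append((lighter, heavier, label))
    return hits


# Hypothetical experimental proteoform masses (Da)
observed = [11200.50, 11242.51, 11280.47]
for lighter, heavier, label in annotate_mass_differences(observed):
    print(f"{lighter:.2f} -> {heavier:.2f}: {label}")
```

Pairs whose difference matches nothing in the table (here the 37.96 Da gap between the two heavier masses) are simply left unannotated.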

Background: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.

Human immunodeficiency virus type 1 (HIV-1) remains a deadly infectious disease despite existing antiretroviral therapies. A comprehensive understanding of the specific mechanisms of viral infectivity remains elusive and currently limits the development of new and effective therapies. Through in-depth proteomic analysis of HIV-1 virions, we discovered the novel post-translational modification of highly conserved residues within the viral matrix and capsid proteins to the dehydroamino acids, dehydroalanine and dehydrobutyrine.

Interpreting proteomics data remains challenging due to the large number of proteins that are quantified by modern mass spectrometry methods. Weighted gene correlation network analysis (WGCNA) can identify groups of biologically related proteins using only protein intensity values by constructing protein correlation networks. However, WGCNA is not widespread in proteomic analyses due to challenges in implementing workflows.
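
The network-construction step of WGCNA can be sketched for an unsigned network: pairwise correlations between protein intensity profiles are raised to a soft-threshold power β, a_ij = |cor(x_i, x_j)|^β, which suppresses weak correlations while keeping strong ones, and each protein's connectivity is the sum of its adjacencies. β = 6 is assumed here as a common choice, the data are simulated, and the function name is hypothetical; the topological-overlap, clustering, and module-detection steps of a full WGCNA workflow are omitted.

```python
import numpy as np


def soft_threshold_adjacency(intensities, beta=6):
    """intensities: samples x proteins matrix.
    Returns an unsigned, WGCNA-style adjacency |cor|**beta with zero diagonal."""
    corr = np.corrcoef(intensities, rowvar=False)  # protein-by-protein correlations
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # exclude self-connections from connectivity
    return adj


rng = np.random.default_rng(0)
base = rng.normal(size=20)
# Simulated data: proteins 0 and 1 co-vary across 20 samples; protein 2 is independent
data = np.column_stack([
    base + 0.1 * rng.normal(size=20),
    base + 0.1 * rng.normal(size=20),
    rng.normal(size=20),
])
adj = soft_threshold_adjacency(data)
connectivity = adj.sum(axis=1)
print(np.round(adj, 3))
print(np.round(connectivity, 3))
```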

Pancreatic islets are essential for maintaining physiological blood glucose levels, and declining islet function is a hallmark of type 2 diabetes. We employ mass spectrometry-based proteomics to systematically analyze islets from 9 genetic or diet-induced mouse models representing a broad cross-section of metabolic health. Quantifying the islet proteome to a depth of >11,500 proteins, this study represents the most detailed analysis of mouse islet proteins to date.

Proton-transfer reactions (PTRs) have emerged as a powerful tool for the study of intact proteins. When coupled with m/z-selective kinetic excitation, such as parallel ion parking (PIP), one can exert exquisite control over rates of reaction with a high degree of specificity. This allows one to "concentrate", in the gas phase, nearly all the signals from an intact protein charge state envelope into a single charge state, improving the signal-to-noise ratio (S/N) by 10× or more.

Top-down proteomics is a key mass spectrometry-based technology for comprehensive analysis of proteoforms. Proteoforms exhibit multiple high charge states and isotopic forms in full MS scans. The dissociation behavior of proteoforms in different charge states and subjected to different collision energies is highly variable.

MetaMorpheus is a free, open-source software program for the identification of peptides and proteoforms from data-dependent acquisition tandem MS experiments. There is inherent uncertainty in these assignments for several reasons, including the limited overlap between experimental and theoretical peaks, the m/z uncertainty, and noise peaks or peaks from coisolated peptides that produce false matches. False discovery rates provide only a set-wise approximation of the proportion of incorrect spectrum matches.
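
The set-wise character of the false discovery rate can be made concrete with a generic target-decoy sketch, assumed here for illustration rather than taken from MetaMorpheus: matches are ranked by score, the FDR at each score threshold is estimated as decoys/targets, and each match receives a q-value, the smallest FDR at which it would be accepted. The q-value describes the accepted set as a whole; it does not say whether any particular match is correct. The function name, the simple decoys/targets estimator, and the example scores are assumptions, and tied scores are not handled specially.

```python
def q_values(psms):
    """psms: list of (score, is_decoy) pairs.
    Returns (score, is_decoy, q_value) tuples sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were set at this match
        fdrs.append(decoys / max(targets, 1))
    # q-value: smallest FDR over all thresholds that still accept this match
    q = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]


# Hypothetical PSMs: (score, is_decoy)
psms = [(50, False), (45, False), (40, True), (38, False), (30, False), (20, True)]
results = q_values(psms)
accepted = [score for score, is_decoy, q in results if not is_decoy and q <= 0.01]
print(accepted)  # [50, 45]
```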

We report O-Pair Search, an approach to identify O-glycopeptides and localize O-glycosites. Using paired collision- and electron-based dissociation spectra, O-Pair Search identifies O-glycopeptides via an ion-indexed open modification search and localizes O-glycosites using graph theory and probability-based localization. O-Pair Search reduces search times more than 2,000-fold compared to current O-glycopeptide processing software, while defining O-glycosite localization confidence levels and generating more O-glycopeptide identifications.

Identification of proteoforms, the different forms of a protein, is important to understand biological processes. A proteoform family is the set of different proteoforms from the same gene. We previously developed the software program Proteoform Suite, which constructs proteoform families and identifies proteoforms by intact-mass analysis.

Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.
