Article Abstract

The results of analyzing shotgun proteomics mass spectrometry data can be greatly affected by the choice of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can cause confusion about which database is best in which circumstances, a problem that is especially acute in the analysis of human samples. All of these sequence databases are genome-based: they compile the sequences of the predicted genes and of their protein translation products. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources, and to make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, ranging from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases, suitable for mass spectrometry proteomics sequence database searching, is called the Tiered Human Integrated Search Proteome (THISP) set. To evaluate the utility of these databases, we analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, against each of the four database tiers. Compared with the Tier 1 database, approximately 0.8%, 1.1%, and 1.5% additional peptides are identified with Tiers 2, 3, and 4, respectively, at substantially increasing computational cost. This extra computational cost may be worth bearing if identifying sequence variants or discovering sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
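The abstract describes each database as the union of sequences from several sources, reduced to nonredundant entries, but does not say exactly how redundancy is defined. The sketch below assumes that two entries are redundant when their amino acid sequences are identical. It keeps one copy of each distinct sequence and records every source accession that supplied it. The source names (`srcA`, `srcB`) and the helper functions `parse_fasta` and `nonredundant_union` are hypothetical and used only for illustration.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def nonredundant_union(sources: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct sequence to the 'source:accession' labels that supply it.

    Assumption: redundancy means exact sequence identity (case-insensitive).
    """
    union: Dict[str, List[str]] = {}
    for source_name, fasta_text in sources.items():
        for header, seq in parse_fasta(fasta_text):
            accession = header.split()[0]  # first token of the header line
            union.setdefault(seq.upper(), []).append(f"{source_name}:{accession}")
    return union


if __name__ == "__main__":
    sources = {
        "srcA": ">P1 example\nMKTAY\nIAK\n>P2\nGGGG\n",
        "srcB": ">X9\nMKTAYIAK\n>X10\nPEPTIDE\n",
    }
    for seq, labels in nonredundant_union(sources).items():
        print(seq, labels)
```

In this toy input, `P1` (split across two lines) and `X9` carry the same sequence, so they collapse into a single entry. Real pipelines may apply looser rules, such as subsequence containment or isoform grouping, which this sketch does not attempt.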
We find it useful to search a data set against a simpler database and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month, generates a new set of search databases, and makes them available for download at http://www.peptideatlas.org/thisp/ .
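The two-step strategy above can be sketched as follows: identify peptides with a search against a smaller tier, then map each peptide back to a larger tier to see how many protein entries contain it. The abstract does not define "uniqueness" precisely. This sketch assumes a peptide is unique when exactly one protein entry in the larger database contains it. It also optionally treats isoleucine and leucine as equivalent, since the two residues have identical mass and cannot be distinguished by mass alone. The protein identifiers and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def proteins_containing(peptide: str, database: Dict[str, str],
                        i_equals_l: bool = True) -> List[str]:
    """Return the sorted IDs of every protein whose sequence contains the peptide."""
    def norm(s: str) -> str:
        s = s.upper()
        # I and L are isobaric, so optionally treat them as the same residue.
        return s.replace("I", "L") if i_equals_l else s

    target = norm(peptide)
    return sorted(pid for pid, seq in database.items() if target in norm(seq))


def classify_peptides(peptides: Iterable[str], complex_db: Dict[str, str],
                      i_equals_l: bool = True) -> Dict[str, str]:
    """Label each peptide 'unique', 'shared', or 'absent' relative to complex_db."""
    result: Dict[str, str] = {}
    for pep in peptides:
        hits = proteins_containing(pep, complex_db, i_equals_l)
        if not hits:
            result[pep] = "absent"
        elif len(hits) == 1:
            result[pep] = "unique"
        else:
            result[pep] = "shared"
    return result


if __name__ == "__main__":
    # Hypothetical larger-tier database: protein ID -> sequence.
    complex_db = {
        "T1_A": "MKTAYIAKQR",
        "T4_B": "MKTAYLAKQRVAR",  # variant with L in place of I
        "T4_C": "GGSPEPTIDEK",
    }
    print(classify_peptides(["TAYIAK", "PEPTIDEK", "NOTFOUND"], complex_db))
```

With I/L equivalence switched on, `TAYIAK` is reported as shared because the `T4_B` variant also matches it. With `i_equals_l=False`, the same peptide becomes unique to `T1_A`. The linear scan is only adequate for small examples. At proteome scale, a substring index such as a suffix array or an Aho-Corasick automaton over the peptide list would be the usual choice.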


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5096980
DOI: http://dx.doi.org/10.1021/acs.jproteome.6b00445
