Dataset of the frequency patterns of publications annotated to human protein-coding genes, their protein products and genetic relevance.

Data Brief

Discovery Research Coordination, Boehringer Ingelheim, 55216 Ingelheim Am Rhein, Germany.

Published: August 2019


Article Abstract

We present data on the distribution of scientific publications across human protein-coding genes, together with their protein products and genetic relevance. Using the KNIME Analytics Platform (Berthold et al., 2008), we annotated the gene2pubmed dataset (Maglott et al., 2007) provided by the NCBI (National Center for Biotechnology Information) with publication years, with genetic metadata corresponding to Online Mendelian Inheritance in Man (OMIM) entries (Hamosh et al., 2005), and with the frequency of the genes' appearance in Genome-Wide Association Studies (GWAS) (Buniello et al., 2019) provided by the European Bioinformatics Institute (EBI). The data integration process yields two datasets: 1) a dataset covering all human protein-coding genes, which can be used to analyse the number of scientific publications in the context of the potential disease relevance of the individual genes; 2) a table with the annual and cumulative number of PubMed entries. For further interpretation of the data presented in this article, please see the research article 'Target 2035 - probing the human proteome' by Carter et al. (2019), https://doi.org/10.1016/j.drudis.2019.06.020.
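The two outputs described above (per-gene publication counts and an annual/cumulative table) can be illustrated with a minimal Python sketch. This is not the authors' KNIME workflow. The rows below are invented toy data laid out like gene2pubmed (tax_id, GeneID, PubMed_ID), the PubMed-ID-to-year lookup is a hypothetical stand-in for the publication-year annotation, and the function name `summarize` is an assumption. The OMIM and GWAS annotation steps are omitted.

```python
from collections import defaultdict
from itertools import accumulate

HUMAN_TAX_ID = 9606  # NCBI taxonomy ID for Homo sapiens

# Toy rows in gene2pubmed layout: (tax_id, GeneID, PubMed_ID). Values are invented.
gene2pubmed = [
    (9606, 1, 101),
    (9606, 1, 102),
    (9606, 2, 102),
    (9606, 2, 103),
    (10090, 3, 104),  # non-human row, filtered out below
]

# Hypothetical lookup from PubMed ID to publication year.
pub_year = {101: 2000, 102: 2001, 103: 2001, 104: 2002}


def summarize(rows, years):
    """Return (publications per human gene, [(year, annual, cumulative), ...])."""
    pmids_per_gene = defaultdict(set)
    for tax_id, gene_id, pmid in rows:
        if tax_id == HUMAN_TAX_ID:
            pmids_per_gene[gene_id].add(pmid)

    # Dataset 1 (simplified): number of distinct publications per gene.
    per_gene = {gene: len(pmids) for gene, pmids in pmids_per_gene.items()}

    # Dataset 2: count each publication once, even if annotated to several genes.
    human_pmids = set().union(*pmids_per_gene.values())
    annual = defaultdict(int)
    for pmid in human_pmids:
        if pmid in years:  # skip publications without a known year
            annual[years[pmid]] += 1

    ordered_years = sorted(annual)
    counts = [annual[y] for y in ordered_years]
    table = list(zip(ordered_years, counts, accumulate(counts)))
    return per_gene, table


per_gene, annual_table = summarize(gene2pubmed, pub_year)
```

In this sketch a publication annotated to several genes counts once per gene in the per-gene summary but only once in the annual table. Whether the published datasets follow the same convention is not stated in the abstract.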

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6702404
DOI: http://dx.doi.org/10.1016/j.dib.2019.104284

Publication Analysis

Top Keywords

human protein-coding: 12
protein-coding genes: 12
genes protein: 8
protein products: 8
products genetic: 8
genetic relevance: 8
scientific publications: 8
dataset frequency: 4
frequency patterns: 4
patterns publications: 4

Similar Publications

Background: The proteome is a valuable resource for pinpointing therapeutic targets. Therefore, we conducted a proteome-wide Mendelian randomization (MR) study aimed at identifying potential protein markers and therapeutic targets for Anti-N-Methyl-D-Aspartate Receptor Encephalitis (NMDAR-E).

Methods: Protein quantitative trait loci (pQTLs) were obtained from seven published genome-wide association studies (GWASs) focusing on the plasma proteome, resulting in summary-level data for 734 circulating protein markers.

Vertebrate animals and many small DNA and single-stranded RNA viruses that infect vertebrates have evolved to suppress genomic CpG dinucleotides. All organisms and most viruses additionally suppress UpA dinucleotides in protein-coding RNA. Synonymously recoding viral genomes to introduce CpG or UpA dinucleotides has emerged as an approach for viral attenuation and vaccine development.

Genome graphs provide a powerful reference structure for representing genetic diversity. Their structure emphasizes the polymorphic regions in a collection of genomes, enabling network-based comparisons of population-level variation. However, current tools are limited in their ability to quantify and compare structural features across large genome graphs.

Chromosome-scale genome assembly of Sauvagesia rhodoleuca (Ochnaceae) provides insights into its genome evolution and demographic history.

DNA Res

September 2025

Key Laboratory of National Forestry and Grassland Administration on Plant Conservation and Utilization in Southern China, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China.

Sauvagesia rhodoleuca is an endangered species endemic to southern China. Due to human activities, only six fragmented populations remain in Guangdong and Guangxi. Despite considerable conservation efforts, its demographic history and evolution remain poorly understood, particularly from a genomic perspective.

Blood purification using immunoadsorbent columns is a therapeutic strategy for removing pathogenic autoantibodies in autoimmune diseases. Currently available columns have limitations: Trp/Phe columns offer cost-effectiveness and sterilizability, but lack antigen specificity and have limited capacity to remove diverse pathogenic autoantibodies; whereas Protein A/peptide/anti-human IgG columns target all antibodies, regardless of pathogenicity, limiting specificity, and often require sterile production due to low stability under sterilization conditions, except for peptide ligands. Full-length autoantigen-immobilized immunoadsorbent columns have great potential to specifically adsorb targeted autoantibodies, because autoantibodies recognize diverse epitopes that vary among individuals.
