Matching curated genome databases: a non trivial task.

BMC Genomics

Institut de Génétique et Microbiologie, Université Paris Sud XI, CNRS UMR 8621, Bât. 400, 91405 Orsay Cedex, France.

Published: October 2008


Article Abstract

Background: Curated databases of completely sequenced genomes were designed independently at the NCBI (RefSeq) and the EBI (Genome Reviews) to cope with the non-standard annotation found in the versions of sequenced genomes published by the GenBank/EMBL/DDBJ databanks. These curation efforts were expected to review the annotations and to improve their relevance as a basis for annotating newly released genome sequences by homology to previously annotated genomes. However, we observed that this uncoordinated effort has two unwanted consequences. First, it is not trivial to map the protein identifiers of the same sequence between the two databases. Second, the two reannotated versions of the same genome differ in their structural annotation.

Results: Here, we propose CorBank, a program devised to cross-reference protein identifiers whatever the level of identity between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences match, allowing instantaneous retrieval of their respective cross-references. CorBank also detects any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions match perfectly for only 50 of the 641 complete genomes we analyzed. In all other cases, differences occur at the level of the coding sequence (CDS) and/or in the total number of CDS in the respective versions of the same genome. CorBank is freely accessible at http://www.corbank.u-psud.fr. The CorBank site also publishes updated, exhaustive results from comparing the RefSeq and Genome Reviews versions of each genome. The site thus allows easy searching of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon.
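The simplest case of the cross-referencing described above, pairing identifiers whose protein sequences are strictly identical, can be illustrated with a short sketch. This is not CorBank's algorithm, which the abstract does not detail and which also handles sequences that are not fully identical. The function names, the identifiers, and the toy sequences below are hypothetical, and the input is assumed to be two plain mappings from protein identifier to amino acid sequence.

```python
import hashlib
from collections import defaultdict


def sequence_key(seq: str) -> str:
    """Normalise a protein sequence and return a stable digest of it."""
    # Assumption: case and a trailing stop symbol ('*') are not meaningful.
    cleaned = seq.strip().rstrip("*").upper()
    return hashlib.sha256(cleaned.encode("ascii")).hexdigest()


def cross_reference(refseq: dict, genome_reviews: dict):
    """Pair identifiers with identical sequences; list the unmatched ones."""
    # Index one database by sequence digest; several IDs may share a sequence.
    by_key = defaultdict(list)
    for gr_id, seq in genome_reviews.items():
        by_key[sequence_key(seq)].append(gr_id)

    matches = {}
    refseq_only = []
    matched_gr = set()
    for rs_id, seq in refseq.items():
        gr_ids = by_key.get(sequence_key(seq))
        if gr_ids:
            matches[rs_id] = sorted(gr_ids)
            matched_gr.update(gr_ids)
        else:
            refseq_only.append(rs_id)

    gr_only = sorted(set(genome_reviews) - matched_gr)
    return matches, sorted(refseq_only), gr_only


if __name__ == "__main__":
    # Hypothetical identifiers and sequences, for illustration only.
    refseq = {"RS_0001": "MKTAYIAKQR", "RS_0002": "MSLLTEVETP"}
    genome_reviews = {"GR_0001": "mktayiakqr*", "GR_0002": "MSLLTEVETPIR"}
    matches, refseq_only, gr_only = cross_reference(refseq, genome_reviews)
    print("matched:", matches)
    print("only in first set:", refseq_only)
    print("only in second set:", gr_only)
```

In this sketch the identifiers left unmatched in both sets are the candidates for a structural annotation difference, such as a CDS with a different start position or a CDS present in only one version. Resolving them would require a sequence comparison that tolerates partial identity, which is outside the scope of the example.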

Conclusion: CorBank rapidly detects the numerous differences between the RefSeq and Genome Reviews versions of the same curated genome. Although such differences are acceptable as reflecting different views, we suggest that curators of both genome databases could help reduce further divergence by agreeing on a minimal dialogue and by attempting to publish the point of view of the other database whenever technically possible.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596144
DOI: http://dx.doi.org/10.1186/1471-2164-9-501
