Gene duplication is a major evolutionary source of functional innovation. Following duplication events, gene copies (paralogues) may undergo various fates, including retention with functional modifications (such as subfunctionalization or neofunctionalization) or loss. When paralogues are retained, this results in complex orthology relationships, including one-to-many or many-to-many.
View Article and Find Full Text PDFA comprehensive, computable representation of the functional repertoire of all macromolecules encoded within the human genome is a foundational resource for biology and biomedical research. The Gene Ontology Consortium has been working towards this goal by generating a structured body of information about gene functions, which now includes experimental findings reported in more than 175,000 publications for human genes and genes in experimentally tractable model organisms. Here, we describe the results of a large, international effort to integrate all of these findings to create a representation of human gene functions that is as complete and accurate as possible.
View Article and Find Full Text PDFNAR Genom Bioinform
December 2024
The Quest for Orthologs (QfO) orthology benchmark service (https://orthology.benchmarkservice.org) hosts a wide range of standardized benchmarks for orthology inference evaluation.
View Article and Find Full Text PDFNucleic Acids Res
January 2025
InterPro (https://www.ebi.ac.
View Article and Find Full Text PDFGenome Biol Evol
October 2024
The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. We here discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life.
View Article and Find Full Text PDFDatabase (Oxford)
August 2024
Dynamic changes in protein glycosylation impact human health and disease progression. However, current resources that capture disease and phenotype information focus primarily on the macromolecules within the central dogma of molecular biology (DNA, RNA, proteins). To gain a better understanding of organisms, there is a need to capture the functional impact of glycans and glycosylation on biological processes.
View Article and Find Full Text PDFMotivation: Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge.
View Article and Find Full Text PDFThe solution of the longstanding “protein folding problem” in 2021 showcased the transformative capabilities of AI in advancing the biomedical sciences. AI was characterized as successfully learning from protein structure , which then spurred a more general call for AI-ready datasets to drive forward medical research. Here, we argue that it is the broad availability of , not just data, that is required to fuel further advances in AI in the scientific domain.
View Article and Find Full Text PDFEpidemiologic surveillance of circulating SARS-CoV-2 variants is essential to assess impact on clinical outcomes and vaccine efficacy. Whole genome sequencing (WGS), the gold-standard to identify variants, requires significant infrastructure and expertise. We developed a digital droplet polymerase chain reaction (ddPCR) assay that can rapidly identify circulating variants of concern/interest (VOC/VOI) using variant-specific mutation combinations in the Spike gene.
View Article and Find Full Text PDFThe Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms.
View Article and Find Full Text PDFNucleic Acids Res
January 2023
The InterPro database (https://www.ebi.ac.
View Article and Find Full Text PDFDatabase (Oxford)
August 2022
Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages.
View Article and Find Full Text PDFThe Annotation Query (AnnoQ) (http://annoq.org/) is designed to provide comprehensive and up-to-date functional annotations for human genetic variants. The system is supported by an annotation database with ∼39 million human variants from the Haplotype Reference Consortium (HRC) pre-annotated with sequence feature annotations by WGSA and functional annotations to Gene Ontology (GO) and pathways in PANTHER.
View Article and Find Full Text PDFNucleic Acids Res
July 2022
The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium.
View Article and Find Full Text PDFBiochim Biophys Acta Gene Regul Mech
December 2021
Transcription plays a central role in defining the identity and functionalities of cells, as well as in their responses to changes in the cellular environment. The Gene Ontology (GO) provides a rigorously defined set of concepts that describe the functions of gene products. A GO annotation is a statement about the function of a particular gene product, represented as an association between a gene product and the biological concept a GO term defines.
View Article and Find Full Text PDFMotivation: Gene Ontology Causal Activity Models (GO-CAMs) assemble individual associations of gene products with cellular components, molecular functions and biological processes into causally linked activity flow models. Pathway databases such as the Reactome Knowledgebase create detailed molecular process descriptions of reactions and assemble them, based on sharing of entities between individual reactions into pathway descriptions.
Results: To convert the rich content of Reactome into GO-CAMs, we have developed a software tool, Pathways2GO, to convert the entire set of normal human Reactome pathways into GO-CAMs.
Mol Biol Evol
July 2021
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology-evolutionary relatedness-is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene.
View Article and Find Full Text PDFGene function annotation is important for a variety of downstream analyses of genetic data. But experimental characterization of function remains costly and slow, making computational prediction an important endeavor. Phylogenetic approaches to prediction have been developed, but implementation of a practical Bayesian framework for parameter estimation remains an outstanding challenge.
View Article and Find Full Text PDFWe aim to enable the accurate and efficient transfer of knowledge about gene function gained from and other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions.
View Article and Find Full Text PDFEnhancers are powerful and versatile agents of cell-type specific gene regulation, which are thought to play key roles in human disease. Enhancers are short DNA elements that function primarily as clusters of transcription factor binding sites that are spatially coordinated to regulate expression of one or more specific target genes. These regulatory connections between enhancers and target genes can therefore be characterized as enhancer-gene links that can affect development, disease, and homeostatic cellular processes.
View Article and Find Full Text PDFPANTHER (Protein Analysis Through Evolutionary Relationships, http://www.pantherdb.org) is a resource for the evolutionary and functional classification of protein-coding genes from all domains of life.
View Article and Find Full Text PDFThe InterPro database (https://www.ebi.ac.
View Article and Find Full Text PDFMotivation: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called 'causal interaction' takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells.
View Article and Find Full Text PDFNucleic Acids Res
July 2020
The identification of orthologs-genes in different species which descended from the same gene in their last common ancestor-is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.
View Article and Find Full Text PDF