Publications by authors named "Peter N Robinson"

Motivation: Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for clinical management. Currently, manual biocuration is used to annotate MAxO terms to rare diseases.

Ontologies are structured frameworks for representing knowledge by systematically defining concepts, categories, and their relationships. While widely adopted in biomedicine, ontologies remain largely absent in mental health research and clinical care, where the field continues to rely heavily on existing classification systems (e.g.

While Research Electronic Data Capture (REDCap) has been widely adopted in rare disease research, its unconstrained data format often leads to implementations that lack native interoperability with global health data standards, limiting secondary data use. To address this, we developed and validated an open-source framework implementing our previously published ontology-based rare disease common data model, enabling standardised data exchange between REDCap, international registries, and downstream analysis tools. Its preconfigured pipelines interact with the local REDCap application programming interface and enable semi-automatic import or export of data to the Global Alliance for Genomics and Health (GA4GH) Phenopackets and Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) instances, conforming to the HL7 International Patient Summary and Genomics Reporting profiles.

Embeddings are semantically meaningful representations of words in a vector space, commonly used to enhance downstream machine learning applications. Traditional biomedical embedding techniques often replace all synonymous words representing biological or medical concepts with a unique token, ensuring consistent representation and improving embedding quality. However, the potential impact of replacing non-biomedical concept synonyms has received less attention.

We introduce a protein language model (PLM) that employs a multifaceted learning strategy based on transfer learning from a decoder-based Transformer, conditional learning using specific functional keywords, and fine-tuning for the modeling of enzymes. Our experiments show that this model significantly improves on generalist PLMs such as ProGen in the prediction and generation of enzymes belonging to specific Enzyme Commission (EC) categories. They also demonstrate that generated sequences can diverge from natural ones while retaining the predicted tertiary structure, predicted functions, and active sites of their natural counterparts.

Arthrogryposis multiplex congenita (AMC) represents a large, rare group of congenital conditions. This study addressed major challenges in AMC research posed by the lack of systematic frameworks for data collection and the use of inconsistent terminologies and text descriptions. We aimed to systematically review the Human Phenotype Ontology (HPO) terms, encode AMC phenotypic traits as HPO terms, and pilot test the encoding process in a cohort of children with AMC.

Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is affected by many factors, for example, ontology structure, annotation completeness, or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and no openly available patient data with which to assess the efficacy of VGPAs, ultimately hindering the development of effective prioritisation tools.

Comprehensively characterizing genotype-phenotype correlations (GPCs) in Mendelian disease would create new opportunities for improving clinical management and understanding disease biology. However, heterogeneous approaches to data sharing, reuse, and analysis have hindered progress in the field. We developed Genotype Phenotype Evaluation of Statistical Association (GPSEA), a software package that leverages the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema to represent case-level clinical and genetic data about individuals.

Background: Large language models (LLMs) are increasingly used in the medical field for diverse applications including differential diagnostic support. The estimated training data used to create LLMs such as the Generative Pretrained Transformer (GPT) predominantly consist of English-language texts, but LLMs could be used across the globe to support diagnostics if language barriers could be overcome. Initial pilot studies on the utility of LLMs for differential diagnosis in languages other than English have shown promise, but a large-scale assessment on the relative performance of these models in a variety of European and non-European languages on a comprehensive corpus of challenging rare-disease cases is lacking.

Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses.

Up to 80% of rare disease patients remain undiagnosed after genomic sequencing, with many probably involving pathogenic variants in yet to be discovered disease-gene associations. To search for such associations, we developed a rare variant gene burden analytical framework for Mendelian diseases, and applied it to protein-coding variants from whole-genome sequencing of 34,851 cases and their family members recruited to the 100,000 Genomes Project. A total of 141 new associations were identified, including five for which independent disease-gene evidence was recently published.

Although rare diseases (RDs) affect over 260 million individuals worldwide, low data quality and data scarcity challenge effective care and research. This work aims to harmonise the Common Data Set by European Rare Disease Registry Infrastructure, Health Level 7 Fast Healthcare Interoperability Base Resources, and the Global Alliance for Genomics and Health Phenopacket Schema into a novel rare disease common data model (RD-CDM), laying the foundation for developing international RD-CDMs aligned with these data standards. We developed a modular GitHub repository and documentation to allow for flexibility, extension, and further development.

P21-activated kinase 2 (PAK2) is a serine/threonine kinase essential for a variety of cellular processes including signal transduction, cellular survival, proliferation, and migration. A recent report proposed that monoallelic PAK2 variants cause Knobloch syndrome type 2 (KNO2), a developmental disorder primarily characterized by ocular anomalies. Here, we identified a novel de novo heterozygous missense variant in PAK2, NM_002577.

Genetic tests available in the prenatal setting have expanded rapidly with next-generation sequencing, and fetal imaging can detect a wide range of structural and functional abnormalities. To identify a fetal genetic disease, deep phenotyping is increasingly important to generate a differential diagnosis, choose the most appropriate genetic tests, and inform the interpretation of the results of those tests. The Human Phenotype Ontology (HPO) organizes and defines the features of human disease to support deep phenotyping, and efforts are ongoing to extend the scope of the HPO to comprehensively include fetal phenotypes.

Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, its multimodal nature requires effective data-integration pipelines.

Gene Ontology overrepresentation analysis (GO-ORA) is a standard approach towards characterizing salient functional characteristics of sets of differentially expressed genes (DGE) in RNA sequencing (RNA-seq) experiments. GO-ORA compares the distribution of GO annotations of the DGE to that of all genes or all expressed genes. This approach has not been available to characterize differential alternative splicing (DAS).
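
The comparison that GO-ORA makes can be illustrated with a one-sided hypergeometric test. The following is a minimal sketch, not the method of the article above: the function name `hypergeom_upper_tail` and its parameter names are hypothetical, and a real analysis would also correct for testing many GO terms.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background population (e.g. all expressed genes)
    K: background genes annotated to the GO term
    n: genes in the study set (e.g. differentially expressed genes)
    k: study-set genes annotated to the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers: 10 background genes, 5 annotated, study set of 5 genes,
# all 5 of which carry the annotation.
p_value = hypergeom_upper_tail(k=5, K=5, n=5, N=10)
print(p_value)
```

A small p-value suggests that the term is annotated to more study-set genes than expected by chance under this model.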

Hi-C and capture Hi-C (CHi-C) both leverage paired-end sequencing of chimeric fragments to gauge the strength of interactions based on the total number of paired-end reads mapped to a common pair of restriction fragments. Mapped paired-end reads can have four relative orientations, depending on the genomic positions and strands of the two reads. We assigned one paired-end read orientation to each of the four possible re-ligations that can occur between two given restriction fragments.
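
The four relative orientations can be enumerated with a short classifier. This is an illustrative sketch only: the function name `pair_orientation` and the labels `inward`, `outward`, `both_forward`, and `both_reverse` are assumptions made here, not the article's nomenclature, and both reads are assumed to map to the same chromosome.

```python
def pair_orientation(pos1, strand1, pos2, strand2):
    """Classify a mapped read pair by the strands of its upstream and downstream reads."""
    for strand in (strand1, strand2):
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
    # Order the two reads by genomic position so that the label does not
    # depend on which read happens to be listed first.
    (_, upstream), (_, downstream) = sorted([(pos1, strand1), (pos2, strand2)])
    labels = {
        ("+", "-"): "inward",        # reads point towards each other
        ("-", "+"): "outward",       # reads point away from each other
        ("+", "+"): "both_forward",  # both reads on the forward strand
        ("-", "-"): "both_reverse",  # both reads on the reverse strand
    }
    return labels[(upstream, downstream)]


print(pair_orientation(100, "+", 500, "-"))
print(pair_orientation(500, "+", 100, "-"))
```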

Numerous factors regulate alternative splicing of human genes at a co-transcriptional level. However, how alternative splicing depends on the regulation of gene expression is poorly understood. We leveraged data from the Genotype-Tissue Expression (GTEx) project to show a significant association of gene expression and splicing for 6874 (4.

Article Synopsis
  • Clinical intuition plays a crucial role in differential diagnosis, but current algorithms for rare genetic diseases overlook this aspect and assume equal chances for all possible Mendelian diseases.
  • The new ClintLR algorithm enhances the existing LIRICAL algorithm by adjusting the pretest probabilities of related diseases based on clinical intuition.
  • Simulation results indicate that ClintLR significantly improves the ranking of accurate diagnoses in genetic sequencing, making it a valuable tool available for free online.
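
The idea of adjusting pretest probabilities in a likelihood-ratio framework can be sketched as follows. This is not the ClintLR implementation: the function names, the uniform baseline, and the single multiplicative `factor` are assumptions chosen for illustration. The only standard ingredient is the relation posttest odds = pretest odds × likelihood ratio.

```python
def adjusted_pretest(diseases, boosted, factor):
    """Start from equal weights, up-weight the diseases a clinician suspects, renormalise."""
    weights = {d: (factor if d in boosted else 1.0) for d in diseases}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def posttest_probability(pretest, likelihood_ratio):
    """Combine a pretest probability with a likelihood ratio via odds."""
    odds = pretest / (1.0 - pretest) * likelihood_ratio
    return odds / (1.0 + odds)


# Hypothetical example: four candidate diseases, one of which is suspected.
pretest = adjusted_pretest(["A", "B", "C", "D"], boosted={"A"}, factor=2.0)
print(pretest["A"], pretest["B"])
print(posttest_probability(pretest["A"], likelihood_ratio=3.0))
```

With equal likelihood ratios, the up-weighted disease ends up with a higher posttest probability than its neighbours, which is how a prior adjustment can change a ranking.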
Article Synopsis
  • The GA4GH Phenopacket Schema, released in 2022 and approved as a standard by ISO, allows the sharing of clinical and genomic data, including phenotypic descriptions and genetic information, to aid in genomic diagnostics.
  • Phenopacket Store Version 0.1.19 offers a collection of 6668 phenopackets linked to various diseases and genes, making it a crucial resource for testing algorithms and software in genomic research.
  • This collection represents the first extensive case-level, standardized phenotypic information sourced from medical literature, supporting advancements in diagnostic genomics and machine learning applications.
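
To give a sense of what a phenopacket holds, the sketch below builds a deliberately partial record as JSON. The helper `minimal_phenopacket` and all identifiers passed to it are hypothetical, the record has not been validated against the schema, and a complete phenopacket carries further elements (for example ontology resources in its metadata) that are omitted here.

```python
import json


def minimal_phenopacket(packet_id, subject_id, hpo_terms, created, created_by):
    """Build a partial, unvalidated phenopacket-like dictionary.

    hpo_terms is a list of (HPO id, label) tuples.
    """
    return {
        "id": packet_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}} for term_id, label in hpo_terms
        ],
        "metaData": {
            "created": created,
            "createdBy": created_by,
            "phenopacketSchemaVersion": "2.0",
        },
    }


# All values below are placeholders for illustration.
example = minimal_phenopacket(
    packet_id="example-packet-1",
    subject_id="individual-1",
    hpo_terms=[("HP:0001250", "Seizure")],
    created="2024-01-01T00:00:00Z",
    created_by="example-curator",
)
print(json.dumps(example, indent=2))
```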
Article Synopsis
  • Phenotypic data helps us understand how genomic variations affect living organisms and is vital for clinical applications like diagnosing diseases and developing treatments.
  • The field of phenomics aims to unify and analyze the vast amounts of phenotypic data collected over time, but faces challenges due to inconsistent methods and vocabularies used to record this information.
  • The Unified Phenotype Ontology (uPheno) framework offers a solution by providing a standardized system for organizing phenotype terms, allowing for better integration of data across different species and improving research on genotype-phenotype associations.

Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed.

Improving health and social equity for persons living with a rare disease (PLWRD) is increasingly recognized as a global policy priority. However, there is currently no international alignment on how to define and describe rare diseases. A global reference is needed to establish a mutual understanding to inform a wide range of stakeholders for actions.

Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for the clinical management of patients. Currently, manual biocuration is used to assign MAxO terms to rare diseases, enabling clinical management of rare diseases to be described computationally for use in clinical decision support and mechanism discovery.

The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts.

Synopsis of recent research by authors named "Peter N Robinson"

  • Peter N Robinson's recent research focuses on the integration of genomic data and clinical phenotyping to improve the understanding and diagnosis of rare genetic diseases, utilizing tools like the LIRICAL algorithm and the GA4GH Phenopacket Schema.
  • His studies investigate the relationship between alternative splicing and gene expression, highlighting the need for a deeper understanding of co-transcriptional regulation and its implications for variably expressed genes.
  • Robinson also explores the utilization of innovative technologies, including artificial intelligence and comprehensive ontologies, to enhance diagnostic accuracy and biocuration processes in medical research and clinical practices.