Publications by Peter N Robinson | LitMetric

Publications by authors named "Peter N Robinson"

Leveraging generative AI to assist biocuration of medical actions for rare disease.

Enock Niyonkuru, J Harry Caufield, Leigh C Carmody, Michael A Gargano, Sabrina Toro, Peter N Robinson

Bioinform Adv

June 2025

Motivation: Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for clinical management. Currently, manual biocuration is used to annotate MAxO terms to rare diseases.

Integrating Knowledge: The Power of Ontologies in Psychiatric Research and Clinical Informatics.

Melvin G McInnis, Ben Coleman, Eric Hurwitz, Peter N Robinson, Andrew E Williams

Biol Psychiatry

August 2025

Ontologies are structured frameworks for representing knowledge by systematically defining concepts, categories, and their relationships. While widely adopted in biomedicine, ontologies remain largely absent in mental health research and clinical care, where the field continues to rely heavily on existing classification systems (e.g.

Linking international registries to FHIR and Phenopackets with RareLink: a scalable REDCap-based framework for rare disease data interoperability.

Adam S L Graefe, Filip Rehburg, Samer Alkarkoukly, Daniel Danis, Ana Grönke, Peter N Robinson

medRxiv

May 2025

While Research Electronic Data Capture (REDCap) has been widely adopted in rare disease research, its unconstrained data format often leads to implementations that lack native interoperability with global health data standards, limiting secondary data use. To address this, we developed and validated an open-source framework implementing our previously published ontology-based rare disease common data model, enabling standardised data exchange between REDCap, international registries, and downstream analysis tools. Its preconfigured pipelines interact with the local REDCap application programming interface and enable semi-automatic import or export of data to Global Alliance for Genomics and Health (GA4GH) Phenopackets and Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) instances, conforming to the HL7 International Patient Summary and Genomics Reporting profiles.

Replacing non-biomedical concepts improves embedding of biomedical concepts.

Enock Niyonkuru, Mauricio Soto Gomez, Elena Casarighi, Stephan Antogiovanni, Hannah Blau, Peter N Robinson

PLoS One

May 2025

Embeddings are semantically meaningful representations of words in a vector space, commonly used to enhance downstream machine learning applications. Traditional biomedical embedding techniques often replace all synonymous words representing biological or medical concepts with a unique token, ensuring consistent representation and improving embedding quality. However, the potential impact of replacing non-biomedical concept synonyms has received less attention.
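
The synonym-replacement preprocessing described above can be sketched in Python. This is a minimal illustration under stated assumptions: the synonym dictionary, the CONCEPT_* token names and the replace_concepts helper are hypothetical and not taken from the paper; a real pipeline would draw its mappings from ontologies or thesauri. The dictionary deliberately mixes a biomedical concept (myocardial infarction) with a non-biomedical one (doctor/physician), the latter being the kind of replacement the study examines.

```python
import re

# Hypothetical synonym dictionary (keys lower-case): each surface form maps to one canonical token.
SYNONYMS = {
    "myocardial infarction": "CONCEPT_MI",  # biomedical concept
    "heart attack": "CONCEPT_MI",
    "physician": "CONCEPT_DOCTOR",          # non-biomedical concept
    "doctor": "CONCEPT_DOCTOR",
}


def build_pattern(synonyms):
    """Compile one case-insensitive regex that matches any synonym as a whole phrase."""
    # Longer phrases first, so a multi-word synonym wins over a shorter overlapping one.
    phrases = sorted(synonyms, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def replace_concepts(text, synonyms=SYNONYMS):
    """Replace every synonym in `text` with its canonical concept token."""
    pattern = build_pattern(synonyms)
    # Look up the lower-cased match so replacement works regardless of capitalisation.
    return pattern.sub(lambda m: synonyms[m.group(0).lower()], text)


if __name__ == "__main__":
    print(replace_concepts(
        "The doctor suspected a heart attack; the physician confirmed myocardial infarction."))
    # -> The CONCEPT_DOCTOR suspected a CONCEPT_MI; the CONCEPT_DOCTOR confirmed CONCEPT_MI.
```

Enforcing word boundaries keeps the replacement from rewriting substrings, so a word such as "doctoral" is left untouched.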

Fine-tuning of conditional Transformers improves enzyme prediction and generation.

Marco Nicolini, Emanuele Saitto, Ruben Emilio Jimenez Franco, Emanuele Cavalleri, Aldo Javier Galeano Alfonso, Peter N Robinson

Comput Struct Biotechnol J

March 2025

We introduce a Protein Language Model (PLM) that employs a multifaceted learning strategy based on transfer learning from a decoder-based Transformer, conditional learning using specific functional keywords, and fine-tuning for the modeling of enzymes. Our experiments show that the model significantly enhances generalist PLMs like ProGen for the prediction and generation of enzymes belonging to specific Enzyme Commission (EC) categories. They also demonstrate that the generated sequences can diverge from natural ones while retaining similar predicted tertiary structures, predicted functions, and active sites to those of their natural counterparts.

Human Phenotype Ontology Annotations for Rare Congenital Conditions: Application to Arthrogryposis Multiplex Congenita.

Shahrzad Nematollahi, Reggie C Hamdy, Harold van Bosse, Joyce Li, Daniel Blanshay-Goldberg, Peter N Robinson

Am J Med Genet A

August 2025

Arthrogryposis multiplex congenita (AMC) represents a large, rare group of congenital conditions. This study addressed major challenges in AMC research posed by the lack of systematic frameworks for data collection and the use of inconsistent terminologies and text descriptions. We aimed to systematically review the Human Phenotype Ontology (HPO) terms, encode AMC phenotypic traits as HPO terms, and pilot test the encoding process in a cohort of children with AMC.

Towards a standard benchmark for phenotype-driven variant and gene prioritisation algorithms: PhEval - Phenotypic inference Evaluation framework.

Yasemin Bridges, Vinicius de Souza, Katherina G Cortes, Melissa Haendel, Nomi L Harris, Peter N Robinson

BMC Bioinformatics

March 2025

Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and no openly available patient data to assess the efficacy of VGPAs, ultimately hindering the development of effective prioritisation tools.

GA4GH Phenopacket-Driven Characterization of Genotype-Phenotype Correlations in Mendelian Disorders.

Lauren Rekerle, Daniel Danis, Filip Rehburg, Adam Sl Graefe, Viktor Bily, Peter N Robinson

medRxiv

March 2025

Comprehensively characterizing genotype-phenotype correlations (GPCs) in Mendelian disease would create new opportunities for improving clinical management and understanding disease biology. However, heterogeneous approaches to data sharing, reuse, and analysis have hindered progress in the field. We developed Genotype Phenotype Evaluation of Statistical Association (GPSEA), a software package that leverages the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema to represent case-level clinical and genetic data about individuals.

Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases.

Leonardo Chimirri, J Harry Caufield, Yasemin Bridges, Nicolas Matentzoglu, Michael Gargano, Peter N Robinson

medRxiv

February 2025

Background: Large language models (LLMs) are increasingly used in the medical field for diverse applications including differential diagnostic support. The estimated training data used to create LLMs such as the Generative Pretrained Transformer (GPT) predominantly consist of English-language texts, but LLMs could be used across the globe to support diagnostics if language barriers could be overcome. Initial pilot studies on the utility of LLMs for differential diagnosis in languages other than English have shown promise, but a large-scale assessment on the relative performance of these models in a variety of European and non-European languages on a comprehensive corpus of challenging rare-disease cases is lacking.

The Unified Phenotype Ontology: a framework for cross-species integrative phenomics.

Nicolas Matentzoglu, Susan M Bello, Ray Stefancsik, Sarah M Alghamdi, Anna V Anagnostopoulos, Peter N Robinson

Genetics

March 2025

Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses.

Rare disease gene association discovery in the 100,000 Genomes Project.

Valentina Cipriani, Letizia Vestito, Emma F Magavern, Julius O B Jacobsen, Gavin Arno, Peter N Robinson

Nature

February 2025

Up to 80% of rare disease patients remain undiagnosed after genomic sequencing, with many probably involving pathogenic variants in yet to be discovered disease-gene associations. To search for such associations, we developed a rare variant gene burden analytical framework for Mendelian diseases, and applied it to protein-coding variants from whole-genome sequencing of 34,851 cases and their family members recruited to the 100,000 Genomes Project. A total of 141 new associations were identified, including five for which independent disease-gene evidence was recently published.
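
A gene burden test of this kind asks whether cases carry qualifying rare variants in a gene more often than controls. The sketch below shows one common, generic formulation, a one-sided Fisher's exact test on a 2×2 carrier table; it is an assumption for illustration and not the statistical model used in the 100,000 Genomes Project framework. The function name and example counts are hypothetical.

```python
from math import comb


def fisher_one_sided_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment of carriers among cases.

    2x2 table (hypothetical counts):
                  carriers  non-carriers
        cases        a           b
        controls     c           d
    Returns P(X >= a), where X is the number of case carriers under the
    hypergeometric null with all margins fixed.
    """
    n_cases = a + b
    n_carriers = a + c
    total = a + b + c + d
    max_x = min(n_cases, n_carriers)
    # Sum the probabilities of all tables at least as extreme (as many or more case carriers).
    tail = sum(
        comb(n_carriers, x) * comb(total - n_carriers, n_cases - x)
        for x in range(a, max_x + 1)
    )
    return tail / comb(total, n_cases)


if __name__ == "__main__":
    # 8 of 100 cases vs 2 of 1,000 controls carry a qualifying rare variant.
    print(fisher_one_sided_greater(8, 92, 2, 998))
```

In a genome-wide scan, each gene's p-value would also need correction for the number of genes tested. `scipy.stats.fisher_exact(table, alternative="greater")` provides an equivalent library implementation of the same test.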

An ontology-based rare disease common data model harmonising international registries, FHIR, and Phenopackets.

Adam S L Graefe, Miriam R Hübner, Filip Rehburg, Steffen Sander, Sophie A I Klopfenstein, Peter N Robinson

Sci Data

February 2025

Although rare diseases (RDs) affect over 260 million individuals worldwide, low data quality and scarcity challenge effective care and research. This work aims to harmonise the Common Data Set by European Rare Disease Registry Infrastructure, Health Level 7 Fast Healthcare Interoperability Base Resources, and the Global Alliance for Genomics and Health Phenopacket Schema into a novel rare disease common data model (RD-CDM), laying the foundation for developing international RD-CDMs aligned with these data standards. We developed a modular-based GitHub repository and documentation to account for flexibility, extensions and further development.

Phenotypic Expansion of Knobloch Syndrome Type 2 in an Individual With a De Novo PAK2 Variant.

Elizabeth A Werren, Louisa Kalsner, Jessica M Ewald, Michael Peracchio, Cameron King, Peter N Robinson

Am J Med Genet A

June 2025

P21-activated kinase 2 (PAK2) is a serine/threonine kinase essential for a variety of cellular processes including signal transduction, cellular survival, proliferation, and migration. A recent report proposed that monoallelic PAK2 variants cause Knobloch syndrome type 2 (KNO2), a developmental disorder primarily characterized by ocular anomalies. Here, we identified a novel de novo heterozygous missense variant in PAK2, NM_002577.

Fetal imaging, phenotyping, and genomic testing in modern prenatal diagnosis.

Matthew A Shear, Peter N Robinson, Teresa N Sparks

Best Pract Res Clin Obstet Gynaecol

February 2025

Genetic tests available in the prenatal setting have expanded rapidly with next-generation sequencing, and fetal imaging can detect a broad range of structural and functional abnormalities. To identify a fetal genetic disease, deep phenotyping is increasingly important to generate a differential diagnosis, choose the most appropriate genetic tests, and inform the interpretation of their results. The Human Phenotype Ontology (HPO) organizes and defines the features of human disease to support deep phenotyping, and ongoing efforts aim to extend the scope of the HPO to comprehensively include fetal phenotypes.

Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing.

Jessica Gliozzo, Mauricio Soto-Gomez, Valentina Guarino, Arturo Bonometti, Alberto Cabri, Peter N Robinson

Artif Intell Med

February 2025

Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, its multimodal nature requires effective data-integration pipelines.

IsopretGO-analysing and visualizing the functional consequences of differential splicing.

Guy Karlebach, Peter Hansen, Kristin Köhler, Peter N Robinson

NAR Genom Bioinform

December 2024

Gene Ontology overrepresentation analysis (GO-ORA) is a standard approach towards characterizing salient functional characteristics of sets of differentially expressed genes (DGE) in RNA sequencing (RNA-seq) experiments. GO-ORA compares the distribution of GO annotations of the DGE to that of all genes or all expressed genes. This approach has not been available to characterize differential alternative splicing (DAS).
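
The GO-ORA comparison described above is often implemented as a one-sided hypergeometric test per GO term: given how many population genes carry a term, how surprising is the number of annotated genes in the study set? The following is a minimal gene-level sketch; the input format and function names are hypothetical, it applies no multiple-testing correction, and it does not attempt the splicing-specific analysis that IsopretGO adds.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population size, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def go_ora(study, population, annotations):
    """Gene-level GO overrepresentation analysis (no multiple-testing correction).

    study, population: sets of gene IDs; study is assumed to be a subset of population.
    annotations: dict gene ID -> set of GO IDs (hypothetical input format).
    Returns {go_id: (study_count, population_count, p_value)}.
    """
    terms = set().union(*(annotations.get(g, set()) for g in population))
    results = {}
    for term in sorted(terms):
        # Count annotated genes in the background and in the study set.
        pop_count = sum(term in annotations.get(g, set()) for g in population)
        study_count = sum(term in annotations.get(g, set()) for g in study)
        p = hypergeom_sf(study_count, len(population), pop_count, len(study))
        results[term] = (study_count, pop_count, p)
    return results


if __name__ == "__main__":
    population = {f"g{i}" for i in range(1, 11)}
    study = {"g1", "g2", "g3"}
    annotations = {g: {"GO:0008380"} for g in ("g1", "g2", "g3", "g4")}  # RNA splicing
    annotations.update({f"g{i}": {"GO:0006915"} for i in range(5, 11)})  # apoptotic process
    for term, (k, m, p) in go_ora(study, population, annotations).items():
        print(term, k, m, round(p, 4))
    # -> GO:0006915 0 6 1.0
    # -> GO:0008380 3 4 0.0333
```

In this toy example all three study genes carry the splicing term, which only four of the ten background genes have, so that term gets p = 4/120.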

Using paired-end read orientations to assess technical biases in capture Hi-C.

Peter Hansen, Hannah Blau, Jochen Hecht, Guy Karlebach, Alexander Krannich, Peter N Robinson

NAR Genom Bioinform

December 2024

Hi-C and capture Hi-C (CHi-C) both leverage paired-end sequencing of chimeric fragments to gauge the strength of interactions based on the total number of paired-end reads mapped to a common pair of restriction fragments. Mapped paired-end reads can have four relative orientations, depending on the genomic positions and strands of the two reads. We assigned one paired-end read orientation to each of the four possible re-ligations that can occur between two given restriction fragments.
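
The four relative orientations follow from the strands of the two reads once they are ordered by genomic position. A minimal sketch, assuming a simplified read record (a hypothetical Read class with a leftmost coordinate and a strand) and generic orientation labels of our own choosing; it does not reproduce the paper's assignment of orientations to specific re-ligation events.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Simplified mapped read (hypothetical record, not a BAM/SAM parser)."""
    pos: int     # leftmost mapped coordinate
    strand: str  # "+" (forward) or "-" (reverse)


# Generic labels for the four strand combinations, keyed by (upstream, downstream).
ORIENTATIONS = {
    ("+", "-"): "inward",        # reads point toward each other
    ("-", "+"): "outward",       # reads point away from each other
    ("+", "+"): "same-forward",  # both reads on the forward strand
    ("-", "-"): "same-reverse",  # both reads on the reverse strand
}


def relative_orientation(r1: Read, r2: Read) -> str:
    """Classify a read pair mapped to the same chromosome by its relative orientation.

    Ordering by leftmost coordinate is a simplification: for a reverse-strand
    read the 5' end lies at the right end of the alignment.
    """
    upstream, downstream = sorted((r1, r2), key=lambda r: r.pos)
    return ORIENTATIONS[(upstream.strand, downstream.strand)]


if __name__ == "__main__":
    print(relative_orientation(Read(5000, "-"), Read(100, "+")))  # -> inward
```

Because the pair is sorted before classification, the result does not depend on which read was sequenced first.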

Alternative splicing is coupled to gene expression in a subset of variably expressed genes.

Guy Karlebach, Robin Steinhaus, Daniel Danis, Maeva Devoucoux, Olga Anczuków, Peter N Robinson

NPJ Genom Med

November 2024

Numerous factors regulate alternative splicing of human genes at a co-transcriptional level. However, how alternative splicing depends on the regulation of gene expression is poorly understood. We leveraged data from the Genotype-Tissue Expression (GTEx) project to show a significant association of gene expression and splicing for 6874 (4.

Leveraging clinical intuition to improve accuracy of phenotype-driven prioritization.

Martha A Beckwith, Daniel Danis, Yasemin Bridges, Julius O B Jacobsen, Damian Smedley, Peter N Robinson

Genet Med

January 2025

Article Synopsis

Clinical intuition plays a crucial role in differential diagnosis, but current algorithms for rare genetic diseases overlook this aspect and assign the same pretest probability to every Mendelian disease.
The new ClintLR algorithm enhances the existing LIRICAL algorithm by adjusting the pretest probabilities of related diseases based on clinical intuition.
Simulation results indicate that ClintLR significantly improves the ranking of correct diagnoses in genomic sequencing analysis, and the tool is freely available online.
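
LIRICAL-style analyses convert a likelihood ratio into a post-test probability using Bayes' rule in odds form, so changing the pretest probability changes the result even when the likelihood ratio stays the same. The sketch below illustrates that effect only; the up-weighting factor, the boosted set and the helper names are hypothetical and do not describe how ClintLR actually derives its adjusted pretest probabilities.

```python
def adjust_pretest(diseases, boosted, factor):
    """Pretest probabilities: uniform, except diseases in `boosted` get `factor` times the weight.

    Weights are renormalized so the probabilities sum to 1. This weighting
    scheme is a hypothetical illustration, not ClintLR's method.
    """
    weights = {d: (factor if d in boosted else 1.0) for d in diseases}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def posttest_probability(pretest, likelihood_ratio):
    """Bayes' rule in odds form: posttest odds = pretest odds * likelihood ratio (pretest < 1)."""
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    diseases = ["D1", "D2", "D3", "D4"]  # hypothetical disease IDs
    uniform = adjust_pretest(diseases, boosted=set(), factor=1.0)
    intuition = adjust_pretest(diseases, boosted={"D1"}, factor=3.0)
    lr = 2.0  # same likelihood ratio in both scenarios
    print(posttest_probability(uniform["D1"], lr))    # about 0.4
    print(posttest_probability(intuition["D1"], lr))  # about 0.667
```

With four candidate diseases and one of them up-weighted threefold, that disease's pretest probability rises from 0.25 to 0.5, and the same likelihood ratio of 2 yields a post-test probability of about 0.67 instead of 0.4.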

A corpus of GA4GH phenopackets: Case-level phenotyping for genomic diagnostics and discovery.

Daniel Danis, Michael J Bamshad, Yasemin Bridges, Andrés Caballero-Oteyza, Pilar Cacheiro, Peter N Robinson

HGG Adv

January 2025

Article Synopsis

The GA4GH Phenopacket Schema, released in 2022 and approved as a standard by ISO, allows the sharing of clinical and genomic data, including phenotypic descriptions and genetic information, to aid in genomic diagnostics.
Phenopacket Store Version 0.1.19 offers a collection of 6668 phenopackets linked to various diseases and genes, making it a crucial resource for testing algorithms and software in genomic research.
This collection represents the first extensive case-level, standardized phenotypic information sourced from medical literature, supporting advancements in diagnostic genomics and machine learning applications.
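
To make the idea of a case-level phenopacket concrete, here is a minimal Python sketch that builds one as a JSON-serialisable dictionary. The field names reflect our understanding of the Phenopacket Schema v2 JSON form, but the example is not validated against the official schema, omits required metadata such as created, createdBy and resources, and uses hypothetical packet and subject identifiers; the HPO terms shown (HP:0001250 Seizure, HP:0001263 Global developmental delay) are real.

```python
import json


def minimal_phenopacket(packet_id, subject_id, sex, features):
    """Build a minimal, phenopacket-like dict (not schema-validated).

    features: iterable of (hpo_id, label, excluded) tuples.
    sex: e.g. "FEMALE", "MALE", "OTHER_SEX" or "UNKNOWN_SEX".
    A complete phenopacket also needs metaData fields such as created,
    createdBy and resources; they are omitted here for brevity.
    """
    return {
        "id": packet_id,
        "subject": {"id": subject_id, "sex": sex},
        "phenotypicFeatures": [
            {"type": {"id": hpo_id, "label": label}, "excluded": excluded}
            for hpo_id, label, excluded in features
        ],
        "metaData": {"phenopacketSchemaVersion": "2.0"},
    }


if __name__ == "__main__":
    packet = minimal_phenopacket(
        "example-1",  # hypothetical identifiers
        "patient-1",
        "FEMALE",
        [
            ("HP:0001250", "Seizure", False),
            ("HP:0001263", "Global developmental delay", True),  # explicitly excluded
        ],
    )
    print(json.dumps(packet, indent=2))
```

Recording excluded features explicitly, rather than leaving them out, lets downstream tools distinguish a phenotype that was looked for and found absent from one that was never assessed.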

The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics.

Nicolas Matentzoglu, Susan M Bello, Ray Stefancsik, Sarah M Alghamdi, Anna V Anagnostopoulos, Peter N Robinson

bioRxiv

September 2024

Article Synopsis

Phenotypic data helps us understand how genomic variations affect living organisms and is vital for clinical applications like diagnosing diseases and developing treatments.
The field of phenomics aims to unify and analyze the vast amounts of phenotypic data collected over time, but faces challenges due to inconsistent methods and vocabularies used to record this information.
The Unified Phenotype Ontology (uPheno) framework offers a solution by providing a standardized system for organizing phenotype terms, allowing for better integration of data across different species and improving research on genotype-phenotype associations.

Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases.

Clara D M van Karnebeek, Anne O'Donnell-Luria, Gareth Baynam, Anaïs Baudot, Tudor Groza, Peter N Robinson

Orphanet J Rare Dis

September 2024

Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed.

Operational description of rare diseases: a reference to improve the recognition and visibility of rare diseases.

Chiuhui Mary Wang, Amy Heagle Whiting, Ana Rath, Roberta Anido, Diego Ardigò, Peter N Robinson

Orphanet J Rare Dis

September 2024

Improving health and social equity for persons living with a rare disease (PLWRD) is increasingly recognized as a global policy priority. However, there is currently no international alignment on how to define and describe rare diseases. A global reference is needed to establish a mutual understanding to inform a wide range of stakeholders for actions.

Leveraging Generative AI to Accelerate Biocuration of Medical Actions for Rare Disease.

Enock Niyonkuru, J Harry Caufield, Leigh C Carmody, Michael A Gargano, Sabrina Toro, Peter N Robinson

medRxiv

August 2024

Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for the clinical management of patients. Currently, manual biocuration is used to assign MAxO terms to rare diseases, enabling clinical management of rare diseases to be described computationally for use in clinical decision support and mechanism discovery.

An ontology-based knowledge graph for representing interactions involving RNA molecules.

Emanuele Cavalleri, Alberto Cabri, Mauricio Soto-Gomez, Sara Bonfitto, Paolo Perlasca, Peter N Robinson

Sci Data

August 2024

The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts.

Peter N Robinson's recent research focuses on the integration of genomic data and clinical phenotyping to improve the understanding and diagnosis of rare genetic diseases, utilizing tools like the LIRICAL algorithm and the GA4GH Phenopacket Schema.
His studies investigate the relationship between alternative splicing and gene expression, highlighting the need for a deeper understanding of co-transcriptional regulation and its implications for variably expressed genes.
Robinson also explores the use of innovative technologies, including artificial intelligence and comprehensive ontologies, to enhance diagnostic accuracy and biocuration processes in medical research and clinical practice.