Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is affected by many factors, for example, ontology structure, annotation completeness, or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and no openly available patient data with which to assess the efficacy of VGPAs, ultimately hindering the development of effective prioritisation tools.
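A standardised, empirical assessment of VGPAs needs at least one rank-based metric that can be computed the same way for every tool. As a minimal sketch, assuming each tool returns a ranked gene list per solved case, top-k recall counts the fraction of cases whose known causal gene appears within the first k positions. The function names and toy data below are hypothetical and are not taken from the authors' framework.

```python
from typing import Dict, List, Optional


def rank_of_causal_gene(ranked_genes: List[str], causal_gene: str) -> Optional[int]:
    """Return the 1-based rank of the causal gene, or None if the tool did not rank it."""
    try:
        return ranked_genes.index(causal_gene) + 1
    except ValueError:
        return None


def top_k_recall(results: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """Fraction of solved cases whose causal gene is ranked within the top k."""
    if not truth:
        return 0.0
    hits = 0
    for case_id, causal_gene in truth.items():
        rank = rank_of_causal_gene(results.get(case_id, []), causal_gene)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(truth)


# Hypothetical ranked output from one VGPA for three solved cases.
results = {
    "case1": ["FBN1", "TTN", "COL1A1"],
    "case2": ["TTN", "SCN1A", "MECP2"],
    "case3": ["BRCA2", "TP53"],
}
# Hypothetical known diagnoses; case3's causal gene was never ranked.
truth = {"case1": "FBN1", "case2": "SCN1A", "case3": "CFTR"}

for k in (1, 3):
    print(k, round(top_k_recall(results, truth, k), 3))
```

Reporting recall at several cut-offs (for example 1, 3, and 10) shows how quickly a tool surfaces the diagnosis, and running every tool against the same case set is what makes the numbers comparable.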
Comprehensively characterizing genotype-phenotype correlations (GPCs) in Mendelian disease would create new opportunities for improving clinical management and understanding disease biology. However, heterogeneous approaches to data sharing, reuse, and analysis have hindered progress in the field. We developed Genotype Phenotype Evaluation of Statistical Association (GPSEA), a software package that leverages the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema to represent case-level clinical and genetic data about individuals.
Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses.
Up to 80% of rare disease patients remain undiagnosed after genomic sequencing, many probably because their pathogenic variants lie in disease-gene associations that have yet to be discovered. To search for such associations, we developed a rare variant gene burden analytical framework for Mendelian diseases and applied it to protein-coding variants from whole-genome sequencing of 34,851 cases and their family members recruited to the 100,000 Genomes Project. A total of 141 new associations were identified, including five for which independent disease-gene evidence was recently published.
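The core idea of a gene burden test is to ask, gene by gene, whether rare-variant carriers are over-represented among cases relative to controls. The abstract does not state which statistic the framework uses, so the sketch below substitutes a one-sided Fisher's exact test on a 2x2 carrier table purely as an illustration; the function name and the counts are hypothetical.

```python
from math import comb


def fisher_one_sided(case_carriers: int, case_total: int,
                     ctrl_carriers: int, ctrl_total: int) -> float:
    """One-sided Fisher's exact p-value for an excess of rare-variant carriers among cases."""
    n_total = case_total + ctrl_total
    carriers = case_carriers + ctrl_carriers
    # Under the null hypothesis, with the table margins fixed, the number of
    # carriers among cases follows a hypergeometric distribution. Sum the
    # probabilities of tables at least as extreme as the observed one.
    numerator = sum(
        comb(carriers, x) * comb(n_total - carriers, case_total - x)
        for x in range(case_carriers, min(carriers, case_total) + 1)
    )
    return numerator / comb(n_total, case_total)


# Hypothetical counts for a single gene.
p = fisher_one_sided(case_carriers=12, case_total=500, ctrl_carriers=3, ctrl_total=20000)
print(f"p = {p:.3g}")
```

Because a genome-wide burden analysis runs one such test per gene, the resulting p-values would need a multiple-testing correction before any association is reported.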
Whole genome sequencing has transformed rare disease research; however, 50-80% of rare disease patients remain undiagnosed after such testing. Regular reanalysis can identify new diagnoses, especially in newly discovered disease-gene associations, but efficient tools are required to support clinical interpretation. Exomiser, a phenotype-driven variant prioritisation tool, fulfils this role: within the 100,000 Genomes Project (100kGP), reanalysis identified diagnoses in 463 (2%) of 24,015 patients who had remained unsolved after prior analysis for variants in known disease genes.
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms.
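To make the idea of "a phenopacket as an input file" concrete, the sketch below builds a minimal phenopacket-like record as a Python dictionary and serialises it to JSON. The field names follow the camelCase JSON rendering of Phenopacket Schema v2 as we understand it, but this is an illustration, not a validated document; the identifiers, curator name, and ontology version are hypothetical placeholders. Real pipelines would normally use the official phenopacket libraries and validators.

```python
import json

# Minimal phenopacket-like record (assumed v2 JSON field names).
# All identifiers and the ontology version are hypothetical placeholders.
phenopacket = {
    "id": "example-phenopacket-1",
    "subject": {"id": "patient-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
        {"type": {"id": "HP:0001249", "label": "Intellectual disability"}},
    ],
    "metaData": {
        "created": "2024-01-01T00:00:00Z",
        "createdBy": "example-curator",
        "resources": [
            {
                "id": "hp",
                "name": "human phenotype ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": "VERSION-PLACEHOLDER",
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }
        ],
        "phenopacketSchemaVersion": "2.0",
    },
}

# Serialise as a downstream tool would receive it, then read the HPO terms back.
text = json.dumps(phenopacket, indent=2)
hpo_ids = [feature["type"]["id"] for feature in json.loads(text)["phenotypicFeatures"]]
print(hpo_ids)
```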
Background: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases, even when variants are captured genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating.
Advances in sequencing and imaging technologies enable enhanced assessment in the prenatal space, with a goal to diagnose and predict the natural history of disease, to direct targeted therapies, and to implement clinical management, including transfer of care, election of supportive care, and selection of surgical interventions. The current lack of standardization and aggregation stymies variant interpretation and gene discovery, which hinders the provision of prenatal precision medicine, leaving clinicians and patients without an accurate diagnosis. With large amounts of data generated, it is imperative to establish standards for data collection, processing, and aggregation.
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research.
View Article and Find Full Text PDFNucleic Acids Res
January 2024
The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and, in many cases, synonyms and definitions in ten languages in addition to English.
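Semantic similarity over the HPO is usually built on information content (IC): a term annotated to few diseases is more informative than a broad one, and two terms are similar if they share an informative ancestor. The sketch below implements Resnik similarity, one common such measure, on a tiny hypothetical ontology with invented annotation counts; it is not the HPO's own implementation, and real analyses would load the full ontology and annotation files.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology (child -> parents); these are not real HPO terms.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "abnormal_nervous_system": {"root"},
    "seizure": {"abnormal_nervous_system"},
    "focal_seizure": {"seizure"},
    "generalized_seizure": {"seizure"},
    "ataxia": {"abnormal_nervous_system"},
}

# Hypothetical number of diseases annotated directly to each term.
DIRECT = {"focal_seizure": 2, "generalized_seizure": 4, "ataxia": 2}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotation_counts() -> Dict[str, int]:
    """Propagate direct annotations up the hierarchy (true-path rule)."""
    counts = {t: 0 for t in PARENTS}
    for term, n in DIRECT.items():
        for anc in ancestors(term):
            counts[anc] += n
    return counts


COUNTS = annotation_counts()
TOTAL = COUNTS["root"]


def information_content(term: str) -> float:
    """IC(t) = -log p(t), where p(t) is the fraction of annotations at or below t."""
    return -math.log(COUNTS[term] / TOTAL)


def resnik(t1: str, t2: str) -> float:
    """IC of the most informative common ancestor of two terms."""
    return max(information_content(t) for t in ancestors(t1) & ancestors(t2))


print(round(resnik("focal_seizure", "generalized_seizure"), 3))
print(round(resnik("focal_seizure", "ataxia"), 3))
```

The two seizure subtypes share the moderately informative ancestor "seizure", so they score above zero, whereas a seizure subtype and ataxia share only uninformative ancestors and score zero.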
Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery.
The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis.
View Article and Find Full Text PDFBrief Bioinform
September 2022
Yuan et al. recently described an independent evaluation of several phenotype-driven gene prioritization methods for Mendelian disease on two separate, clinical datasets. Although they attempted to use default settings for each tool, we describe three key differences from those we currently recommend for our Exomiser and PhenIX tools.
Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to sequencing technology limitations. Long-read sequencing enables comprehensive detection of SVs, but approaches for prioritization of candidate SVs are needed. Structural variant Annotation and analysis (SvAnna) assesses all classes of SVs and their intersection with transcripts and regulatory sequences, relating predicted effects on gene function with clinical phenotype data.
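The first step in relating an SV to gene function is finding which transcripts and regulatory elements it intersects. The sketch below shows only that interval-overlap step, assuming half-open, 0-based coordinates; the `Interval` type, the feature names, and the coordinates are hypothetical, and SvAnna's actual effect prediction and phenotype-based scoring go well beyond this.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    contig: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open coordinates)
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same contig share at least one base."""
    return a.contig == b.contig and a.start < b.end and b.start < a.end


def affected_features(sv: Interval, features: List[Interval]) -> List[str]:
    """Names of transcripts or regulatory elements intersected by the SV, in input order."""
    return [f.name for f in features if overlaps(sv, f)]


# Hypothetical transcripts and one enhancer.
FEATURES = [
    Interval("chr1", 1000, 5000, "TX_A"),
    Interval("chr1", 8000, 9000, "TX_B"),
    Interval("chr1", 5500, 6000, "ENH_1"),
    Interval("chr2", 1000, 5000, "TX_C"),
]

deletion = Interval("chr1", 4500, 5600, "DEL_1")
print(affected_features(deletion, FEATURES))
```

A linear scan is fine for a handful of features; genome-scale annotation sets would call for an interval tree or sorted-index lookup instead.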
Purpose: Genomic test results, regardless of laboratory variant classification, require clinical practitioners to judge the applicability of a variant for medical decisions. Teaching and standardizing clinical interpretation of genomic variation calls for a methodology or tool.
Methods: To generate such a tool, we distilled the Clinical Genome Resource framework of causality and the American College of Medical Genetics/Association for Molecular Pathology and Quest Diagnostic Laboratory scoring of variant deleteriousness into the Clinical Variant Analysis Tool (CVAT).
Rare disease diagnostics and disease gene discovery have been revolutionized by whole-exome and genome sequencing but identifying the causative variant(s) from the millions in each individual remains challenging. The use of deep phenotyping of patients and reference genotype-phenotype knowledge, alongside variant data such as allele frequency, segregation, and predicted pathogenicity, has proved an effective strategy to tackle this issue. Here we review the numerous tools that have been developed to automate this approach and demonstrate the power of such an approach on several thousand diagnosed cases from the 100,000 Genomes Project.
Rare disease patients are more likely to receive a rapid molecular diagnosis nowadays thanks to the wide adoption of next-generation sequencing. However, many cases remain undiagnosed even after exome or genome analysis, because the methods used missed the molecular cause in a known gene, or a novel causative gene could not be identified and/or confirmed. To address these challenges, the RD-Connect Genome-Phenome Analysis Platform (GPAP) facilitates the collation, discovery, sharing, and analysis of standardized genome-phenome data within a collaborative environment.
View Article and Find Full Text PDFAm J Hum Genet
September 2021
A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation.
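One interpretable feature named above is the information content of wild-type and variant splice-site sequences. A standard way to score a single site is individual information, Ri, the sum over positions of 2 + log2 f(b, l), where f(b, l) is the frequency of base b at position l in aligned reference sites; the change ΔRi = Ri(variant) - Ri(wild type) then measures how much a variant weakens or strengthens the site. The sketch below uses a made-up four-position frequency matrix (with no zero entries, so the logarithm is defined) and is not SQUIRLS code; real splice-site matrices are derived from many aligned donor or acceptor sites and span more positions.

```python
import math
from typing import Dict, List

# Hypothetical position-specific base frequencies for a 4-nt motif.
FREQS: List[Dict[str, float]] = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.10, "G": 0.20, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def individual_information(seq: str) -> float:
    """Ri in bits: sum over positions of 2 + log2 f(base, position)."""
    if len(seq) != len(FREQS):
        raise ValueError("sequence length must match the frequency matrix")
    return sum(2.0 + math.log2(FREQS[i][base]) for i, base in enumerate(seq.upper()))


wild_type = "GTAA"
variant = "GCAA"  # hypothetical T>C change at the second position
delta_ri = individual_information(variant) - individual_information(wild_type)
print(round(individual_information(wild_type), 3), round(delta_ri, 3))
```

A strongly negative ΔRi, as here, suggests the variant weakens the site; in a learning approach such values would be one input feature among several rather than a verdict on their own.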
Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result.
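Both gaps named above, an estimate of prediction strength and a per-phenotype contribution, can be addressed with Bayes' rule in odds form: each observed phenotype contributes a likelihood ratio (LR), the LRs multiply the pretest odds, and the log of each LR shows how far that observation moved the result. The sketch below assumes this likelihood-ratio framing for illustration only; we do not claim it reproduces any specific published algorithm, and the pretest probability and LR values are hypothetical.

```python
import math
from typing import Dict


def posttest_probability(pretest_prob: float, likelihood_ratios: Dict[str, float]) -> float:
    """Combine a pretest probability with per-phenotype likelihood ratios (odds form of Bayes' rule)."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    log_odds = math.log(pretest_prob / (1.0 - pretest_prob))
    for lr in likelihood_ratios.values():
        log_odds += math.log(lr)  # each LR must be > 0; LR > 1 supports the candidate
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical likelihood ratios for three observed phenotypes of one candidate diagnosis.
lrs = {"feature_a": 20.0, "feature_b": 5.0, "feature_c": 0.5}
prob = posttest_probability(0.01, lrs)

# log10(LR) per feature: positive values push toward the candidate, negative values away.
for name, lr in lrs.items():
    print(name, round(math.log10(lr), 2))
print("posttest probability:", round(prob, 3))
```

Working in log-odds keeps the combination numerically stable when many phenotypes are observed, and it makes each feature's contribution additive and therefore easy to display.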