Evaluating resources composing the PheMAP knowledge base to enhance high-throughput phenotyping.

Nicholas C Wan, Ali A Yaqoob, Henry H Ong, Juan Zhao, Wei-Qi Wei

J Am Med Inform Assoc

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Published: February 2023


Objective: A previous study, PheMAP, combined independent online resources to enable high-throughput phenotyping (HTP) using electronic health records (EHRs). However, these online resources describe diseases with differing quality, which may affect phenotyping performance. We aimed to evaluate the phenotyping performance of single resource-based PheMAPs and to investigate an optimized strategy for HTP.

Materials and Methods: Using Jaccard similarity matrices, we compared the top-ranked concept unique identifiers (CUIs), ranked by term frequency-inverse document frequency (TF-IDF), that each single resource and the original PheMAP produced. We correlated the top-ranked concepts from each resource with the features used in established Phenotype KnowledgeBase (PheKB) algorithms for hypothyroidism, type II diabetes mellitus (T2DM), and dementias. Using each resource separately, we calculated multiple phenotype risk scores for individuals in Vanderbilt University Medical Center's BioVU DNA Biobank and compared phenotyping performance against rule-based eMERGE algorithms. Lastly, we implemented an ensemble strategy that classified each patient's case/control status based on agreement among PheMAP resources.
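The abstract does not give the ranking or comparison code. As a rough illustration only, the Python sketch below ranks CUIs per phenotype by TF-IDF within one resource and then compares resources' top-k CUI sets with Jaccard similarity. It assumes each resource's disease pages have already been mapped to lists of CUIs; the function names (`tfidf_top_k`, `jaccard_matrix`), the choice of k, computing IDF within a single resource, the tie-breaking rule, and the toy CUIs are all hypothetical and are not taken from PheMAP.

```python
import math
from collections import Counter


def tfidf_top_k(docs: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Rank CUIs for each phenotype page of ONE resource by TF-IDF and keep the top k.

    docs maps a phenotype name to the CUIs (with repeats) extracted from that
    resource's page for the phenotype. IDF is computed across the pages of
    this resource only (an assumption made for illustration).
    """
    n_docs = len(docs)
    # Document frequency: number of phenotype pages that mention each CUI.
    df = Counter()
    for cuis in docs.values():
        df.update(set(cuis))

    top = {}
    for phenotype, cuis in docs.items():
        tf = Counter(cuis)
        total = len(cuis)
        scores = {cui: (count / total) * math.log(n_docs / df[cui]) for cui, count in tf.items()}
        # Sort by descending score; break ties alphabetically so results are deterministic.
        ranked = sorted(scores, key=lambda cui: (-scores[cui], cui))
        top[phenotype] = set(ranked[:k])
    return top


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_matrix(top_by_resource: dict[str, dict[str, set[str]]], phenotype: str) -> dict[tuple[str, str], float]:
    """Pairwise Jaccard similarity of each resource's top-k CUIs for one phenotype."""
    names = sorted(top_by_resource)
    return {
        (r1, r2): jaccard(top_by_resource[r1].get(phenotype, set()),
                          top_by_resource[r2].get(phenotype, set()))
        for r1 in names
        for r2 in names
    }
```

With toy pages such as `{"hypothyroidism": ["C1", "C1", "C2", "C9"], "t2dm": ["C3", "C3", "C9"]}`, the CUI `C9` appears on every page, so its IDF is log(1) = 0 and it ranks below the page-specific CUIs, which is the down-weighting of ubiquitous concepts that TF-IDF is meant to provide.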

Results: Jaccard similarity matrices indicate that the CUIs comprising single resource-based PheMAPs vary in similarity. Single resource-based PheMAPs generated from MedlinePlus and MedicineNet outperformed the others but cover only 81.6% of all disease phenotypes. We propose the PheMAP-Ensemble, which provides higher average accuracy and precision than the combined average accuracy and precision of the single resource-based PheMAPs. While offering complete phenotype coverage, PheMAP-Ensemble significantly increases phenotyping recall compared with the original PheMAP.
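The abstract does not define how the phenotype risk scores or the resource-agreement rule are computed. The sketch below assumes, purely for illustration, that a per-resource score is the sum of the weights of phenotype CUIs found in a patient's record, that each resource calls a case when its score reaches a hypothetical threshold, and that the ensemble calls a case when at least `min_agree` resources agree. It then computes accuracy, precision, and recall against reference labels, which stand in for rule-based eMERGE algorithm output. The function names, weights, thresholds, and voting rule are all assumptions, not the published PheMAP-Ensemble.

```python
def risk_score(patient_cuis: set[str], weights: dict[str, float]) -> float:
    """Hypothetical score: sum of the weights of phenotype CUIs present in the patient's record."""
    return sum(w for cui, w in weights.items() if cui in patient_cuis)


def ensemble_case(scores_by_resource: dict[str, float],
                  thresholds: dict[str, float],
                  min_agree: int) -> bool:
    """Call a case when at least `min_agree` resources independently reach their threshold."""
    votes = sum(scores_by_resource[r] >= thresholds[r] for r in scores_by_resource)
    return votes >= min_agree


def evaluate(pred: list[bool], truth: list[bool]) -> dict[str, float]:
    """Accuracy, precision, and recall of predicted case calls against reference labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return {
        "accuracy": (tp + tn) / len(truth),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Raising `min_agree` makes case calls stricter, which tends to lower recall and raise precision; the abstract does not state which agreement level the authors used.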

Conclusions: Resources comprising the PheMAP produce different phenotyping performance when implemented individually. The ensemble method significantly improves the quality of PheMAP by fully utilizing dissimilar resources to capture accurate phenotyping data from EHRs.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9933070
DOI: http://dx.doi.org/10.1093/jamia/ocac234
