Importance: Researchers commonly use counts of diagnostic codes from EHR-linked biobanks to infer phenotypic status. However, these approaches overlook temporal changes in EHR data, such as the discontinuation or "dropout" of diagnostic codes, which may exacerbate disparities in genomics research, as EHR data quality can be confounded with demographic attributes.
Objective: To address this, we propose modeling diagnostic code dropout in EHR data to inform phenotyping for schizophrenia in genomic analyses.
Design: We develop and test our diagnostic dropout model by analyzing EHR data from individuals with prior schizophrenia diagnoses. We further validate model performance on a subset of patients whose diagnoses were ascertained through chart review. Using PRS-CS and existing GWAS summary statistics, we first derive polygenic weights. Then, we apply our dropout model's outputs to construct a data-driven filter that defines the target cohort for measuring polygenic score performance.
Setting: Our analysis utilizes EHR and genomic data from the Million Veteran Program.
Participants: To model diagnostic dropout in schizophrenia, we leverage data from 12,739 patients with a history of schizophrenia, after excluding outliers. For polygenic score analyses, we incorporate data from a potential pool of 8,385 European ancestry and 6,806 African ancestry patients with a history of schizophrenia.
Main Outcomes And Measures: We compare the performance of our diagnostic dropout model with alternative methodologies, both in predicting diagnostic dropout on a holdout set and on chart review-labeled data. Using the top differential diagnosis predictors in our model, we select relevant cases by filtering out patients with a prior history of mood or anxiety disorders. We then test the impact of applying different filters for measuring polygenic score performance.
Results: When evaluated on chart review-labeled data, our model improves the area under the precision-recall curve (AUPRC) by 9.6% compared to competing methods. By applying our data-driven filter for schizophrenia, we achieve a 62% increase in the association effect size when transferring a European polygenic score to an African ancestry target cohort.
Conclusions And Relevance: These findings highlight the potential of modeling diagnostic code dropout to enhance the phenotypic quality of EHR-linked biobank data, advancing more equitable and accurate genomics research across diverse populations.
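As a minimal illustration of the AUPRC comparison reported in the Results, the sketch below computes a step-wise AUPRC estimate (average precision) for two sets of scores and the relative improvement of one over the other. The labels and scores are invented toy values, not study data, and the function names are hypothetical. The abstract does not state whether the 9.6% figure is a relative or an absolute difference; the sketch assumes a relative one.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Assumes binary labels (0/1) and distinct scores (ties are not handled).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def relative_improvement(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline`, e.g. 0.096 for a 9.6% gain."""
    return (new - baseline) / baseline


# Toy chart-review labels (1 = confirmed case) and two hypothetical models.
labels = [1, 0, 1, 1, 0, 0]
model_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
baseline_scores = [0.5, 0.9, 0.8, 0.3, 0.6, 0.1]

ap_model = average_precision(labels, model_scores)
ap_baseline = average_precision(labels, baseline_scores)
gain = relative_improvement(ap_model, ap_baseline)
print(f"model={ap_model:.3f} baseline={ap_baseline:.3f} gain={gain:.1%}")
```

In practice a library routine such as scikit-learn's `average_precision_score` would normally replace the hand-written function; the pure-Python version is used here only to keep the sketch dependency-free.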
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11838988 | PMC |
http://dx.doi.org/10.1101/2025.01.19.25320806 | DOI Listing |
PLoS One
September 2025
Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Capturing the dynamic changes in patients' internal states as they approach death due to fatal diseases remains a major challenge in understanding individual pathologies and improving end-of-life care. However, existing methods primarily focus on specific test values or organ dysfunction markers, failing to provide a comprehensive view of the evolving internal state preceding death. To address this, we analyzed electronic health record (EHR) data from a single institution, including 8,976 cancer patients and 77 laboratory parameters, by constructing continuous mortality prediction models based on gradient-boosting decision trees and leveraging them for temporal analyses.
PLoS One
September 2025
Department of Medicine, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia.
There is a lack of longitudinal data on type 2 diabetes (T2D) in low- and middle-income countries. We leveraged the electronic health records (EHR) system of a publicly funded academic institution to establish a retrospective cohort with longitudinal data to facilitate benchmarking, surveillance, and resource planning of a multi-ethnic T2D population in Malaysia. This cohort included 15,702 adults aged ≥ 18 years with T2D who received outpatient care (January 2002-December 2020) from Universiti Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia.
JMIR Cancer
September 2025
Department of Health Outcomes and Biomedical Informatics, University of Florida, 1889 Museum Road, Suite 7000, Gainesville, FL, 32611, United States, 1 352 294-5969.
Background: Disparities in cancer burden between transgender and cisgender individuals remain an underexplored area of research.
Objective: This study aimed to examine the cumulative incidence and associated risk factors for cancer and precancerous conditions among transgender individuals compared with matched cisgender individuals.
Methods: We conducted a retrospective cohort study using patient-level electronic health record (EHR) data from the University of Florida Health Integrated Data Repository between 2012 and 2023.
Ann Intern Med
September 2025
Department of Medicine, Johns Hopkins University School of Medicine, and Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland (J.B.S.).
Electronic health record (EHR) data are increasingly used to develop prediction models that guide clinical decision making at the point of care. These include algorithms that use high-frequency data, like in sepsis prediction, as well as simpler equations, such as the Pooled Cohort Equations for cardiovascular outcome prediction. Although EHR data used in prediction models are often highly granular and more current than other data, there is systematic and nonsystematic missingness in EHR data as there is with most data.
JAMIA Open
October 2025
Applied Clinical Research Center, Children's Hospital of Philadelphia, Philadelphia, PA 19104, United States.
Objective: To develop a natural language processing (NLP) pipeline for unstructured electronic health record (EHR) data to identify symptoms and functional impacts associated with Long COVID in children.
Materials And Methods: We analyzed 48 287 outpatient progress notes from 10 618 pediatric patients from 12 institutions. We evaluated notes obtained 28 to 179 days after a COVID-19 diagnosis or positive test.