Native Hawaiian and Pacific Islander (NHPI) populations are often aggregated into broad racial categories, obscuring potential disparities. This study leverages an expanded race/ethnicity lexicon and natural language processing (NLP) to identify documentation of NHPI subgroups and address gaps in the race recorded in electronic health records (EHRs). Results demonstrate the potential of NLP to classify NHPI documentation, disaggregate legacy categories, and improve health equity by incorporating more detailed subgroup data into standardized healthcare data sets.
Stud Health Technol Inform
August 2025
Balancing operational feasibility with the performance of natural language processing (NLP) systems is a significant challenge. This study presents a hybrid strategy that integrates manually curated rules, a small language model (SLM), and a large language model (LLM) for cohort identification tasks. This approach demonstrates superior performance in terms of both computational efficiency and NLP validity, as shown here in two separate tasks using a large number of clinical notes from the US Department of Veterans Affairs (VA) healthcare system.
The widespread adoption of real-world data has given rise to numerous healthcare-distributed research networks, but multi-site analyses still face administrative burdens and data privacy challenges. In response, we developed a Collaborative One-shot Lossless Algorithm for Generalized Linear Mixed Models (COLA-GLMM), the first-ever algorithm that achieves both lossless and one-shot properties. COLA-GLMM ensures accuracy against the gold standard of pooled data while requiring only summary statistics and completes within a single communication round, eliminating the usual back-and-forth overhead.
Diabetes Care
August 2025
Objective: To assess the association between glucagon-like peptide 1 receptor agonist (GLP-1RA) use and risk of incident thyroid tumors.
Research Design And Methods: The retrospective, active-comparator new-user cohort study used international administrative claims and electronic health record databases. Participants included patients with type 2 diabetes mellitus (T2DM) with prior metformin therapy initiating a GLP-1RA versus new users of sodium-glucose cotransporter 2 inhibitors (SGLT2is), dipeptidyl peptidase 4 inhibitors (DPP-4is), and sulfonylureas (SUs).
Importance: Pharmacogenetics can improve medication-related outcomes by optimizing efficacy and minimizing adverse effects. It is unknown whether the presence of drug-gene interactions (DGIs) at the time of surgery results in adverse outcomes in the postoperative setting.
Objective: To determine the association of active DGIs with postsurgical outcomes following vascular surgery procedures.
CYP2C19 loss-of-function (LOF) alleles decrease the antiplatelet effect of clopidogrel following percutaneous coronary intervention (PCI) in patients presenting with acute coronary syndrome (ACS). The impact of genotype in patients undergoing PCI for stable ischemic heart disease (SIHD) in real-world populations is less clear. We determined time to major adverse cardiac event (MACE), defined as the composite of cardiovascular death, stroke, or myocardial infarction, within 12 months following PCI in the VA Million Veteran Program (MVP) participants treated with clopidogrel from 1/1/2009 to 9/30/2017.
Int J Radiat Oncol Biol Phys
October 2025
Purpose: This study aims to develop a robust methodology using structured and semistructured health data to identify patients who have undergone radiation therapy, thereby facilitating future research on treatment outcomes.
Methods And Materials: In this retrospective cohort study, we identified Veterans receiving radiation oncology care through documentation of referrals, encounters, and billing codes from 2014 to 2023. We classified administrative codes based on the process of care and type of radiation received and then analyzed utilization patterns.
Importance: Semaglutide, a glucagonlike peptide-1 receptor agonist (GLP-1RA), has recently been implicated in cases of nonarteritic anterior ischemic optic neuropathy (NAION), raising safety concerns in the treatment of type 2 diabetes (T2D).
Objective: To investigate the potential association between semaglutide and NAION in the Observational Health Data Sciences and Informatics (OHDSI) network.
Design, Setting, And Participants: This was a retrospective study across 14 databases (6 administrative claims and 8 electronic health records).
Background: Fluoroquinolones (FQs) are commonly used to treat urinary tract infections (UTIs), but some studies have suggested they may increase the risk of aortic aneurysm or dissection (AA/AD). However, no large-scale international study has thoroughly assessed this risk.
Methods: A retrospective cohort study was conducted using a large, distributed network analysis across 14 databases from 5 countries (United States, South Korea, Japan, Taiwan, and Australia).
JCO Clin Cancer Inform
February 2025
Purpose: Despite the frequency with which patients with cancer receive radiotherapy, integrating radiation oncology data with other aspects of the clinical record remains challenging because of siloed and variable software systems, high data complexity, and inconsistent data encoding. Recognizing these challenges, the Veterans Affairs (VA) National Radiation Oncology Program (NROP) is developing Granular Radiotherapy Information Database (GRID), a platform and pipeline to combine radiotherapy data across the VA with the goal of both better understanding treatment patterns and outcomes and enhancing research and data analysis capabilities.
Methods: This study represents a proof-of-principle retrospective cohort analysis and review of select radiation treatment data from the VA Radiation Oncology Quality Surveillance Program (VAROQS) initiative.
JCO Clin Cancer Inform
February 2025
Purpose: This study introduces an integrated approach using structured and unstructured data from an electronic health record to identify and characterize patient utilization of hereditary cancer genetic testing among patients with metastatic castration-resistant prostate cancer (mCRPC). Secondary objectives were to describe factors associated with the receipt of testing.
Methods: This retrospective cohort study included Veterans diagnosed with mCRPC from January 2016 to December 2021.
Context: Quality communication between clinicians and pathologists is required for optimal cancer care. The College of American Pathologists provides anatomic site-specific cancer protocols that facilitate synoptic reporting for efficient communication, contributing to accuracy and completeness of cancer staging.
Background: Fibrosis-4 (FIB4) is a recommended noninvasive test to assess hepatic fibrosis among patients with metabolic dysfunction-associated steatotic liver disease (MASLD). Here, we used FIB4 trajectory over time (ie, "slope" of FIB4) as a surrogate marker of liver fibrosis progression and examined if FIB4 slope is associated with clinical and genetic factors among individuals with clinically defined MASLD within the Million Veteran Program Cohort.
Methods: In this retrospective cohort study, FIB4 slopes were estimated through linear regression for participants with clinically defined MASLD and FIB4 <2.
Background: This study aims to assess the impact of healthy lifestyle on prostate cancer (PCa) risk in a diverse population.
Methods: Data for 281,923 men from the Million Veteran Program (MVP), a nationwide, health system-based cohort study, were analyzed. Self-reported information at enrollment included smoking status, exercise, diet, family history of PCa, and race/ethnicity.
Background: The US government considers veterans to have been exposed to Agent Orange if they served in Vietnam while the carcinogen was in use, and these veterans are often deemed at high risk of prostate cancer (PCa). Here, we assess whether presumed Agent Orange exposure is independently associated with increased risk of any metastatic or fatal PCa in a diverse Veteran cohort still alive in the modern era (at least 2011), when accounting for race/ethnicity, family history, and genetic risk.
Patients And Methods: Participants in the Million Veteran Program (MVP; enrollment began in 2011) who were on active duty during the Vietnam War era (August 1964-April 1975) were included (n = 301,470).
Objective: To use natural language processing (NLP) of clinical notes to augment existing structured electronic health record (EHR) data for classification of a patient's menopausal status.
Materials And Methods: A rule-based NLP system was designed to capture evidence of a patient's menopause status including dates of a patient's last menstrual period, reproductive surgeries, and postmenopause diagnosis as well as their use of birth control and menstrual interruptions. NLP-derived output was used in combination with structured EHR data to classify a patient's menopausal status.
medRxiv
February 2024
Stud Health Technol Inform
January 2024
Electronic Nicotine Delivery Systems (ENDS) use has increased substantially in the United States since 2010. To date, there is limited evidence regarding the nature and extent of ENDS documentation in the clinical note. In this work we investigate the effectiveness of different approaches to identify a patient's documented ENDS use.
Stud Health Technol Inform
January 2024
Standardized operational definitions are an important tool to improve reproducibility of research using secondary real-world healthcare data. This approach was leveraged for studies evaluating the effectiveness of AZD7442 as COVID-19 pre-exposure prophylaxis across multiple healthcare systems. Value sets were defined, grouped, and mapped.
Stud Health Technol Inform
January 2024
Natural language processing (NLP) tools can automate the identification of cancer patients eligible for specific pathways. We developed and validated a cancer-agnostic, rule-based NLP framework to extract the dimensions and measurements of several concepts from pathology and radiology reports. This framework was then efficiently and cost-effectively deployed to identify patients eligible for breast, lung, and prostate cancer clinical pathways.
J Natl Cancer Inst
May 2024
Pharmacogenetic (PGx) testing before initiation of thiopurine treatment and CBC monitoring post-initiation help avoid adverse events and ensure patient safety. This study aims to evaluate trends in PGx testing and CBC monitoring among Veterans prescribed azathioprine, thioguanine, or mercaptopurine to demonstrate VA's efforts to improve medication safety after an adverse event. To assess testing patterns, we used VA electronic health record data to identify 20,524 Veterans who first began thiopurine treatment between January 1, 2010, and December 31, 2021.