Citations

20

Article Abstract

As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI's ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, lab reports) to output a set of structured variables required for RWD analysis. This research used a nationwide EHR-derived database. Models were selected based on performance. Variables curated with an approach using ML extraction are those where the value is determined solely by an ML model (i.e., not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information. We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found that the output variables performed well when compared with variables curated by manual abstraction. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. NLP and ML enable the extraction of retrospective clinical data in EHRs with speed and scalability to help researchers learn from the experience of every person with cancer.
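The comparison between ML-extracted variables and manually abstracted ones can be illustrated with a small sketch. The abstract does not say which metrics the authors used, so sensitivity and positive predictive value (PPV) are assumptions here. The function name `compare_to_abstraction` and the toy smoking-status values are also hypothetical and are not taken from the study.

```python
from collections import Counter


def compare_to_abstraction(ml_values, abstracted_values, positive):
    """Compare ML-extracted values with manually abstracted reference values.

    Treats `positive` as the class of interest and returns sensitivity,
    PPV, and the underlying confusion counts. Metric choice is an
    assumption; the source abstract does not name its metrics.
    """
    if len(ml_values) != len(abstracted_values):
        raise ValueError("inputs must be the same length")

    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for ml, ref in zip(ml_values, abstracted_values):
        if ml == positive and ref == positive:
            counts["tp"] += 1
        elif ml == positive:
            counts["fp"] += 1
        elif ref == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    # Return None rather than dividing by zero when a denominator is empty.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    ppv = tp / (tp + fp) if (tp + fp) else None
    return {"sensitivity": sensitivity, "ppv": ppv, "counts": dict(counts)}


# Hypothetical toy data for one variable (smoking status), one entry per patient.
ml_smoking = ["history", "history", "none", "none", "history", "none"]
abstracted_smoking = ["history", "none", "none", "none", "history", "history"]

result = compare_to_abstraction(ml_smoking, abstracted_smoking, positive="history")
print(result)
```

In this sketch the abstracted values act as the reference standard, which mirrors the abstract's framing of manual abstraction as the comparator. A real evaluation would be run per variable on a held-out test set.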

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10541019
DOI: http://dx.doi.org/10.3389/fphar.2023.1180962

Publication Analysis

Top Keywords

electronic health: 12
machine learning: 8
real-world data: 8
health records: 8
variables curated: 8
extraction: 6
variables: 6
approach machine: 4
learning extraction: 4
extraction real-world: 4

Similar Publications

Introduction: Medical physicists play a critical role in ensuring image quality and patient safety, but their routine evaluations are limited in scope and frequency compared to the breadth of clinical imaging practices. An electronic radiologist feedback system can augment medical physics oversight for quality improvement. This work presents a novel quality feedback system integrated into the Epic electronic medical record (EMR) at a university hospital system, designed to facilitate feedback from radiologists to medical physicists and technologist leaders.

Background: Recent advances in high-throughput sequencing technologies have enabled the collection and sharing of a massive amount of omics data, along with its associated metadata (descriptive information that contextualizes the data, including phenotypic traits and experimental design). Enhancing metadata availability is critical to ensure data reusability and reproducibility and to facilitate novel biomedical discoveries through effective data reuse. Yet, incomplete metadata accompanying public omics data may hinder reproducibility and reusability and limit secondary analyses.

Identifying levels of alcohol use disorder severity in electronic health records.

Subst Abuse Treat Prev Policy

September 2025

Centre for Interdisciplinary Addiction Research (ZIS), Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf (UKE), Martinistraße 52, 20246, Hamburg, Germany.

Background: Alcohol use disorder (AUD) is conceptualized as a dimensional phenomenon in the DSM-5, but electronic health records (EHRs) rely on binary AUD definitions according to the ICD-10. The present study classifies AUD severity levels using EHR data and tests whether increasing AUD severity levels are linked with increased comorbidity.

Methods: Billing data from two German statutory health insurance companies in Hamburg included n = 21,954 adults diagnosed with alcohol-specific conditions between 2017 and 2021.

Drought stress affects plant growth and production. To cope with drought stress, plants induce physiological and metabolic changes that serve as a protective response under drought-stress conditions. The response to drought can vary based on plant type (C3 vs.

This study was conducted to investigate the techniques and complications of enlarged uterine extraction during minimally invasive surgery for uterine malignancy. The electronic medical record was queried for patients with uterine malignancy and enlarged uterus (≥ 250 g) who underwent primary hysterectomy with laparoscopic or robotic approach. Statistical analysis was performed using Fisher's exact test for categorical variables and Kruskal-Wallis test for continuous variables.
