Article Abstract

Background: Multi-center electronic health records (EHR) can support quality improvement initiatives and comparative effectiveness research in stroke care. However, EHR-based research is limited by the difficulty of abstracting key clinical variables from unstructured data at scale, a challenge further compounded by missing data. Here we develop a natural language processing (NLP) model that automatically reads EHR notes to determine the NIH Stroke Scale (NIHSS) score of patients with acute stroke.

Methods: The study included notes from patients with acute stroke (≥18 years) admitted to Massachusetts General Hospital (MGH) between 2015 and 2022. The MGH data were divided into a training set (70%) and a hold-out test set (30%). A two-stage model was developed to predict the admission NIHSS. A linear model with the least absolute shrinkage and selection operator (LASSO) was trained on the training set. For notes in the test set in which the NIHSS was documented, the score was extracted using regular expressions (stage 1); for notes in which the NIHSS was not documented, the LASSO model was used for prediction (stage 2). The reference standard for the NIHSS was obtained from the Get With The Guidelines-Stroke Registry. The two-stage model was evaluated on the hold-out test set and validated on the Medical Information Mart for Intensive Care (MIMIC-III v1.4, 2001-2012) dataset, using root mean squared error (RMSE) and Spearman correlation (SC).
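The two-stage logic described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the regular expression `NIHSS_PATTERN`, the helper names `extract_nihss` and `predict_nihss`, the rule of taking the first in-range mention, and the clipping to the 0-42 NIHSS range are all hypothetical. Stage 2 is passed in as a callable so that any fitted model (in the paper, a LASSO regression trained on the training notes) can be plugged in; the usage example uses a constant stub in its place.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern: the paper does not publish its regular expressions.
# Matches e.g. "NIHSS: 7", "NIHSS=0", "NIHSS score 3", "NIH stroke scale score of 14".
NIHSS_PATTERN = re.compile(
    r"\b(?:NIHSS|NIH\s+stroke\s+scale)(?:\s+score)?\s*(?:of|was|is|=|:)?\s*(\d{1,2})\b",
    re.IGNORECASE,
)
NIHSS_MAX = 42  # the NIHSS total score ranges from 0 to 42


def extract_nihss(note: str) -> Optional[int]:
    """Stage 1: return the first in-range NIHSS value written in the note, else None."""
    for match in NIHSS_PATTERN.finditer(note):
        score = int(match.group(1))
        if 0 <= score <= NIHSS_MAX:
            return score
    return None


def predict_nihss(note: str, stage2_model: Callable[[str], float]) -> float:
    """Two-stage prediction: regex extraction first, learned model as the fallback."""
    extracted = extract_nihss(note)
    if extracted is not None:
        return float(extracted)
    # Stage 2: no documented score, so defer to the model (LASSO in the paper)
    # and clip its output to the valid NIHSS range (an assumption of this sketch).
    return min(max(stage2_model(note), 0.0), float(NIHSS_MAX))


if __name__ == "__main__":
    notes = [
        "Admission NIH stroke scale score of 14.",
        "NIHSS: 7",
        "Left-sided weakness, aphasia.",
    ]

    def stub_model(note: str) -> float:
        return 5.0  # hypothetical stand-in for a fitted LASSO model

    print([predict_nihss(n, stub_model) for n in notes])  # [14.0, 7.0, 5.0]
```

Keeping stage 2 behind a callable leaves the dispatch independent of how note features are built; in practice one might wrap a fitted scikit-learn `Lasso` estimator together with its text vectorizer.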

Results: We included 4,163 patients (MGH = 3,876; MIMIC = 287) with a mean age of 69 (SD 15) years; 53% were male and 72% were white. Ninety percent of patients had ischemic stroke and 10% had hemorrhagic stroke. The two-stage model achieved an RMSE [95% CI] of 3.13 [2.86-3.41] (SC = 0.90 [0.88-0.91]) on the MGH hold-out test set and 2.01 [1.58-2.38] (SC = 0.96 [0.94-0.97]) on the MIMIC validation set.

Conclusions: The automatic NLP-based model can enable large-scale stroke severity phenotyping from EHR and therefore support real-world quality improvement and comparative effectiveness studies in stroke.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10980121
DOI: http://dx.doi.org/10.1101/2024.03.08.24304011

Similar Publications

Human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) are an important resource for identifying novel therapeutic targets and cardioprotective drugs. However, a key limitation of iPSC-CMs is their immature, fetal-like phenotype. Cultivation of iPSC-CMs in lipid-supplemented maturation media (MM) enhances the structural, metabolic and electrophysiological properties of iPSC-CMs.

Background: Identifying neuroinfectious disease (NID) cases using International Classification of Diseases billing codes is often imprecise, while manual chart reviews are labor-intensive. Machine learning models can leverage unstructured electronic health records to detect subtle NID indicators, process large data volumes efficiently, and reduce misclassification. While accurate NID classification is needed for research and clinical decision support, using unstructured notes for this purpose remains underexplored.

Machine Learning for Predicting Recurrent Course in Uveitis Using Baseline Clinical Characteristics.

Invest Ophthalmol Vis Sci

August 2025

Programme for Ocular Inflammation & Infection Translational Research, Department of Ophthalmology, National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore.

Purpose: We developed and evaluated machine learning models for predicting the risk of recurrent uveitis using baseline clinical characteristics, to inform clinical decision-making and risk stratification.

Methods: A retrospective analysis was conducted using the Ocular Autoimmune Systemic Inflammatory Infectious Study registry, including 966 patients (1432 eyes) with uveitis. Three machine learning classifiers (random forest, eXtreme Gradient Boosting, and a radial basis function support vector classifier) were trained on preprocessed baseline demographic and clinical data.

Purpose: To quantify population-specific differences in prostate cancer (PCa) presentation between African American (AA) and White (W) men on MRI using radiomics.

Materials and Methods: We identified N = 149 men with PCa who underwent 3T MRI and a confirmatory biopsy and for whom self-reported race was available. Patient studies were partitioned into a training set and a hold-out test set.

Prediction model for intrapartum cesarean delivery among women with gestational diabetes mellitus.

Arch Gynecol Obstet

August 2025

Department of Obstetrics and Gynecology, Lis Hospital for Women's Health, Tel Aviv Sourasky Medical Center, 6 Weizmann St, 6423906, Tel Aviv, Israel.

Purpose: To identify risk factors and to develop a predictive model for cesarean delivery (CD) in women with gestational diabetes mellitus (GDM).

Study Design: A retrospective cohort study was performed at a single university-affiliated tertiary medical center. All women with GDM and a singleton pregnancy who had a trial of labor between 2011 and 2023 were included.
