Modest performance of text mining to extract health outcomes may be sufficient for high-quality prognostic model development.

Zwierd Grotenhuis, Pablo J Mosteiro, Artuur M Leeuwenberg

Comput Biol Med

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, The Netherlands.

Published: March 2024

Background: Across medicine, prognostic models are used to estimate a patient's risk of certain future health outcomes (e.g., cardiovascular or mortality risk). To develop (or train) prognostic models, historical patient-level training data are needed that contain both the predictive factors (i.e., features) and the relevant health outcomes (i.e., labels). When the health outcomes are not recorded in structured data, they are sometimes first extracted from textual notes using text mining techniques. Because many studies use text mining to obtain outcome data for prognostic model development, we aimed to study the impact of text mining quality on downstream prognostic model performance.

Methods: We conducted a simulation study charting the relationship between text mining quality and prognostic model performance using an illustrative case study about in-hospital mortality prediction in intensive care unit patients. We repeatedly developed and evaluated a prognostic model for in-hospital mortality, using outcome data extracted by multiple text mining models of varying quality.
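The study design described above can be illustrated with a small, self-contained sketch. It is not the authors' code or data: the cohort is synthetic, the prognostic model is a plain logistic regression, and the "text mining model" is replaced by a hypothetical `noisy_labels` function that flips true outcomes according to an assumed sensitivity and specificity, independently of the features. All function names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_cohort(n, rng):
    """Synthetic features and true binary outcomes (assumed logistic model)."""
    X = rng.normal(size=(n, 3))
    logits = -2.0 + X @ np.array([1.0, -0.8, 0.5])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    return X, y

def noisy_labels(y, sensitivity, specificity, rng):
    """Stand-in for an imperfect text mining model (assumption):
    true positives are kept with probability `sensitivity`,
    true negatives are kept with probability `specificity`."""
    keep_pos = rng.random(y.size) < sensitivity
    keep_neg = rng.random(y.size) < specificity
    return np.where(y == 1, keep_pos, ~keep_neg).astype(int)

def fit_logistic(X, y, n_iter=25):
    """Logistic regression with intercept, fitted by Newton's method."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))
        hessian = A.T @ (A * (p * (1.0 - p))[:, None]) + 1e-6 * np.eye(A.shape[1])
        w += np.linalg.solve(hessian, A.T @ (y - p))
    return w

def predict(w, X):
    """Predicted risk for each row of X."""
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))

def auc(y, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula.
    Assumes continuous scores, so ties are not handled."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def f1(y_true, y_pred):
    """F1 score of extracted labels against the true labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn)

def run(sensitivity, specificity, seed=0):
    """Train on text-mined (noisy) outcomes, evaluate against true outcomes."""
    rng = np.random.default_rng(seed)
    X_train, y_train = simulate_cohort(5000, rng)
    X_test, y_test = simulate_cohort(5000, rng)
    y_mined = noisy_labels(y_train, sensitivity, specificity, rng)
    risk = predict(fit_logistic(X_train, y_mined), X_test)
    return {
        "f1": f1(y_train, y_mined),       # text mining quality
        "auc": auc(y_test, risk),         # discrimination on true outcomes
        "mean_pred": risk.mean(),         # average predicted risk
        "obs_rate": y_test.mean(),        # observed outcome rate
    }

if __name__ == "__main__":
    for se, sp in [(1.0, 1.0), (0.8, 0.95), (0.5, 0.95)]:
        print(se, sp, run(se, sp))
```

Comparing `mean_pred` with `obs_rate` gives only a crude check of calibration; a fuller evaluation would also inspect calibration curves across the risk range.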

Results: Interestingly, we found in our case study that a relatively low-quality text mining model (F1 score ≈ 0.50) could already be used to train a prognostic model with quite good discrimination (area under the receiver operating characteristic curve of around 0.80). The calibration of the risks estimated by the prognostic model seemed unreliable across the majority of settings, even when text mining models were of relatively high quality (F1 ≈ 0.80).
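One possible intuition for this pattern, offered here as a sketch rather than as the authors' explanation, comes from a simplifying assumption: that text mining errors depend on the features only through the true outcome. The symbols Se (sensitivity) and Sp (specificity) of the text mining model, p(x) for the true risk, and \tilde{y} for the extracted label are notation introduced for this illustration.

```latex
\begin{align*}
P(\tilde{y} = 1 \mid x)
  &= \mathrm{Se}\, p(x) + (1 - \mathrm{Sp})\,\bigl(1 - p(x)\bigr) \\
  &= (1 - \mathrm{Sp}) + (\mathrm{Se} + \mathrm{Sp} - 1)\, p(x).
\end{align*}
% If Se + Sp > 1, this is strictly increasing in p(x): patients keep
% their risk ordering, so discrimination is largely preserved.
% The value itself differs from p(x) unless Se = Sp = 1, so the
% estimated risks are miscalibrated.
% Example: Se = 0.6, Sp = 0.9, p(x) = 0.10 gives
%   0.1 + 0.5 * 0.10 = 0.15 instead of 0.10.
```

In practice, text mining errors are likely to depend on the content of the notes and therefore on patient characteristics, so this derivation should be read as a rough guide only.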

Discussion: Developing prognostic models on text-extracted outcomes using imperfect text mining models seems promising. However, prognostic models developed using this approach may not produce well-calibrated risk estimates and may require recalibration on a (possibly smaller) set of manually extracted outcome data.
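The abstract does not specify a recalibration method. As a hedged illustration, the sketch below uses logistic recalibration, a common choice: an intercept and slope are fitted on the logit of the model's predicted risks, using a small set of manually extracted outcomes. The data are synthetic, and the function names, the distortion applied to the risks, and the sample size of 500 are all assumptions.

```python
import numpy as np

def logit(p):
    """Log-odds, with clipping to avoid infinities at 0 and 1."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

def fit_recalibration(p_pred, y_manual, n_iter=25):
    """Fit y ~ sigmoid(a + b * logit(p_pred)) by Newton's method.
    Returns the array [a, b] (calibration intercept and slope)."""
    z = logit(p_pred)
    A = np.column_stack([np.ones_like(z), z])
    w = np.zeros(2)
    for _ in range(n_iter):
        q = 1.0 / (1.0 + np.exp(-(A @ w)))
        hessian = A.T @ (A * (q * (1.0 - q))[:, None]) + 1e-8 * np.eye(2)
        w += np.linalg.solve(hessian, A.T @ (y_manual - q))
    return w

def recalibrate(p_pred, w):
    """Apply the fitted intercept and slope to predicted risks."""
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1] * logit(p_pred))))

# Synthetic demonstration (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
true_logit = rng.normal(-1.5, 1.0, size=500)       # small manual set
y_manual = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))
# A miscalibrated model: risks are too high and too compressed.
p_model = 1.0 / (1.0 + np.exp(-(0.5 * true_logit + 1.0)))

w = fit_recalibration(p_model, y_manual)
p_recal = recalibrate(p_model, w)
print("observed rate:      ", y_manual.mean())
print("mean risk before:   ", p_model.mean())
print("mean risk after:    ", p_recal.mean())
print("intercept and slope:", w)
```

Because only two parameters are estimated, this kind of recalibration can work with far fewer manually labelled records than developing the full prognostic model would need, which matches the "possibly smaller amount" of data mentioned above.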

DOI: http://dx.doi.org/10.1016/j.compbiomed.2024.108014
