LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models.

WonJin Yoon , Shan Chen , Yanjun Gao , Zhanzhan Zhao , Dmitriy Dligach , Danielle S Bitterman , Majid Afshar , Timothy Miller

medRxiv

Published: July 2024


Objective: The application of Natural Language Processing (NLP) in the clinical domain is important due to the rich unstructured information in clinical documents, which often remains inaccessible in structured data. When applying NLP methods to a certain domain, the role of benchmark datasets is crucial as benchmark datasets not only guide the selection of best-performing models but also enable the assessment of the reliability of the generated outputs. Despite the recent availability of language models (LMs) capable of longer context, benchmark datasets targeting long clinical document classification tasks are absent.

Materials and Methods: To address this gap, we propose the LCD benchmark, a benchmark for predicting 30-day out-of-hospital mortality from MIMIC-IV discharge notes and statewide death data. We evaluated the benchmark dataset with baseline models ranging from bag-of-words and CNN models to instruction-tuned large language models. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations.
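As a rough illustration of the prediction target, the sketch below derives a binary 30-day out-of-hospital mortality label from a discharge date and an optional date of death. It assumes the label is positive when death is recorded within 30 days after discharge. The function name, argument names, and the handling of the boundary days are hypothetical and are not taken from the benchmark's own code.

```python
from datetime import date, timedelta
from typing import Optional


def died_within_window(
    discharge_date: date,
    death_date: Optional[date],
    window_days: int = 30,
) -> int:
    """Return 1 if death falls within `window_days` after discharge, else 0.

    Assumption: a missing death date means no death was recorded, and
    both boundary days (day 0 and day `window_days`) count as positive.
    """
    if death_date is None:
        return 0
    elapsed = death_date - discharge_date
    return int(timedelta(0) <= elapsed <= timedelta(days=window_days))


# Hypothetical examples, not real patient data.
label_a = died_within_window(date(2020, 1, 1), date(2020, 1, 20))
label_b = died_within_window(date(2020, 1, 1), date(2020, 3, 1))
label_c = died_within_window(date(2020, 1, 1), None)
```

How the actual benchmark treats same-day deaths or deaths during the admission is not stated in the abstract, so those boundary choices should be checked against the released dataset.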

Results and Discussion: In terms of F1 score, the best-performing supervised baseline reached 28.9% and GPT-4 reached 32.2%. Notes in our dataset have a median word count of 1687. Our analysis of the model outputs showed that the dataset is challenging for both models and human experts, but that the models can find meaningful signals in the text.
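For readers less familiar with the metric, the following minimal sketch computes a binary F1 score for the positive class from true and predicted labels. The abstract does not say which F1 variant or averaging scheme was used, so treating mortality as the single positive class is an assumption, and the function name and toy labels are illustrative only.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (not benchmark data): 1 = died within 30 days, 0 = survived.
toy_true = [1, 0, 0, 1, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0]
toy_f1 = binary_f1(toy_true, toy_pred)  # precision 0.5, recall 0.5, F1 0.5
```

Because F1 ignores true negatives, it can remain low on a task with few positive cases even when most predictions are correct.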

Conclusion: We expect our LCD benchmark to be a resource for the development of advanced supervised models, or prompting methods, tailored for clinical text. The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10996733
DOI: http://dx.doi.org/10.1101/2024.03.26.24304920

Publication Analysis

Top Keywords: lcd benchmark, language models, benchmark datasets, models, benchmark, long clinical, clinical document, benchmark dataset, baseline models, analysis model
Similar Publications

A novel lung cancer diagnosis model using hybrid convolution (2D/3D)-based adaptive DenseUnet with attention mechanism.

Network

August 2025

Department of CSE, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences (SIMATS), Chennai, India.

J Deepa , Liya Badhu Sasikala , P Indumathy , A Jerrin Simla

Existing Lung Cancer Diagnosis (LCD) models have difficulty detecting early-stage lung cancer because the disease is often asymptomatic, which leads to an increased death rate among patients. Therefore, it is important to diagnose lung disease at an early stage to save the lives of affected persons. Hence, the research work aims to develop an efficient lung disease diagnosis model using deep learning techniques for the early and accurate detection of lung cancer.

Surveillance of hoof disorders in Korean dairy cattle and the correlation of farm condition risk factors to their prevalence.

BMC Vet Res

March 2025

Department of Animal Science and Technology, Sunchon National University, 255 Jungang-ro, Suncheon-si, Jeollanam-do, 57922, South Korea.

Hector M Espiritu , Seok-Won Kwon , Sang-Suk Lee , Yong-Il Cho

Background: This study investigated the prevalence of hoof disorders (HDs) in intensive dairy farms in Korea and their association with farm conditions. A total of 877 cattle from 15 farms were examined for infectious, noninfectious, and non-lesion HDs at the animal, foot, and farm levels. Risk factors such as bedding depth, floor wetness, floor elevation transitions, and aggressive hoof treatment were evaluated.

Benchmarking residue-resolution protein coarse-grained models for simulations of biomolecular condensates.

PLoS Comput Biol

January 2025

Department of Physical-Chemistry, Complutense University of Madrid, Madrid, Spain.

Alejandro Feito , Ignacio Sanchez-Burgos , Ignacio Tejero , Eduardo Sanz , Antonio Rey

Intracellular liquid-liquid phase separation (LLPS) of proteins and nucleic acids is a fundamental mechanism by which cells compartmentalize their components and perform essential biological functions. Molecular simulations play a crucial role in providing microscopic insights into the physicochemical processes driving this phenomenon. In this study, we systematically compare six state-of-the-art sequence-dependent residue-resolution models to evaluate their performance in reproducing the phase behaviour and material properties of condensates formed by seven variants of the low-complexity domain (LCD) of the hnRNPA1 protein (A1-LCD), a protein implicated in the pathological liquid-to-solid transition of stress granules.

LCD benchmark: long clinical document benchmark on mortality prediction for language models.

J Am Med Inform Assoc

February 2025

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02215, United States.

WonJin Yoon , Shan Chen , Yanjun Gao , Zhanzhan Zhao , Dmitriy Dligach

Objectives: The application of natural language processing (NLP) in the clinical domain is important due to the rich unstructured information in clinical documents, which often remains inaccessible in structured data. When applying NLP methods to a certain domain, the role of benchmark datasets is crucial as benchmark datasets not only guide the selection of best-performing models but also enable the assessment of the reliability of the generated outputs. Despite the recent availability of language models capable of longer context, benchmark datasets targeting long clinical document classification tasks are absent.

Improving laryngeal cancer detection using chaotic metaheuristics integration with squeeze-and-excitation resnet model.

Health Inf Sci Syst

December 2024

Department of Computer Science, Applied College, Prince Sattam Bin Abdulaziz University, Kharj, Saudi Arabia.

Sana Alazwari , Mashael Maashi , Jamal Alsamri , Mohammad Alamgeer , Shouki A Ebad

Laryngeal cancer (LC) is a substantial global health problem, with low survival rates attributed to late-stage diagnosis. Treating LC correctly is complex, particularly in the final stages. It is a complex malignancy of the head and neck region.
