The DRAGON benchmark for clinical NLP.

Joeran S Bosma, Koen Dercksen, Luc Builtjes, Romain André, Christian Roest, Stefan J Fransen, Constant R Noordman, Mar Navarro-Padilla, Judith Lefkes, Natália Alves, Max J J de Grauw, Leander van Eekelen, Joey M A Spronck, Megan Schuurmans, Bram de Wilde, Ward Hendrix, Witali Aswolinskiy, Anindo Saha, Jasper J Twilt, Daan Geijs

NPJ Digit Med

Diagnostic Image Analysis Group, Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands.

Published: May 2025

Artificial Intelligence can mitigate the global shortage of medical diagnostic personnel but requires large-scale annotated datasets to train clinical algorithms. Natural Language Processing (NLP), including Large Language Models (LLMs), shows great potential for annotating clinical data to facilitate algorithm development but remains underexplored due to a lack of public benchmarks. This study introduces the DRAGON challenge, a benchmark for clinical NLP with 28 tasks and 28,824 annotated medical reports from five Dutch care centers. It facilitates automated, large-scale, cost-effective data annotation. Foundational LLMs were pretrained using four million clinical reports from a sixth Dutch care center. Evaluations showed the superiority of domain-specific pretraining (DRAGON 2025 test score of 0.770) and mixed-domain pretraining (0.756), compared to general-domain pretraining (0.734, p < 0.005). While strong performance was achieved on 18/28 tasks, performance was subpar on 10/28 tasks, uncovering where innovations are needed. Benchmark, code, and foundational LLMs are publicly available.
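To put the reported numbers in perspective, the short sketch below does simple arithmetic on the figures quoted in the abstract: the gain of each pretraining strategy over general-domain pretraining, and the share of tasks with strong versus subpar performance. The variable and function names are illustrative assumptions, and the sketch does not reproduce how the benchmark itself computes its test score.

```python
# Scores as reported in the abstract (DRAGON 2025 test score).
scores = {
    "domain-specific": 0.770,
    "mixed-domain": 0.756,
    "general-domain": 0.734,
}

baseline = scores["general-domain"]


def gain_over_baseline(score: float, base: float = baseline) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent) over the baseline score."""
    absolute = score - base
    return round(absolute, 3), round(100 * absolute / base, 1)


# Gains for the two strategies that the abstract compares against the baseline.
gains = {
    name: gain_over_baseline(value)
    for name, value in scores.items()
    if name != "general-domain"
}

# Task counts from the abstract: 18 strong and 10 subpar out of 28 tasks.
total_tasks = 28
strong_share = round(100 * 18 / total_tasks, 1)
subpar_share = round(100 * 10 / total_tasks, 1)

for name, (absolute, relative) in gains.items():
    print(f"{name}: +{absolute:.3f} absolute, +{relative:.1f}% relative")
print(f"strong: {strong_share}% of tasks, subpar: {subpar_share}% of tasks")
```

Under these figures, domain-specific pretraining improves on the general-domain baseline by 0.036 points (about 4.9% relative) and mixed-domain pretraining by 0.022 points (about 3.0% relative), while roughly 64% of tasks reach strong performance.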

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12084576
DOI: http://dx.doi.org/10.1038/s41746-025-01626-x

Similar Publications

Variations in Dentists' Thresholds for Restorative Treatment of Active Non-Cavitated Carious Lesions: A Multinational Cross-sectional Study.

J Dent

September 2025

Department of Endodontics, Recep Tayyip Erdogan University, Turkey.

Ömer Hatipoğlu, Nessrin Taha, Mohmed Isaqali Karobari, Thiyezen Abdullah Aldhelai, Daoud M Ayyad

Objectives: To assess patterns across 21 countries in dentists' thresholds for initiating operative treatment of active non-cavitated carious lesions and to evaluate the influence of caries risk, clinician characteristics, and geographic variation on decision-making in accordance with current guidelines.

Methods: A cross-sectional, vignette-style web-based survey was conducted between June and October 2023 across 21 countries. A standardized questionnaire, comprising theoretical radiographic scenarios of occlusal and approximal active non-cavitated carious lesions at four progressive stages (E1, E2, EDJ, D1), was distributed to general dentists and specialists.

Cross-cultural adaptation, validity and reliability of the Dutch version of the Lymphedema Symptom Intensity and Distress Survey-Head and Neck version 2.0 (LSIDS-H&N v2.0) in head and neck cancer patients.

Disabil Rehabil

September 2025

Department of Rehabilitation Sciences and Physiotherapy, University of Leuven, Leuven, Belgium.

Kaat Van Aperen, Sandra Nuyts, Nele Devoogdt, Thierry Troosters, Tessa De Vrieze

Purpose: This study aims to cross-culturally validate the Dutch version of the Lymphedema Symptom Intensity and Distress Survey-Head and Neck version 2.0 (LSIDS-H&N v2.0).

Textbook Outcomes and Minimally Invasive Techniques in Resectable Gallbladder Cancer: A Global Cohort Study.

Eur J Surg Oncol

July 2025

General Surgery Unit, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, PISA, Italy.

Simone Cremona, Benedetto Ielpo, Marcello Di Martino, Mauro Podda, Gregorio Di Franco

Introduction: Surgery for resectable gallbladder cancer (GbC) encompasses complex operative management, and evaluating surgical quality through textbook outcome (TO) is crucial. This study aimed to assess TO incidence and impact in a global cohort, identify independent predictors, and evaluate TO rates of minimally invasive (MI) techniques, including robotic (ROB) and laparoscopic (LPS).

Materials and Methods: This cohort study included patients undergoing curative-intent hepatectomy and lymphadenectomy for GbC (T1b-T3) from 2012 to 2023 in 41 hospitals.

Performance comparison of germline variant calling tools in sporadic disease cohorts.

Mol Genet Genomics

September 2025

Human Phenome Institute, MOE Key Laboratory of Contemporary Anthropology, Zhangjiang Fudan International Innovation Center, Fudan University, 825 Zhangheng Road, Shanghai, 201203, China.

Qiaofeng Song, Jinglan Zhai, Changshui Chen, Haibo Li, Aihua Cao

Accurate variant calling is essential for next-generation sequencing (NGS)-based diagnosis of rare diseases, yet most benchmarking studies have focused on standard cell lines or trio-based samples, with limited relevance to sporadic cases. Here, we systematically compared the performance of DeepVariant and GATK HaplotypeCaller in two Chinese cohorts of patients with sporadic epilepsy (EP) and autism spectrum disorder (ASD). DeepVariant exhibited higher precision and sensitivity in detecting single nucleotide variants (SNVs), while GATK showed a distinct advantage in identifying rare variants, which are often key to understanding the genetic basis of rare diseases.

Comparison of a guidewire with an uninsulated tip versus dedicated radiofrequency wire with discrete electrode for energy-based transseptal puncture: a pre-clinical study.

J Interv Card Electrophysiol

September 2025

Texas Cardiac Arrhythmia Institute, St. David's Medical Center, 3000 N Interstate 35, Suite 700, Austin, TX, 78705, USA.

Amin Al-Ahmad, Pamela Horton Embrey, Rodney Horton, Christian Balkovec, Rhodaba Ebady

Background: Dedicated radiofrequency (RF) needles and wires for transseptal puncture (TSP) achieve better outcomes vs. electrified open-ended needles and guidewires due to optimized electrode design and energy delivery. This study benchmarked TSP performance between the dedicated VersaCross wire system (VC; Boston Scientific) and an electrified guidewire with an alternative electrode configuration similar to commercially available devices.
