Article Abstract

Artificial Intelligence can mitigate the global shortage of medical diagnostic personnel but requires large-scale annotated datasets to train clinical algorithms. Natural Language Processing (NLP), including Large Language Models (LLMs), shows great potential for annotating clinical data to facilitate algorithm development but remains underexplored due to a lack of public benchmarks. This study introduces the DRAGON challenge, a benchmark for clinical NLP with 28 tasks and 28,824 annotated medical reports from five Dutch care centers. It facilitates automated, large-scale, cost-effective data annotation. Foundational LLMs were pretrained using four million clinical reports from a sixth Dutch care center. Evaluations showed the superiority of domain-specific pretraining (DRAGON 2025 test score of 0.770) and mixed-domain pretraining (0.756), compared to general-domain pretraining (0.734, p < 0.005). While strong performance was achieved on 18/28 tasks, performance was subpar on 10/28 tasks, uncovering where innovations are needed. Benchmark, code, and foundational LLMs are publicly available.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12084576
DOI: http://dx.doi.org/10.1038/s41746-025-01626-x

Similar Publications

Objectives: To assess patterns across 21 countries in dentists' thresholds for initiating operative treatment of active non-cavitated carious lesions and to evaluate the influence of caries risk, clinician characteristics, and geographic variation on decision-making in accordance with current guidelines.

Methods: A cross-sectional, vignette-style web-based survey was conducted between June and October 2023 across 21 countries. A standardized questionnaire, comprising theoretical radiographic scenarios of occlusal and approximal active non-cavitated carious lesions at four progressive stages (E1, E2, EDJ, D1), was distributed to general dentists and specialists.

Purpose: This study aims to cross-culturally validate the Dutch version of the Lymphedema Symptom Intensity and Distress Survey-Head and Neck version 2.0 (LSIDS-H&N v2.0).

Textbook Outcomes and Minimally Invasive Techniques in Resectable Gallbladder Cancer: A Global Cohort Study.

Eur J Surg Oncol

July 2025

General Surgery Unit, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy.

Introduction: Surgery for resectable gallbladder cancer (GbC) encompasses complex operative management, and evaluating surgical quality through textbook outcome (TO) is crucial. This study aimed to assess TO incidence and impact in a global cohort, identify independent predictors, and evaluate TO rates of minimally invasive (MI) techniques, including robotic (ROB) and laparoscopic (LPS).

Materials and Methods: This cohort study included patients undergoing curative-intent hepatectomy and lymphadenectomy for GbC (T1b-T3) from 2012 to 2023 in 41 hospitals.

Performance comparison of germline variant calling tools in sporadic disease cohorts.

Mol Genet Genomics

September 2025

Human Phenome Institute, MOE Key Laboratory of Contemporary Anthropology, Zhangjiang Fudan International Innovation Center, Fudan University, 825 Zhangheng Road, Shanghai, 201203, China.

Accurate variant calling is essential for next-generation sequencing (NGS)-based diagnosis of rare diseases, yet most benchmarking studies have focused on standard cell lines or trio-based samples, with limited relevance to sporadic cases. Here, we systematically compared the performance of DeepVariant and GATK HaplotypeCaller in two Chinese cohorts of patients with sporadic epilepsy (EP) and autism spectrum disorder (ASD). DeepVariant exhibited higher precision and sensitivity in detecting single nucleotide variants (SNVs), while GATK showed a distinct advantage in identifying rare variants, which are often key to understanding the genetic basis of rare diseases.

Background: Dedicated radiofrequency (RF) needles and wires for transseptal puncture (TSP) achieve better outcomes than electrified open-ended needles and guidewires due to optimized electrode design and energy delivery. This study benchmarked TSP performance between the dedicated VersaCross wire system (VC; Boston Scientific) and an electrified guidewire with an alternative electrode configuration similar to commercially available devices.