RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions.

Gregory Kell, Angus Roberts, Serge Umansky, Yuti Khare, Najma Ahmed, Nikhil Patel, Chloe Simela, Jack Coumbe, Julian Rozario, Ryan-Rhys Griffiths, Iain J Marshall

AMIA Annu Symp Proc

King's College London, London, Greater London, United Kingdom.

Published: May 2025

Clinical question answering systems have the potential to provide clinicians with relevant and timely answers to their questions. Despite recent advances, however, adoption of these systems in clinical settings has been slow. One issue is a lack of question-answering datasets that reflect the real-world needs of health professionals. In this work, we present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM. We describe the process for generating and verifying the QA pairs, and we evaluate several QA models on BioASQ and RealMedQA to assess the relative difficulty of matching answers to questions. We show that the LLM is more cost-efficient for generating "ideal" QA pairs. Additionally, RealMedQA has a lower lexical similarity between questions and answers than BioASQ, which, according to our results, poses an additional challenge to the top two QA models. We release our code and our dataset publicly to encourage further research.
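The abstract does not say how lexical similarity between questions and answers is measured. As a minimal illustration only, the sketch below uses Jaccard overlap of word tokens, which is an assumed stand-in and not necessarily the metric used in the paper. The function names `tokenize` and `jaccard_similarity` and the example strings are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(question: str, answer: str) -> float:
    """Jaccard overlap of the token sets of a question and an answer.

    ASSUMPTION: Jaccard overlap is an illustrative choice; the paper's
    actual lexical similarity measure is not given in the abstract.
    Returns a value in [0, 1], where 0 means no shared tokens.
    """
    q_tokens = tokenize(question)
    a_tokens = tokenize(answer)
    union = q_tokens | a_tokens
    if not union:
        # Both texts are empty: define the similarity as 0.
        return 0.0
    return len(q_tokens & a_tokens) / len(union)


if __name__ == "__main__":
    # Hypothetical QA pair, not taken from RealMedQA or BioASQ.
    question = "Which drug is recommended first for this condition?"
    answer = "Offer the first-line drug unless it is contraindicated."
    print(f"lexical similarity: {jaccard_similarity(question, answer):.2f}")
```

Under this kind of measure, a dataset whose answers reuse few of the question's words scores lower, so models that lean on word overlap have less to match on.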

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12099375
