A Dataset of Medical Questions Paired with Automatically Generated Answers and Evidence-supported References.

Deepak Gupta, Davis Bartels, Dina Demner-Fushman

Sci Data

National Library of Medicine, National Institutes of Health, HHS, Bethesda, MD, USA.

Published: June 2025

New Large Language Model (LLM)-based approaches to medical question answering show unprecedented improvements in the fluency, grammaticality, and other qualities of the generated answers. However, these systems occasionally produce coherent, topically relevant, and plausible answers that are not based on facts and may be misleading or even harmful. New types of datasets are needed to evaluate the truthfulness of generated answers and to develop reliable approaches for detecting answers that are not supported by evidence. The MedAESQA (Medical Attributable and Evidence Supported Question Answering) dataset presented in this work is designed for developing, fine-tuning, and evaluating language generation models on their ability to attribute or support stated facts by linking statements to relevant passages of reliable sources. The dataset comprises 40 naturally occurring, aggregated, deidentified questions. Each question has 30 human and LLM-generated answers in which each statement is linked to a scientific abstract that supports it. The dataset provides manual judgments on the accuracy of the statements and the relevance of the scientific papers.
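
The structure described in the abstract (questions, answers, statements, linked abstracts, manual judgments) can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the class and field names, the boolean form of the judgments, the placeholder identifiers, and the `supported_fraction` helper are hypothetical and are not taken from the released dataset files, whose actual schema may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reference:
    pmid: str        # hypothetical identifier of the linked scientific abstract
    relevant: bool   # manual relevance judgment (assumed to be boolean here)


@dataclass
class Statement:
    text: str
    accurate: bool   # manual accuracy judgment (assumed to be boolean here)
    references: List[Reference] = field(default_factory=list)


@dataclass
class Answer:
    source: str      # "human" or "llm" (assumed labels)
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Question:
    text: str
    answers: List[Answer] = field(default_factory=list)


def supported_fraction(answer: Answer) -> float:
    """Share of statements judged accurate and linked to at least one relevant reference."""
    if not answer.statements:
        return 0.0
    supported = sum(
        1
        for s in answer.statements
        if s.accurate and any(r.relevant for r in s.references)
    )
    return supported / len(answer.statements)


# Made-up example data; the identifiers are placeholders, not real records.
example = Answer(
    source="llm",
    statements=[
        Statement("Statement A", True, [Reference("0000001", True)]),
        Statement("Statement B", True, [Reference("0000002", False)]),
    ],
)
print(supported_fraction(example))
```

A per-answer score of this kind is only one possible way to use statement-level judgments; it is shown to make the nesting concrete, not as the authors' evaluation method.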

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12179289
DOI: http://dx.doi.org/10.1038/s41597-025-05233-z

Similar Publications

Performance of GPT-4o combined with retrieval-augmented generation on nutritionist licensing exam questions.

Endocr J

September 2025

Institute of Liberal Arts and Science, Kanazawa University, Kanazawa, Japan.

Yu Ishikawa, Akitaka Higashi, Nozomu Arai, Daisuke Ozo, Wataru Hasegawa

GPT-4o, a general-purpose large language model, has a Retrieval-Augmented Variant (GPT-4o-RAG) that can assist in dietary counseling. However, research on its application in this field remains lacking. To bridge this gap, we used the Japanese National Examination for Registered Dietitians as a standardized benchmark for evaluation.

Rethinking Standards for Minimum Short-Term Follow-up Duration for Clinical Outcome in Orthopaedic and Sports Medicine Studies.

Arthroscopy

September 2025

Adnan Saithna, Matthew Salzler, Elizabeth Matzkin, Michael J Rossi

A 2-year minimum follow-up period has generally been preferred in orthopaedic studies. This minimum standard aids comparisons across the literature and helps to ensure methodological rigor. However, in some situations these minimum durations are not required to answer specific research questions and strictly enforcing these requirements poses unnecessary barriers to research by adding cost and complexity, increasing the risk of loss to follow-up, and potentially restricting early dissemination of clinically important findings.

Changes in antral follicle dynamics following weight loss in women with polycystic ovary syndrome.

Hum Reprod

September 2025

Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA.

Faith E Carter, Brittany Y Jarrett, Noah D Lee, Nabiha Zaman, Alexandra M Reich

Study Question: Does weight loss from a hypocaloric dietary intervention improve antral follicle dynamics in women with PCOS?

Summary Answer: During a 3-month hypocaloric dietary intervention, women with PCOS who experienced clinically meaningful weight loss showed more organized antral follicle development including fewer recruitment events, but no change in the overall frequency of selection, dominance, or ovulation.

What Is Known Already: There is a spectrum of disordered antral follicle development in women with PCOS including excessive follicle recruitment and turnover, decreased frequency of selection and dominance, and failure of ovulation. Lifestyle intervention aimed at weight loss is recommended to improve metabolic health in women with PCOS yet benefits on ovarian follicle development and ovulation are unclear.

Evaluation of the accuracy of ChatGPT in answering asthma-related questions.

J Bras Pneumol

September 2025

Divisão de Pneumologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo (SP), Brasil.

Bruno Pellozo Cerqueira, Vinicius Cappellette da Silva Leite, Carla Gonzaga França, Fernando Sergio Leitão Filho, Sonia Maria Faresin

Objective: To evaluate the quality of ChatGPT answers to asthma-related questions, as assessed from the perspectives of asthma specialists and laypersons.

Methods: Seven asthma-related questions were posed to ChatGPT (version 4) between May 3, 2024 and May 4, 2024. The questions were standardized with no memory of previous conversations to avoid bias.

Universal Wilson Loop Bound of Quantum Geometry.

Phys Rev Lett

August 2025

Princeton University, Department of Physics, Princeton, New Jersey 08544, USA.

Jiabin Yu, Jonah Herzog-Arbeitman, B Andrei Bernevig

We define the absolute Wilson loop winding and prove that it bounds the (integrated) quantum metric from below. This Wilson loop lower bound naturally reproduces the known Chern and Euler bounds of the integrated quantum metric and provides an explicit lower bound of the integrated quantum metric due to the time-reversal protected Z_{2} index, answering a hitherto open question. In general, the Wilson loop lower bound can be applied to any other topological invariants characterized by Wilson loop winding, such as the particle-hole Z_{2} index.
