Citations

20

Article Abstract

New large language model (LLM)-based approaches to medical question answering show unprecedented improvements in the fluency, grammaticality, and other qualities of the generated answers. However, these systems occasionally produce coherent, topically relevant, and plausible answers that are not based on facts and may be misleading or even harmful. New types of datasets are needed to evaluate the truthfulness of generated answers and to develop reliable approaches for detecting answers that are not supported by evidence. The MedAESQA (Medical Attributable and Evidence Supported Question Answering) dataset presented in this work is designed for developing, fine-tuning, and evaluating language generation models on their ability to attribute or support stated facts by linking the statements to relevant passages of reliable sources. The dataset comprises 40 naturally occurring, aggregated, deidentified questions. Each question has 30 human and LLM-generated answers in which each statement is linked to a scientific abstract that supports it. The dataset provides manual judgments on the accuracy of the statements and the relevance of the scientific papers.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12179289
DOI: http://dx.doi.org/10.1038/s41597-025-05233-z

Publication Analysis

Top Keywords

generated answers: 12
question answering: 8
answers: 6
dataset: 4
dataset medical: 4
medical questions: 4
questions paired: 4
paired automatically: 4
automatically generated: 4
answers evidence-supported: 4

Similar Publications

GPT-4o, a general-purpose large language model, has a retrieval-augmented variant (GPT-4o-RAG) that can assist in dietary counseling. However, research on its application in this field is lacking. To bridge this gap, we used the Japanese National Examination for Registered Dietitians as a standardized benchmark for evaluation.


A 2-year minimum follow-up period has generally been preferred in orthopaedic studies. This minimum standard aids comparisons across the literature and helps to ensure methodological rigor. However, in some situations these minimum durations are not required to answer specific research questions and strictly enforcing these requirements poses unnecessary barriers to research by adding cost and complexity, increasing the risk of loss to follow-up, and potentially restricting early dissemination of clinically important findings.


Study Question: Does weight loss from a hypocaloric dietary intervention improve antral follicle dynamics in women with PCOS?

Summary Answer: During a 3-month hypocaloric dietary intervention, women with PCOS who experienced clinically meaningful weight loss showed more organized antral follicle development including fewer recruitment events, but no change in the overall frequency of selection, dominance, or ovulation.

What Is Known Already: There is a spectrum of disordered antral follicle development in women with PCOS including excessive follicle recruitment and turnover, decreased frequency of selection and dominance, and failure of ovulation. Lifestyle intervention aimed at weight loss is recommended to improve metabolic health in women with PCOS yet benefits on ovarian follicle development and ovulation are unclear.


Evaluation of the accuracy of ChatGPT in answering asthma-related questions.

J Bras Pneumol

September 2025

Divisão de Pneumologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo (SP) Brasil.

Objective: To evaluate the quality of ChatGPT answers to asthma-related questions, as assessed from the perspectives of asthma specialists and laypersons.

Methods: Seven asthma-related questions were posed to ChatGPT (version 4) between May 3, 2024 and May 4, 2024. The questions were standardized, and each was asked with no memory of previous conversations to avoid bias.


Universal Wilson Loop Bound of Quantum Geometry.

Phys Rev Lett

August 2025

Princeton University, Department of Physics, Princeton, New Jersey 08544, USA.

We define the absolute Wilson loop winding and prove that it bounds the (integrated) quantum metric from below. This Wilson loop lower bound naturally reproduces the known Chern and Euler bounds of the integrated quantum metric and provides an explicit lower bound of the integrated quantum metric due to the time-reversal protected Z₂ index, answering a hitherto open question. In general, the Wilson loop lower bound can be applied to any other topological invariants characterized by Wilson loop winding, such as the particle-hole Z₂ index.
