Article Abstract

Increasing interest in applying large language models (LLMs) to medicine is due in part to their impressive performance on medical exam questions. However, these exams do not capture the complexity of real patient-doctor interactions because of factors like patient compliance, experience, and cognitive bias. We hypothesized that LLMs would produce less accurate responses when faced with clinically biased questions as compared to unbiased ones. To test this, we developed the BiasMedQA dataset, which consists of 1273 USMLE questions modified to replicate common clinically relevant cognitive biases. We assessed six LLMs on BiasMedQA and found that GPT-4 stood out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which showed large drops in performance. Additionally, we introduced three bias mitigation strategies, which improved but did not fully restore accuracy. Our findings highlight the need to improve LLMs' robustness to cognitive biases, in order to achieve more reliable applications of LLMs in healthcare.
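The comparison described in the abstract, accuracy on unbiased questions versus accuracy on their bias-modified counterparts, can be illustrated with a minimal sketch. The function names and the four-question toy data below are hypothetical and are not taken from BiasMedQA or the authors' code; the sketch only shows how an accuracy drop of this kind might be computed.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def accuracy_drop(
    unbiased_preds: Sequence[str],
    biased_preds: Sequence[str],
    answers: Sequence[str],
) -> float:
    """Accuracy on unbiased prompts minus accuracy on biased prompts."""
    return accuracy(unbiased_preds, answers) - accuracy(biased_preds, answers)


# Toy data (hypothetical): answer keys and one model's multiple-choice outputs.
answers = ["A", "C", "B", "D"]
unbiased_preds = ["A", "C", "B", "A"]  # 3 of 4 correct
biased_preds = ["A", "B", "B", "A"]    # 2 of 4 correct

drop = accuracy_drop(unbiased_preds, biased_preds, answers)
print(f"unbiased accuracy: {accuracy(unbiased_preds, answers):.2f}")
print(f"biased accuracy:   {accuracy(biased_preds, answers):.2f}")
print(f"accuracy drop:     {drop:.2f}")
```

A positive drop means the model answered fewer questions correctly once the bias was introduced; a value near zero would correspond to the kind of resilience the abstract attributes to GPT-4.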

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11494053
DOI: http://dx.doi.org/10.1038/s41746-024-01283-6

Publication Analysis

Top Keywords

cognitive biases: 12
language models: 8
evaluation mitigation: 4
cognitive: 4
mitigation cognitive: 4
biases medical: 4
medical language: 4
models increasing: 4
increasing interest: 4
interest applying: 4

Similar Publications

GluN2A-NMDA receptor inhibition disinhibits the prefrontal cortex, reduces forced swim immobility, and impairs sensorimotor gating.

Acta Pharmacol Sin

September 2025

Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangdong-Hong Kong Joint Laboratory for Psychiatric Disorders, Guangdong Province Key Laboratory of Psychiatric Disorders, Guangdong Bas

Recent investigations into the rapid antidepressant effects of ketamine, along with studies on schizophrenia-related susceptibility genes, have highlighted the GluN2A subunit as a critical regulator of both emotion and cognition. However, the specific impacts of acute pharmacological inhibition of GluN2A-containing NMDA receptors on brain microcircuits and the subsequent behavioral consequences remain poorly understood. In this study, we first examined the effects of MPX-004, a selective GluN2A NMDA receptor inhibitor, on behavior within the dorsomedial prefrontal cortex (dmPFC).

ISO 21043 is a new international standard for forensic science. It provides requirements and recommendations designed to ensure the quality of the forensic process. It includes Parts on: 1 vocabulary; 2 recovery, transport, and storage of items; 3 analysis; 4 interpretation; and 5 reporting.

Selective Attention and Eccentricity: A Comprehensive Review.

Neurosci Biobehav Rev

September 2025

Department of Experimental and Applied Psychology, Institute for Brain and Behaviour, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam.

Human vision deals with two major limitations. First, vision is strongly foveated and deteriorates with eccentricity. Second, visual attention selectively prioritizes some stimuli over others.

The current study aimed to evaluate specific mechanisms of interventions to improve loneliness among older adults. EMBASE, MEDLINE, and PsycINFO databases were searched for articles published through June 2024. We selected randomized controlled trials (RCTs) that sought to improve loneliness in older adults, were published in English, and used previously published measures to assess loneliness.

The impact of obesity on human capital accumulation: Exploring the driving factors.

Econ Hum Biol

September 2025

Loyola BehLab, Universidad Loyola, Spain; Banco de España, Spain.

This paper examines the impact of childhood obesity on Spanish high school students' academic achievement and human capital accumulation. To address potential endogeneity concerns, we exploit exogenous variation in obesity within peer groups, using data from friendship networks. Specifically, we instrument individual obesity with the average body mass index of intransitive friendship triads.
