Increasing interest in applying large language models (LLMs) to medicine is due in part to their impressive performance on medical exam questions. However, these exams do not capture the complexity of real patient-doctor interactions because of factors like patient compliance, experience, and cognitive bias. We hypothesized that LLMs would produce less accurate responses when faced with clinically biased questions than with unbiased ones. To test this, we developed the BiasMedQA dataset, which consists of 1273 USMLE questions modified to replicate common clinically relevant cognitive biases. We assessed six LLMs on BiasMedQA and found that GPT-4 stood out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which showed large drops in performance. Additionally, we introduced three bias mitigation strategies, which improved but did not fully restore accuracy. Our findings highlight the need to improve LLMs' robustness to cognitive biases to achieve more reliable applications of LLMs in healthcare.
| Link | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11494053 | PMC |
| http://dx.doi.org/10.1038/s41746-024-01283-6 | DOI |
Acta Pharmacol Sin
September 2025
Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangdong-Hong Kong Joint Laboratory for Psychiatric Disorders, Guangdong Province Key Laboratory of Psychiatric Disorders, Guangdong Bas
Recent investigations into the rapid antidepressant effects of ketamine, along with studies on schizophrenia-related susceptibility genes, have highlighted the GluN2A subunit as a critical regulator of both emotion and cognition. However, the specific impacts of acute pharmacological inhibition of GluN2A-containing NMDA receptors on brain microcircuits and the subsequent behavioral consequences remain poorly understood. In this study, we first examined the effects of MPX-004, a selective GluN2A NMDA receptor inhibitor, on behavior within the dorsomedial prefrontal cortex (dmPFC).
Sci Justice
September 2025
Department of Analytical, Environmental and Forensic Sciences, King's College London, London, UK.
ISO 21043 is a new international standard for forensic science. It provides requirements and recommendations designed to ensure the quality of the forensic process. It comprises five parts: (1) vocabulary; (2) recovery, transport, and storage of items; (3) analysis; (4) interpretation; and (5) reporting.
Neurosci Biobehav Rev
September 2025
Department of Experimental and Applied Psychology, Institute for Brain and Behaviour, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam.
Human vision deals with two major limitations. First, vision is strongly foveated and deteriorates with eccentricity. Second, visual attention selectively prioritizes some stimuli over others.
Ann N Y Acad Sci
September 2025
Department of Psychiatry, University of California San Diego, La Jolla, California, USA.
The current study aimed to evaluate specific mechanisms of interventions to improve loneliness among older adults. EMBASE, MEDLINE, and PsycINFO databases were searched for articles published through June 2024. We selected randomized controlled trials (RCTs) that sought to improve loneliness in older adults, were published in English, and used previously published measures to assess loneliness.
Econ Hum Biol
September 2025
Loyola BehLab, Universidad Loyola, Spain; Banco de España, Spain.
This paper examines the impact of childhood obesity on Spanish high school students' academic achievement and human capital accumulation. To address potential endogeneity concerns, we exploit exogenous variation in obesity within peer groups, using data from friendship networks. Specifically, we instrument individual obesity with the average body mass index of intransitive friendship triads.