Assessing the performance of ChatGPT in medical ethical decision-making: a comparative study with USMLE-based scenarios.

Ali A Khan, Ali R Khan, Saminah Munshi, Hari Dandapani, Mohamed Jimale, Franck M Bogni, Hussain Khawaja

J Med Ethics

Warren Alpert Medical School, Brown University, Providence, Rhode Island, USA.

Published: January 2025

Introduction: The integration of artificial intelligence (AI) into healthcare opens new possibilities but also raises ethical, legal and professional concerns. It is therefore important to assess how AI performs on core components of the United States Medical Licensing Examination (USMLE) such as communication skills, ethics, empathy and professionalism. This study evaluates how well ChatGPT versions 3.5 and 4.0 handle complex medical scenarios drawn from the USMLE-Rx, AMBOSS and UWorld question banks, with the aim of understanding their ability to navigate patient interactions in line with medical ethics and professional standards.

Methods: We compiled 273 questions from AMBOSS, USMLE-Rx and UWorld, focusing on communication, social sciences, healthcare policy and ethics. GPT-3.5 and GPT-4 were asked to answer each question and justify their choice, working in new chat sessions to minimise interference from earlier exchanges. Responses were compared against the question banks' rationales and against average student performance to evaluate the models' effectiveness in medical ethical decision-making.

Results: GPT-3.5 answered 38.9% of questions correctly in AMBOSS, 54.1% in USMLE-Rx and 57.4% in UWorld, with rationale accuracy rates of 83.3%, 90.0% and 87.0%, respectively. GPT-4 answered 75.9% correctly in AMBOSS, 64.9% in USMLE-Rx and 79.6% in UWorld, with rationale accuracy rates of 85.4%, 88.9% and 98.8%, respectively. Both versions generally scored below the average student, except GPT-4 on UWorld.
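As a quick check on the reported figures, the following minimal Python sketch tabulates the answer-accuracy percentages from the Results and computes the percentage-point change from GPT-3.5 to GPT-4 for each question bank. The dictionary and function names are illustrative only and are not part of the study. Because the abstract does not give per-bank question counts, the sketch works only with the published percentages and does not reconstruct raw scores.

```python
# Answer accuracy (% of questions answered correctly), copied from the Results.
# Per-bank question counts are not reported in the abstract, so only
# percentage-point differences are computed (hypothetical helper names).
ANSWER_ACCURACY = {
    "AMBOSS":   {"GPT-3.5": 38.9, "GPT-4": 75.9},
    "USMLE-Rx": {"GPT-3.5": 54.1, "GPT-4": 64.9},
    "UWorld":   {"GPT-3.5": 57.4, "GPT-4": 79.6},
}


def point_gain(bank: str) -> float:
    """Percentage-point change in answer accuracy from GPT-3.5 to GPT-4."""
    scores = ANSWER_ACCURACY[bank]
    # Round to one decimal to match the precision of the reported figures
    # and to hide binary floating-point noise.
    return round(scores["GPT-4"] - scores["GPT-3.5"], 1)


if __name__ == "__main__":
    for bank in ANSWER_ACCURACY:
        print(f"{bank}: {point_gain(bank):+.1f} points")
```

Run as a script, this prints a gain of +37.0 points for AMBOSS, +10.8 for USMLE-Rx and +22.2 for UWorld, so the improvement between versions is largest on AMBOSS and smallest on USMLE-Rx.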

Conclusion: ChatGPT, particularly version 4.0, shows potential in navigating ethical and interpersonal medical scenarios. However, human performance currently surpasses that of the AI on average. Continued development and training of AI systems could improve their proficiency in these critical aspects of healthcare.

DOI: http://dx.doi.org/10.1136/jme-2024-110240