Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

ChatGPT has garnered attention as a multifaceted AI chatbot with potential applications in medicine. Despite intriguing preliminary findings in areas such as clinical management and patient education, a substantial knowledge gap remains in comprehensively understanding the opportunities and limitations of ChatGPT's capabilities, especially in medical test-taking and education. A total of n = 2,729 USMLE Step 1 practice questions were extracted from the Amboss question bank. After excluding 352 image-based questions, the remaining 2,377 text-based questions were categorized and entered manually into ChatGPT, and its responses were recorded. ChatGPT's overall performance was analyzed by question difficulty, category, and content with regard to specific signal words and phrases. ChatGPT achieved an overall accuracy rate of 55.8% across the n = 2,377 USMLE Step 1 preparation questions obtained from the Amboss online question bank. It showed a significant inverse correlation between question difficulty and performance (r = -0.306; p < 0.001), maintaining accuracy comparable to the human user peer group across different levels of question difficulty. Notably, ChatGPT outperformed the human peer group on serology-related questions (61.1% vs. 53.8%; p = 0.005) but struggled with ECG-related content (42.9% vs. 55.6%; p = 0.021). ChatGPT performed significantly worse on pathophysiology-related question stems (signal phrase: "what is the most likely/probable cause"). Overall, ChatGPT performed consistently across the remaining question categories and difficulty levels. These findings emphasize the need for further investigations to explore the potential and limitations of ChatGPT in medical examination and education.
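
For readers who want to see how statistics of this kind are typically computed, the following is a minimal illustrative sketch (not the authors' code) in Python: a rank correlation between question difficulty and per-question correctness, and a contingency-table comparison of accuracy between ChatGPT and the human peer group within one content category. All data values below are hypothetical placeholders, not figures from the study.

    # Minimal sketch of the kind of analysis the abstract reports; data are
    # hypothetical placeholders, not the study's dataset.
    import numpy as np
    from scipy import stats

    # Hypothetical per-question records: Amboss difficulty (1-5) and whether
    # ChatGPT answered the question correctly (1) or not (0).
    difficulty = np.array([1, 2, 2, 3, 4, 5, 3, 1, 4, 5])
    correct = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 0])

    # Rank correlation between difficulty and correctness (the paper reports
    # an inverse correlation of r = -0.306, p < 0.001 on n = 2,377 questions).
    r, p = stats.spearmanr(difficulty, correct)
    print(f"difficulty vs. correctness: r = {r:.3f}, p = {p:.3g}")

    # Category comparison as a 2x2 contingency test, e.g. ChatGPT vs. the human
    # peer group on serology items (61.1% vs. 53.8% in the abstract).
    chatgpt_serology = [110, 70]    # hypothetical correct / incorrect counts
    humans_serology = [969, 831]    # hypothetical correct / incorrect counts
    chi2, p_cat, _, _ = stats.chi2_contingency([chatgpt_serology, humans_serology])
    print(f"serology comparison: chi2 = {chi2:.2f}, p = {p_cat:.3f}")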

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11169536 (PMC)
http://dx.doi.org/10.1038/s41598-024-63997-7 (DOI Listing)

Publication Analysis

Top Keywords

usmle step (12)
question bank (8)
question difficulty (8)
question (5)
questions (5)
in-depth analysis (4)
analysis chatgpt's (4)
chatgpt's performance (4)
performance based (4)
based specific (4)

Similar Publications

The outcomes were to compare the accuracy of two large language models, GPT-4o and o3-Mini, against medical-student performance on otolaryngology-focused, USMLE-style multiple-choice questions. With permission from AMBOSS, we extracted 146 Step 2 CK questions tagged "Otolaryngology" and stratified them by AMBOSS difficulty (levels 1-5). Each item was presented verbatim to GPT-4o and o3-Mini through their official APIs; outputs were scored correct/incorrect.

View Article and Find Full Text PDF
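
The preceding entry describes presenting each multiple-choice item verbatim through the models' official APIs and scoring the output as correct or incorrect. A minimal sketch of such a workflow with the OpenAI Python SDK is shown below; the prompt wording, parsing rule, and example item are illustrative assumptions, not the cited study's protocol.

    # Hedged sketch: send one USMLE-style multiple-choice item to a model via the
    # OpenAI Python SDK and score the reply against an answer key. The prompt
    # format and scoring rule are assumptions for illustration only.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_mcq(model: str, stem: str, options: dict[str, str]) -> str:
        """Send one multiple-choice item and return the model's reply text."""
        option_text = "\n".join(f"{letter}. {text}" for letter, text in options.items())
        prompt = (
            f"{stem}\n\n{option_text}\n\n"
            "Answer with the single letter of the best option."
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content.strip()

    # Hypothetical item and answer key for illustration only.
    stem = "A 25-year-old presents with ... Which of the following is the most likely diagnosis?"
    options = {"A": "Option A", "B": "Option B", "C": "Option C", "D": "Option D"}
    answer_key = "C"

    reply = ask_mcq("gpt-4o", stem, options)
    is_correct = reply.upper().startswith(answer_key)  # crude letter-matching rule
    print(reply, "->", "correct" if is_correct else "incorrect")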

Purpose: This study examined the impact of exam sequence and timing on the performance of osteopathic medical students on the COMLEX-USA Level 1 and Level 2 and USMLE Step 1 and Step 2 examinations.

Methods: Two cohorts were analyzed: 364 osteopathic medical students who completed both COMLEX-USA Level 1 and USMLE Step 1 between 2020 and 2022 (prior to the implementation of pass/fail grading), and 734 osteopathic medical students who completed both COMLEX-USA Level 2 and USMLE Step 2 between 2021 and 2025. Student performance was evaluated based on the sequence of examinations and intervals between them.

View Article and Find Full Text PDF

Introduction: Securing a residency position in the United States remains a significant challenge for International Medical Graduates (IMGs), particularly those from African countries. Although African IMGs contribute to approximately 25% of the U.S.

View Article and Find Full Text PDF

Entrance to neurological surgery residency is highly competitive due to the large number of applicants vying for a limited number of spots. The process has become even more competitive in recent years, with a significant increase in applicants but a consistent number of available residency positions. Program director (PD) surveys offer valuable insights into the selection process and expectations for neurosurgical residency, guiding prospective candidates to navigate the challenging training path.

View Article and Find Full Text PDF

The performance of ChatGPT on medical image-based assessments and implications for medical education.

BMC Med Educ

August 2025

Department of Neurosurgery, West China Hospital, Sichuan University, No. 37 Guo Xue Xiang Alley, Wu Hou District, Chengdu, Sichuan Province, 610037, China.

Background: Generative artificial intelligence (AI) tools like ChatGPT (OpenAI) have garnered significant attention for their potential in fields such as medical education; however, the performance of large language and vision models on medical test items involving images remains underexplored, limiting their broader educational utility. This study aims to evaluate the performance of GPT-4 and GPT-4 Omni (GPT-4o), accessed via the ChatGPT platform, on image-based United States Medical Licensing Examination (USMLE) sample items, to explore their implications for medical education.

Methods: We identified all image-based questions from the USMLE Step 1 and Step 2 Clinical Knowledge sample item sets.

View Article and Find Full Text PDF