A multi-model longitudinal assessment of ChatGPT performance on medical residency examinations.

Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Introduction: ChatGPT, a generative artificial intelligence system, has potential applications in numerous fields, including medical education. This potential can be assessed through its performance on medical exams. Medical residency exams, critical for entry into medical specialties, serve as a valuable benchmark.

Materials And Methods: This study aimed to assess the accuracy of GPT-4 and GPT-4o in responding to 1,041 medical residency questions from Brazil, examining overall accuracy and performance across different medical areas, based on evaluations conducted in 2023 and 2024. The questions were classified into higher and lower cognitive levels according to Bloom's taxonomy. Additionally, questions answered incorrectly by both models were re-tested with more recent GPT models that use chain-of-thought reasoning (o1-preview, o3, and o4-mini-high), with evaluations carried out in 2024 and 2025.
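
The abstract does not describe the prompting or answer-extraction setup. As a rough illustration of what such an evaluation loop can look like, here is a minimal sketch using the OpenAI Python client; the prompt wording, the multiple-choice format, and the extract_choice helper are illustrative assumptions, not details taken from the study.

```python
# Hypothetical evaluation-loop sketch; prompt format and answer parsing
# are illustrative assumptions, not details taken from the study.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_model(model: str, stem: str, options: dict[str, str]) -> str:
    """Send one multiple-choice question and return the raw reply."""
    option_text = "\n".join(f"{k}) {v}" for k, v in options.items())
    response = client.chat.completions.create(
        model=model,  # e.g. "gpt-4o"
        messages=[{
            "role": "user",
            "content": ("Answer this medical residency question with a "
                        f"single letter.\n\n{stem}\n{option_text}"),
        }],
    )
    return response.choices[0].message.content or ""

def extract_choice(reply: str) -> str | None:
    """Pull the first standalone option letter (A-E) out of the reply."""
    match = re.search(r"\b([A-E])\b", reply.upper())
    return match.group(1) if match else None

def accuracy(model: str, questions: list[dict]) -> float:
    """Fraction of questions the model answers correctly."""
    hits = sum(
        extract_choice(ask_model(model, q["stem"], q["options"])) == q["answer"]
        for q in questions
    )
    return hits / len(questions)
```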

Results: GPT-4 achieved 81.27% accuracy (95% CI: 78.89-83.64%), while GPT-4o reached 85.88% (95% CI: 83.76-88.00%), significantly outperforming GPT-4 (p < 0.05). Both models showed reduced accuracy on higher-order thinking questions. On questions that both models failed, o1-preview achieved 53.26% accuracy (95% CI: 42.87-63.65%), o3 achieved 47.83% (95% CI: 37.42-58.23%), and o4-mini-high 35.87% (95% CI: 25.88-45.86%), with all three models performing better on higher-order questions.
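
The abstract does not state which confidence-interval method was used, but a plain Wald (normal-approximation) interval, p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n) with n = 1,041, reproduces the GPT-4 and GPT-4o intervals to within rounding, as this short check shows:

```python
# Check the reported 95% CIs with a Wald (normal-approximation) interval.
# Assumption: the paper's exact interval method is not stated in the abstract.
from math import sqrt

def wald_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

for name, p_hat in [("GPT-4", 0.8127), ("GPT-4o", 0.8588)]:
    low, high = wald_ci(p_hat, n=1041)
    print(f"{name}: {low:.2%} - {high:.2%}")
# GPT-4:  78.90% - 83.64%  (reported: 78.89-83.64%)
# GPT-4o: 83.76% - 88.00%  (reported: 83.76-88.00%)
```

Since both models answered the same 1,041 questions, a paired test such as McNemar's would be the natural choice for the GPT-4 vs. GPT-4o significance comparison, though the abstract does not name the test used.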

Conclusion: Artificial intelligence could be a beneficial tool in medical education, enhancing residency exam preparation, helping learners understand complex topics, and improving teaching strategies. However, careful use of artificial intelligence is essential due to ethical concerns and potential limitations in both educational and clinical practice.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12411524
DOI Listing: http://dx.doi.org/10.3389/frai.2025.1614874

Publication Analysis

Top Keywords

performance medical: 12
medical residency: 12
medical: 7
multi-model longitudinal: 4
longitudinal assessment: 4
assessment chatgpt: 4
chatgpt performance: 4
residency examinations: 4
examinations introduction: 4
introduction chatgpt: 4
