
Comparative Analysis of LLMs' Performance on a Practice Radiography Certification Exam

Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Purpose: To compare the performance of multiple large language models (LLMs) on a practice radiography certification exam.

Method: Using an exploratory, nonexperimental approach, 200 multiple-choice question stems and options (correct answers and distractors) from a practice radiography certification exam were entered into 5 LLMs: ChatGPT (OpenAI), Claude (Anthropic), Copilot (Microsoft), Gemini (Google), and Perplexity (Perplexity AI). Responses were recorded as correct or incorrect, and overall accuracy rates were calculated for each LLM. McNemar tests determined whether there were significant differences between the accuracy rates. Performance was also evaluated and aggregated by content category and subcategory.
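A minimal sketch of the paired comparison described above, assuming each model's item-level responses are available as Boolean lists over the same 200 questions; the variable names and data are hypothetical, not the authors' code:

from statsmodels.stats.contingency_tables import mcnemar

def compare_models(correct_a, correct_b):
    # Build the 2x2 agreement table for McNemar's test:
    # rows = model A correct/incorrect, columns = model B correct/incorrect.
    both = sum(a and b for a, b in zip(correct_a, correct_b))
    only_a = sum(a and not b for a, b in zip(correct_a, correct_b))
    only_b = sum(b and not a for a, b in zip(correct_a, correct_b))
    neither = sum((not a) and (not b) for a, b in zip(correct_a, correct_b))
    table = [[both, only_a], [only_b, neither]]
    # exact=True evaluates the binomial distribution on the discordant pairs,
    # which is appropriate for the modest counts involved here.
    result = mcnemar(table, exact=True)
    return result.statistic, result.pvalue

# Hypothetical usage: stat, p = compare_models(chatgpt_correct, claude_correct)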

Results: ChatGPT had the highest overall accuracy of 83.5%, followed by Perplexity (78.9%), Copilot (78.0%), Gemini (75.0%), and Claude (71.0%). ChatGPT had a significantly higher accuracy rate than did Claude (P < .001) and Gemini (P = .02). Regarding content categories, ChatGPT was the only LLM to correctly answer all 38 patient care questions. In addition, ChatGPT had the highest number of correct responses in the areas of safety (38/48, 79.2%) and procedures (50/59, 84.7%). Copilot had the highest number of correct responses in the area of image production (43/55, 78.2%). ChatGPT also achieved superior accuracy in 4 of the 8 subcategories.
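The per-category figures above can be reproduced from item-level results by simple aggregation; the sketch below assumes each model's responses are stored as (category, is_correct) pairs, with names chosen for illustration only:

from collections import defaultdict

def accuracy_by_category(items):
    # items: iterable of (category, is_correct) pairs for one model.
    correct, total = defaultdict(int), defaultdict(int)
    for category, is_correct in items:
        total[category] += 1
        correct[category] += int(is_correct)
    # Return (correct, total, percent) per category.
    return {c: (correct[c], total[c], round(100.0 * correct[c] / total[c], 1)) for c in total}

# Hypothetical usage: accuracy_by_category(chatgpt_items)
# might yield {'safety': (38, 48, 79.2), 'procedures': (50, 59, 84.7), ...}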

Discussion: Findings from this study provide valuable insights into the performance of multiple LLMs in answering practice radiography certification exam questions. Although ChatGPT emerged as the most accurate LLM for this practice exam, caution should be exercised when using generative artificial intelligence (AI) models. Because LLMs can generate false and incorrect information, responses must be checked for accuracy, and the models should be corrected when inaccurate responses are given.

Conclusion: Among the 5 LLMs compared in this study, ChatGPT was the most accurate model. As interest in generative AI continues to increase and new language applications become readily available, users should understand the limitations of LLMs and check responses for accuracy. Future research could include additional practice exams in other primary pathways, including magnetic resonance imaging, nuclear medicine technology, radiation therapy, and sonography.


Publication Analysis

Top Keywords

practice radiography: 16
radiography certification: 16
certification exam: 12
performance multiple: 8
models llms: 8
chatgpt: 8
accuracy rates: 8
content categories: 8
chatgpt highest: 8
highest number: 8
