
Using Pretrained Large Language Models for AI-Driven Assessment in Medical Education.

Acad Med

R. Cole is associate professor, Department of Military and Emergency Medicine and Department of Health Professions Education, Uniformed Services University of the Health Sciences, Bethesda, Maryland.

Published: August 2025


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Problem: Assessing students in competency-based medical education can be time-consuming and demanding for faculty, especially with large classes and complex topics. Traditional methods can lead to inconsistencies and a lack of targeted feedback. Innovative and accessible solutions to improve the efficiency, objectivity, and effectiveness of assessment in medical education are needed.

Approach: From September 2024 to February 2025, the authors piloted the use of large language models (LLMs) with retrieval-augmented generation to assess students' understanding of moral injury. The authors selected and uploaded 6 seminal articles on moral injury within military and veteran [patient?] populations to Google Gemini 1.5 Pro. They tasked the same LLM with creating a grading rubric based on these articles to assess 165 student responses in a military medical ethics course (Uniformed Services University of the Health Sciences). The authors uploaded both the generated rubric and the student responses to each of 3 LLMs (Google Gemini 1.5 Pro, Google Gemini 2.0 Flash, and OpenAI ChatGPT-4o) with a prompt to generate scores for the student responses.

Outcomes: In the authors' expert opinion, an LLM (Google Gemini 1.5 Pro) successfully generated a grading rubric that captured the nuances of moral injury and its implications for military medical practice. The LLMs' scoring accuracy was compared against that of 2 experienced educators to generate validity evidence. The best-performing model, OpenAI ChatGPT-4o, demonstrated interrater reliability of 0.77 and 0.68 with Reviewers 1 and 2, respectively, indicating higher agreement between the LLM and each individual reviewer than between the 2 reviewers themselves (0.57).

Next Steps: While this approach shows promise, faculty oversight is necessary to ensure ethical accountability and address potential biases. Further research is needed to optimize the integration of AI and human capabilities in assessment to ultimately enhance the quality of health care professional education and improve patient outcomes.
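The abstract reports interrater reliability figures (0.77 and 0.68 for LLM vs. each reviewer, 0.57 between reviewers) without naming the statistic used. A common choice for chance-corrected agreement between two raters on categorical rubric scores is Cohen's kappa; a minimal sketch of that computation, using hypothetical score data (the actual student scores are not published here), is:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters' categorical scores, corrected for chance."""
    n = len(rater_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores (0-4) for ten student responses:
llm_scores      = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
reviewer_scores = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]
print(round(cohens_kappa(llm_scores, reviewer_scores), 2))  # → 0.71
```

Kappa of 1 means perfect agreement and 0 means agreement no better than chance, so values like 0.77 between the LLM and a reviewer exceeding the 0.57 between the two human reviewers is the pattern the abstract describes.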

Source: http://dx.doi.org/10.1097/ACM.0000000000006207

Publication Analysis

Top Keywords

google gemini: 16
medical education: 12
moral injury: 12
gemini pro: 12
large language: 8
language models: 8
assessment medical: 8
grading rubric: 8
student responses: 8
military medical: 8
