AI-generated multiple-choice questions in health science education: Stakeholder perspectives and implementation considerations.

Curr Res Physiol

University of Toronto, Department of Physiology, Temerty Faculty of Medicine, Medical Sciences Building 3rd Floor, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada.

Published: August 2025



Article Abstract

Multiple-choice questions (MCQs) are widely used in health science education because they are an efficient way to evaluate knowledge from simple recall to complex clinical reasoning. The creation of high-quality MCQs, however, can be time-consuming and requires expertise in question composition. Advancements in artificial intelligence (AI), especially large language models (LLMs), offer the potential to allow for the rapid generation of high-quality, consistent, and course-specific MCQs. Here we discuss the potential benefits and drawbacks of the use of this technology in the generation of MCQs, including ensuring the accuracy and fairness of questions, along with technical, ethical, and privacy considerations. We offer practical guiding principles for the implementation of AI-generated MCQs and outline future research areas related to their impact on student learning and educational quality.
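To make the workflow concrete, the sketch below shows one way an LLM could draft a course-specific MCQ from an excerpt of lecture material. It is a minimal illustration only, assuming the OpenAI Python SDK and the gpt-4o model; the prompt wording is hypothetical and this is not the authors' pipeline. As the article stresses, any generated item still needs expert review for accuracy and fairness.

# Minimal sketch of LLM-assisted MCQ drafting (assumes the OpenAI Python SDK;
# prompt and model choice are illustrative, not the authors' method).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

course_excerpt = (
    "Baroreceptors in the carotid sinus and aortic arch sense changes in "
    "arterial pressure and adjust autonomic outflow to the heart and vessels."
)

prompt = (
    "Write one multiple-choice question based strictly on the excerpt below. "
    "Give a stem, four options labelled A-D with exactly one correct answer, "
    "and a one-sentence rationale. Avoid flawed formats such as "
    "'all of the above'.\n\n"
    f"Excerpt: {course_excerpt}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # draft item; requires expert review

Constraining the model to a supplied excerpt is one simple guard against factually drifting items, though it does not remove the need for expert review of accuracy and fairness.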


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12340502
DOI: http://dx.doi.org/10.1016/j.crphys.2025.100160

Publication Analysis

Top Keywords

multiple-choice questions (8)
health science (8)
science education (8)
mcqs (5)
ai-generated multiple-choice (4)
questions health (4)
education stakeholder (4)
stakeholder perspectives (4)
perspectives implementation (4)
implementation considerations (4)

Similar Publications

Individual and group reflection in lecture-based large groups lead to comparable learning success.

Wien Klin Wochenschr

September 2025

Medizinische Klinik und Poliklinik IV, LMU-Klinikum München, München, Germany.

Objective: The study aims to elucidate a possible effect of individual reflection (IR) or group reflection (GR) on short-term and long-term memory retention in a large group lecture-based environment.

Methods: In this quasi-experimental study, 656 medical students were enrolled to compare the impact of IR and GR immediately after the lectures and 2 months later. Students were divided into two groups and given two different lectures using IR or GR in a cross-over fashion.


Background: Recent studies suggest that large language models (LLMs) such as ChatGPT are useful tools for medical students or residents preparing for examinations. These studies, especially those conducted with multiple-choice questions, indicate that the LLMs' level of knowledge and response consistency are generally acceptable, although further optimization is needed in areas such as case discussion, interpretation, and language proficiency. This study therefore aimed to evaluate the performance of six distinct LLMs on Turkish and English neurosurgery multiple-choice questions and to assess their accuracy and consistency in a specialized medical context.


Time Allotted for Examination Item Types in Nursing Education.

J Nurs Educ

September 2025

Wolters Kluwer Health, New York, New York.

Background: Examinations are used widely in nursing education to evaluate knowledge attainment. New item types were introduced in April 2023 by the National Council of State Boards of Nursing (NCSBN) for use on the Next Generation National Council Licensure Examination for Registered Nurses (NGN NCLEX-RN). Little evidence exists on how much time is needed for examinations that use the new item types.


Previous research found that occupational therapy practitioners desired more training in assistive technology. This study provides further evidence on which assistive technology categories should be included in the education of occupational therapists in the United States, based on the practice setting. Participants were recruited through snowball sampling and were included if they were certified occupational therapists practicing in the United States.


The objective was to compare the accuracy of two large language models, GPT-4o and o3-Mini, against medical student performance on otolaryngology-focused, USMLE-style multiple-choice questions. With permission from AMBOSS, we extracted 146 Step 2 CK questions tagged "Otolaryngology" and stratified them by AMBOSS difficulty (levels 1-5). Each item was presented verbatim to GPT-4o and o3-Mini through their official APIs, and outputs were scored as correct or incorrect.

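For readers curious how such API-based scoring can be set up, a minimal sketch follows. It assumes the OpenAI Python SDK; the sample item, prompt wording, letter-extraction step, and model names are hypothetical stand-ins, not the study's actual AMBOSS pipeline.

# Hypothetical sketch of scoring LLM answers against a keyed MCQ set
# (not the study's pipeline; the item and prompt are illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ITEMS = [
    {
        "stem": "Which nerve is most at risk during superficial parotidectomy?",
        "options": {"A": "Facial", "B": "Vagus",
                    "C": "Hypoglossal", "D": "Accessory"},
        "key": "A",
    },
]

def ask(model: str, item: dict) -> str:
    # Present the item verbatim and request a single option letter back.
    option_text = "\n".join(f"{k}. {v}" for k, v in item["options"].items())
    prompt = (f"{item['stem']}\n{option_text}\n"
              "Answer with the single letter of the best option.")
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()[:1].upper()

for model in ("gpt-4o", "o3-mini"):  # model names are assumptions
    correct = sum(ask(model, item) == item["key"] for item in ITEMS)
    print(f"{model}: {correct}/{len(ITEMS)} correct")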