Article Abstract

Introduction: ChatGPT has attracted a lot of interest worldwide for its versatility in a range of natural language tasks, including in the education and evaluation industry. It can automate time- and labor-intensive tasks with clear economic and efficiency gains.

Methods: This study evaluated the potential of ChatGPT to automate psychometric analysis of test questions from the 2020 Portuguese National Residency Selection Exam (PNA). ChatGPT was queried 100 times with the 150 multiple-choice questions (MCQs) from the exam. From ChatGPT's responses, a difficulty index was calculated for each question as the proportion of correct answers. The predicted difficulty levels were compared with the actual difficulty levels of the 2020 exam MCQs using methods from classical test theory.
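The difficulty index described in the Methods can be illustrated with a minimal sketch. It assumes the responses are stored as one list of chosen options per query run; the function name, data layout, and toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def difficulty_indices(
    responses: Sequence[Sequence[str]], answer_key: Sequence[str]
) -> list[float]:
    """Return the proportion of correct answers per item.

    responses[r][i] is the option chosen on run r for item i
    (an assumed layout, not the study's actual data format).
    In classical test theory a higher index means an easier item.
    """
    n_runs = len(responses)
    if n_runs == 0:
        raise ValueError("at least one run is required")
    indices = []
    for i, key in enumerate(answer_key):
        correct = sum(1 for run in responses if run[i] == key)
        indices.append(correct / n_runs)
    return indices


# Toy data: 4 hypothetical runs over 3 items (not from the study,
# which used 100 runs over 150 items).
answer_key = ["A", "C", "B"]
responses = [
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["A", "C", "B"],
    ["B", "C", "D"],
]
toy_indices = difficulty_indices(responses, answer_key)
print(toy_indices)  # [0.75, 0.75, 0.25]
```

With 100 runs per item, as in the study, each index would be a multiple of 0.01.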

Results: ChatGPT's predicted item difficulty indices positively correlated with the actual item difficulties (r(148) = -0.372, p < .001), suggesting a general consistency between the real and the predicted values. There was also a moderate, significant negative correlation between the difficulty index predicted by ChatGPT and the number of challenges (r(148) = -0.302, p < .001), highlighting ChatGPT's potential for identifying less problematic questions.
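The 148 in parentheses is the degrees of freedom of the correlation test, which for a Pearson correlation is the number of items minus two (150 - 2). The following is a rough sketch of that calculation using only the standard library; the abstract does not name the software the authors used, and the function name and toy values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> tuple[float, int]:
    """Return the Pearson correlation coefficient and its degrees of freedom (n - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy), n - 2


# Toy values only (not from the study): predicted vs. actual difficulty.
predicted = [0.9, 0.7, 0.5, 0.3]
actual = [0.8, 0.6, 0.4, 0.2]
r, df = pearson_r(predicted, actual)
print(f"r({df}) = {r:.3f}")
```

Applied to 150 paired item difficulties, the same function would report 148 degrees of freedom, matching the notation in the Results.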

Conclusion: These findings unveiled ChatGPT's potential as a tool for assessment development, proving its capability to predict the psychometric characteristics of high-stakes test items in automated item calibration without pre-testing in real-life scenarios.

Source
http://dx.doi.org/10.1080/0142159X.2024.2376205

Publication Analysis

Top Keywords

item calibration: 8
difficulty indices: 8
difficulty levels: 8
chatgpt's potential: 8
chatgpt: 5
difficulty: 5
chatgpt item: 4
calibration tool: 4
tool psychometric: 4
psychometric insights: 4

Similar Publications

Measurement appropriateness concerns the question of whether the test or survey scale under consideration can provide a valid measure for a specific individual. An aberrant item response pattern would provide internal counterevidence against using the test/scale for this person, whereas a more typical item response pattern would imply a fit of the measure to the person. Traditional approaches, including the popular Lz person fit statistic, are hampered by their two-stage estimation procedure and the fact that the fit for the person is determined based on the model calibrated on data that include the misfitting persons.

Purpose: To adapt a West and Central African version of the widely used ABILHAND-Kids questionnaire for measuring manual ability in children with cerebral palsy (CP).

Materials and Methods: This cross-sectional study included 136 children with CP from Benin (n = 67) and Cameroon (n = 69). Data were collected from parents using an experimental version with 64 items.

Background: Dysphagia is a common complication in elderly patients with frailty, affecting their prognosis and quality of life. Constructing a risk prediction model can help with early screening and intervention.

Objective: To investigate the current status of dysphagia in hospitalized elderly patients with frailty, analyze its influencing factors, and construct a risk prediction model for dysphagia in hospitalized elderly patients with frailty.

In this study, we explore parameter estimation for a joint count-time data model with a two-factor latent trait structure, representing accuracy and speed. Each count-time variable pair corresponds to a specific item on a measurement instrument, where each item consists of a fixed number of tasks. The count variable represents the number of successfully completed tasks and is modeled using a Beta-binomial distribution to account for potential over-dispersion.

Automatic- and Transformer-Based Automatic Item Generation: A Critical Review.

J Intell

August 2025

Department of Psychology, University of Graz, Universitätsplatz 2, 8010 Graz, Austria.

This article provides a critical review of conceptually different approaches to automatic and transformer-based automatic item generation. Based on a discussion of the current challenges that have arisen due to changes in the use of psychometric tests in recent decades, we outline the requirements that these approaches should ideally fulfill. Subsequently, each approach is examined individually to determine the extent to which it can contribute to meeting the challenges.
