Concordance between humans and GPT-4 in appraising the methodological quality of case reports and case series using the Murad tool.

Zin Tarakji, Adel Kanaan, Samer Saadi, Mohammed Firwana, Adel Kabbara Allababidi, Mohamed F Abusalih, Rami Basmaci, Tamim I Rajjo, Zhen Wang, M Hassan Murad, Bashar Hasan

BMC Med Res Methodol

Evidence-based Practice Center, Kern Center for the Science of Healthcare Delivery, Mayo Clinic, 200 1st Street SW, Rochester, MN, 55905, USA.

Published: November 2024

Background: Assessing the methodological quality of case reports and case series is challenging due to human judgment variability and time constraints. We evaluated the agreement in judgments between human reviewers and GPT-4 when applying a standard methodological quality assessment tool designed for case reports and series.

Methods: We searched Scopus for systematic reviews published in 2023-2024 that cited the appraisal tool by Murad et al. A GPT-4-based agent was developed to assess methodological quality using the 8 signaling questions of the tool. Observed agreement and an agreement coefficient were estimated by comparing the published judgments of human reviewers with the GPT-4 assessments.
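As an illustration of the two statistics named in the Methods, the sketch below computes observed agreement and a chance-corrected agreement coefficient for one signaling question. The abstract does not say which coefficient was used; Gwet's AC1 for two raters is assumed here purely for illustration. The function names and the example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def observed_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same judgment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def gwet_ac1(rater_a, rater_b, categories=None):
    """Gwet's AC1 for two raters (assumed coefficient, not confirmed by the source).

    AC1 = (pa - pe) / (1 - pe), where pa is the observed agreement and
    pe = sum_k pi_k * (1 - pi_k) / (q - 1), with pi_k the mean of the two
    raters' marginal proportions for category k and q the number of categories.
    """
    pa = observed_agreement(rater_a, rater_b)
    if categories is None:
        categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    n = len(rater_a)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    pe = 0.0
    for k in categories:
        pi_k = (counts_a[k] + counts_b[k]) / (2 * n)
        pe += pi_k * (1 - pi_k)
    pe /= q - 1
    return (pa - pe) / (1 - pe)


# Hypothetical judgments for a single signaling question (not study data).
human = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
gpt4 = ["yes", "no", "no", "no", "yes", "yes", "yes", "yes"]

print(f"observed agreement: {observed_agreement(human, gpt4):.2%}")
print(f"agreement coefficient (AC1): {gwet_ac1(human, gpt4):.3f}")
```

In a setting like the one described, these functions would be applied once per signaling question, which would yield a range of values across the eight questions.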

Results: We included 797 case reports and series. The observed agreement ranged between 41.91% and 80.93% across the eight questions (the agreement coefficient ranged from 25.39% to 79.72%). The lowest agreement was noted in the first signaling question, about selection of cases. The agreement was similar in articles published in journals with impact factor < 5 vs. ≥ 5, and when excluding systematic reviews that did not use the 3 causality questions. Repeating the analysis using the same prompts demonstrated high agreement between the two GPT-4 attempts, except for the first question about selection of cases.

Conclusions: The study demonstrates moderate agreement between GPT-4 and human reviewers in assessing the methodological quality of case series and reports using the Murad tool. The current performance of GPT-4 seems promising but is unlikely to be sufficient for the rigor of a systematic review, and pairing the model with a human reviewer is required.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11533388
DOI: http://dx.doi.org/10.1186/s12874-024-02372-6

Similar Publications

Swelling Management in Total Knee Arthroplasty: A Systematic Review.

JBJS Rev

September 2025

Joondalup Health Campus, Joondalup, Australia.

Luke McGarry, Jack Kearney, Jessica Rotaru, Rajitha Gunaratne

Background: Postoperative swelling is a common complication after total knee arthroplasty (TKA), associated with pain, limited mobility, and delayed recovery. This study aimed to systematically review the literature on interventions that reduce postoperative swelling, categorized into preoperative, intraoperative, and postoperative phases.

Methods: A Preferred Reporting Items for Systematic Reviews and Meta-Analyses-compliant search of PubMed, Medline, Embase, and Cochrane databases was performed for clinical studies evaluating interventions to reduce swelling after primary TKA.

Effects of exercise on fatigue and quality of life in multiple sclerosis: a network meta-analysis and systematic review.

J Neurol

September 2025

College of Physical Education, China West Normal University, Nanchong, China.

Jiangxi Yang, Huangyan Li, Yeting Zhang, Shiliang Hu, Zuoyin Yu

Objective: This study aimed to evaluate the effects of various physical therapy interventions on fatigue and quality of life in patients with multiple sclerosis (MS) using a network meta-analysis of randomized controlled trials (RCTs).

Methods: A comprehensive literature search was conducted in PubMed, Web of Science, and Cochrane databases through April 1, 2025. Eligible RCTs compared different exercise interventions in MS patients, focusing on fatigue and quality of life outcomes.

Virtual reality simulation training for health professions trainees in gastrointestinal endoscopy.

Cochrane Database Syst Rev

September 2025

Division of Gastroenterology, Hepatology, and Nutrition, SickKids Research Institute and SickKids Learning Institute, The Hospital for Sick Children, Toronto, Ontario, Canada.

Nasruddin Sabrie, Rishad Khan, Joanne Plahouras, Bradley C Johnston, Michael A Scaffidi

Background: Training in endoscopy has traditionally been based upon an apprenticeship model, where novices develop their skills on real patients under the supervision of experienced endoscopists. In an effort to prioritise patient safety, simulation training has emerged as a means to allow novices to practice in a risk-free environment. This is the second update of the review, which was first published in 2012 and updated in 2018.

The Impact of Mini-Screws and Micro-Implants on Orthodontic Clinical Outcomes: An Umbrella Meta-Analysis.

Clin Exp Dent Res

October 2025

Drug Applied Research Center, Tabriz University of Medical Sciences, Tabriz, Iran.

Abdolreza Jamilian, Helen Jamloo, Kurosh Majidi, Meysam Zarezadeh

Objectives: This umbrella meta-analysis aimed to answer the clinical question: Do mini-screws and micro-implants improve specific orthodontic outcomes such as intermolar width, interpremolar width, suture expansion, molar movement, and skeletal width compared to conventional anchorage methods?

Materials and Methods: A systematic search was performed in PubMed, Scopus, ISI Web of Science, and Google Scholar up to October 2024. Systematic reviews and meta-analyses on mini-screws and micro-implants in orthodontic treatment were included. Methodological quality was assessed using AMSTAR 2, and a random-effects model was used to calculate effect sizes (ESs) and 95% confidence intervals (CIs).

Effectiveness of herbal medicine as an add-on to antipsychotics in patients with schizophrenia spectrum disorders accompanied by depression: A systematic review and meta-analysis.

Integr Med Res

March 2026

KM Science Research Division, Korea Institute of Oriental Medicine, South Korea.

Chan-Young Kwon, Kyoung-Eun Lee, Min-Jae Kim, Ji-Won Kim, Ji-Won Oh

Background: Depression is a common comorbidity of schizophrenia spectrum disorders (SSDs) that affects functional outcomes and quality of life. This systematic review and meta-analysis evaluated the effectiveness of herbal medicine as an adjunct therapy to antipsychotics in patients with SSDs and comorbid depression.

Methods: Eight databases were searched from inception to January 2025 for randomized controlled trials (RCTs) evaluating herbal medicine combined with antipsychotics vs antipsychotics alone in patients with SSDs and comorbid depression.
