How Artificial Intelligence Differs From Humans in Peer Review.

J Oral Maxillofac Surg

Senior Faculty, Department of Oral and Maxillofacial Surgery, Goldschleger School of Dentistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; Attending Surgeon, Oral and Maxillofacial Surgery Unit, Samson Assuta University Hospital, Ashdod, Israel.

Published: August 2025


Article Abstract

Background: The peer review process faces challenges of reviewer fatigue and bias. Artificial intelligence (AI) may help address these issues, but its application in the oral and maxillofacial surgery peer review process remains unexplored.

Purpose: The purpose of the study was to measure and compare manuscript review performance among 4 large language models and human reviewers. Large language models are AI systems trained on vast text datasets that can generate human-like responses.

Study Design/Setting/Sample: In this cross-sectional study, we evaluated original research articles submitted to the Journal of Oral and Maxillofacial Surgery between January and December 2023. Manuscripts were randomly selected from all submissions that received at least one external peer review.

Predictor Variable: The predictor variable was source of review: human reviewers or AI models. We tested 4 AI models: Generative Pretrained Transformer-4o and Generative Pretrained Transformer-o1 (OpenAI, San Francisco, CA), Claude (version 3.5; Anthropic, San Francisco, CA), and Gemini (version 1.5; Google, Mountain View, CA). These models will be referred to by their architectural design characteristics, ie, dense transformers, sparse-expert, multimodal, and base transformer, to highlight their technical differences rather than their commercial identities.

Outcome Variables: Primary outcomes included reviewer recommendations (accept = 3 to reject = 0) and responses to 6 Journal of Oral and Maxillofacial Surgery editor questions. Secondary outcomes comprised temporal stability (the consistency of AI evaluations over time), domain-specific assessments (methodology, statistical analysis, clinical relevance, originality, and presentation clarity; 1 to 5 scale), and model clustering patterns.

Analyses: Agreement between AI and human recommendations was assessed using weighted Cohen's kappa. Intermodel reliability and temporal stability (24-hour interval) were evaluated using intraclass correlation coefficients. Domain scoring patterns were analyzed using multivariate analysis of variance with post hoc comparisons and hierarchical clustering.
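For readers unfamiliar with the agreement statistic named above, the following is a minimal sketch of weighted Cohen's kappa for two raters on an ordinal scale such as the 0 (reject) to 3 (accept) recommendation coding. The abstract does not state which weighting scheme the authors used, so the linear default here is an assumption, and the function name, parameters, and example ratings are hypothetical illustrations rather than study code or study data.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Weighted Cohen's kappa for two raters scoring on 0..n_categories-1.

    weighting: "linear" or "quadratic" (which one the study used is not stated).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two ordered categories")
    n = len(rater_a)
    k = n_categories

    # joint[i][j] = share of items rated i by rater A and j by rater B.
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        joint[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    marg_a = [sum(joint[i]) for i in range(k)]
    marg_b = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        # Disagreement weight grows with the distance between categories.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed = sum(penalty(i, j) * joint[i][j]
                   for i in range(k) for j in range(k))
    expected = sum(penalty(i, j) * marg_a[i] * marg_b[j]
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - observed / expected


# Hypothetical recommendations (0 = reject ... 3 = accept), not study data.
human = [0, 0, 1, 2, 0, 3]
model = [1, 2, 1, 2, 1, 3]
print(weighted_kappa(human, model, n_categories=4))
```

Weighting matters for ordinal recommendations because a "reject" versus "accept" disagreement is penalized more heavily than a "reject" versus "major revision" disagreement; unweighted kappa would treat both as equally wrong.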

Results: Of 22 manuscripts, human reviewers rejected 15 (68.2%), while AI rejection rates were statistically significantly lower (0 to 9.1%, P < .001). AI models demonstrated high consistency in their evaluations over time (intraclass correlation coefficient = 0.88, P < .001) and showed moderate agreement with human decisions (κ = 0.38 to 0.46).
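The temporal stability figure reported above is an intraclass correlation coefficient computed across two scoring sessions. The abstract does not say which ICC form was used, so the sketch below assumes the two-way, single-measurement, consistency form, often written ICC(3,1). The function name and the example scores are hypothetical and are not taken from the study.

```python
def icc_consistency(scores):
    """ICC(3,1): two-way model, single measurement, consistency.

    scores: one row per manuscript, one column per scoring session.
    The choice of ICC(3,1) is an assumption; the source does not specify it.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 rows and 2 columns of equal width")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom


# Hypothetical scores from the same model at two sessions 24 hours apart.
sessions = [[3, 3], [2, 3], [1, 1], [3, 2], [2, 2]]
print(icc_consistency(sessions))
```

The consistency form ignores a constant shift between sessions, so a model that scores every manuscript one point higher on the second day would still reach an ICC of 1; an absolute-agreement form would penalize that shift.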

Conclusions: While AI models showed reliable internal consistency, they were less likely to recommend rejection than human reviewers. This suggests their optimal use is as screening tools complementing expert human review rather than as replacements.

Source: http://dx.doi.org/10.1016/j.joms.2025.03.015

Publication Analysis

Top Keywords (keyword: count)

human reviewers: 16
peer review: 12
oral maxillofacial: 12
maxillofacial surgery: 12
artificial intelligence: 8
review process: 8
large language: 8
language models: 8
journal oral: 8
generative pretrained: 8