Performance of the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models in responding to dental implantology inquiries.

J Prosthet Dent

Postdoctoral Research Associate, Division of Bone and Mineral Diseases, Department of Internal Medicine, School of Medicine, Washington University in St. Louis, St. Louis, MO; and Lecturer, Department of Orthodontics, Faculty of Dentistry, Assiut University, Assiut, Egypt.

Published: January 2025


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Statement Of Problem: Artificial intelligence (AI) chatbots have been proposed as promising resources for oral health information. However, the quality and readability of existing online health-related information are often inconsistent and difficult to assess.

Purpose: This study aimed to compare the reliability and usefulness of dental implantology-related information provided by the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models (LLMs).

Material And Methods: A total of 75 questions were developed covering various dental implant domains. These questions were then presented to 3 different LLMs: ChatGPT-3.5, ChatGPT-4, and Google Gemini. The responses generated were recorded and independently assessed by 2 specialists who were blinded to the source of the responses. The evaluation focused on the accuracy of the generated answers, using a modified 5-point Likert scale to measure the reliability and usefulness of the information provided. Additionally, the ability of the AI chatbots to offer definitive responses to closed questions, provide reference citations, and advise scheduling consultations with a dental specialist was analyzed. The Friedman, Mann-Whitney U, and Spearman correlation tests were used for data analysis (α=.05).
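The statistical workflow described above (Friedman test across the 3 paired rating sets, Mann-Whitney U for pairwise comparison, and Spearman correlation between reliability and usefulness scores) can be sketched with SciPy. The rating values below are invented for illustration only and are not the study's data:

```python
# Hypothetical 5-point Likert ratings for the same 10 questions,
# one list per chatbot. Illustrative values only, not the study's data.
from scipy.stats import friedmanchisquare, mannwhitneyu, spearmanr

gpt35  = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3]
gpt4   = [4, 4, 3, 3, 4, 4, 3, 4, 3, 4]
gemini = [5, 4, 4, 4, 5, 4, 4, 5, 4, 4]

# Friedman test: do the three paired rating sets differ overall?
stat, p = friedmanchisquare(gpt35, gpt4, gemini)
print(f"Friedman: chi2={stat:.2f}, p={p:.4f}")

# Mann-Whitney U as a pairwise follow-up between two chatbots
u, p_u = mannwhitneyu(gemini, gpt35)
print(f"Mann-Whitney U (Gemini vs GPT-3.5): U={u}, p={p_u:.4f}")

# Spearman correlation between reliability and usefulness scores
reliability = gemini
usefulness  = [5, 4, 4, 3, 5, 4, 4, 5, 4, 4]
rho, p_rho = spearmanr(reliability, usefulness)
print(f"Spearman rho={rho:.3f}")
```

A significance level of α=.05 would then be applied to each resulting p-value, as in the study.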

Results: Google Gemini exhibited higher reliability and usefulness scores compared with ChatGPT-3.5 and ChatGPT-4 (P<.001). Google Gemini also demonstrated superior proficiency in identifying closed questions (25 questions, 41%) and recommended specialist consultations for 74 questions (98.7%), significantly outperforming ChatGPT-4 (30 questions, 40.0%) and ChatGPT-3.5 (28 questions, 37.3%) (P<.001). A positive correlation was found between reliability and usefulness scores, with Google Gemini showing the strongest correlation (ρ=.702).

Conclusions: The 3 AI chatbots showed acceptable levels of reliability and usefulness in addressing dental implant-related queries. Google Gemini distinguished itself by providing responses consistent with specialist consultations.

Download full-text PDF

Source
http://dx.doi.org/10.1016/j.prosdent.2024.12.016

