Objectives: The present cross-sectional analysis aimed to investigate whether Large Language Model-based chatbots can serve as reliable sources of information in orthodontics by evaluating chatbot responses and comparing them with those of dental practitioners with different levels of knowledge. Methods: Eight frequently asked true/false orthodontic questions were submitted to five leading chatbots (ChatGPT-4, Claude-3-Opus, Gemini 2.0 Flash Experimental, Microsoft Copilot, and DeepSeek). The consistency of the answers given by each chatbot at four different times was assessed using Cronbach's α. The chi-squared test was used to compare chatbot responses with those given by two groups of clinicians, i.e., general dental practitioners (GDPs) and orthodontic specialists (Os), recruited in an online survey via social media; differences were considered significant at p < 0.05. Additionally, chatbots were asked to justify their dichotomous responses using a chain-of-thought prompting approach, and the educational value of the justifications was rated according to the Global Quality Scale (GQS). Results: A high degree of consistency in answering was found for all analyzed chatbots (α > 0.80). When comparing chatbot answers with those of GDPs and Os, statistically significant differences were found for almost all questions (p < 0.05). When evaluating the educational value of chatbot responses, DeepSeek achieved the highest GQS score (median 4.00; interquartile range 0.00), whereas Copilot had the lowest (median 2.00; interquartile range 2.00). Conclusions: Although chatbots yield somewhat useful information about orthodontics, they can provide misleading information when dealing with controversial topics.
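The two statistics named in the abstract can be sketched in a few lines. The data below are entirely hypothetical (the paper's raw answers are not given here): `answers` stands in for one chatbot's true/false responses (1/0) to the eight questions at four time points, and `table` is a made-up 2x2 contingency table of chatbot vs. clinician answer counts.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha; rows = questions, columns = repeated administrations."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variance = scores.var(axis=0, ddof=1).sum()   # variance of each administration
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of per-question totals
    return k / (k - 1) * (1 - item_variance / total_variance)

# Hypothetical data: 1 = "true", 0 = "false" for 8 questions asked at 4 times.
answers = np.array([
    [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 0, 1],
    [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1],
])
alpha = cronbach_alpha(answers)  # high alpha = consistent answers across times

# Hypothetical 2x2 contingency table for the chi-squared comparison:
# rows = group (chatbot, clinicians), columns = counts of "true" / "false".
table = np.array([[30, 10],
                  [18, 22]])
chi2, p, dof, _ = chi2_contingency(table)  # p < 0.05 would indicate a significant difference
```

A mostly repeated answer pattern like the one above yields α well above the 0.80 threshold the study reports.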
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12385111
DOI: http://dx.doi.org/10.3390/dj13080343
J Empir Res Hum Res Ethics
September 2025
TOBB ETU School of Medicine, History of Medicine and Ethics Department, Ankara, Turkey.
This study investigates how scientists, educators, and ethics committee members in Türkiye perceive the opportunities and risks posed by generative AI and the ethical implications for science and education. It uses a 22-question survey developed by the EOSC-Future and RDA AIDV Working Group. Responses were gathered from 62 of the 208 universities in Türkiye, with a completion rate of 98%.
J Multidiscip Healthc
September 2025
School of Law, Xi'an Jiaotong University, Xi'an, Shaanxi Province, People's Republic of China.
The application of generative artificial intelligence (AI) technology in the healthcare sector can significantly enhance the efficiency of China's healthcare services. However, risks persist in terms of accuracy, transparency, data privacy, ethics, and bias. These risks are manifested in three key areas: first, the potential erosion of human agency; second, issues of fairness and justice; and third, questions of liability and responsibility.
AJOG Glob Rep
August 2025
Department of Obstetrics, Gynecology & Women's Health, University of Hawaii, Honolulu, HI (Kho).
Background: Within public online forums, patients often seek reassurance and guidance from the community regarding postoperative symptoms and expectations, and when to seek medical assistance. Others are using artificial intelligence in the form of online search engines or chatbots such as ChatGPT or Perplexity. Artificial intelligence chatbot assistants have been growing in popularity; however, clinicians may be hesitant to use them because of concerns about accuracy.
J Dent
September 2025
Dental Clinic Post-Graduate Program, University Center of State of Pará, Belém, Pará, Brazil.
Objective: This study evaluated the coherence, consistency, and diagnostic accuracy of eight AI-based chatbots in clinical scenarios related to dental implants.
Methods: A double-blind, clinical experimental study was carried out between February and March 2025, to evaluate eight AI-based chatbots using six fictional cases simulating peri-implant mucositis and peri-implantitis. Each chatbot answered five standardized clinical questions across three independent runs per case, generating 720 binary outputs.
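The 720 figure follows directly from the design: 8 chatbots x 6 cases x 5 questions x 3 runs. A minimal sketch of that output grid and a simple run-to-run consistency measure, using randomly generated placeholder answers (the study's actual outputs are not reproduced here):

```python
import numpy as np

# Hypothetical reconstruction of the output grid:
# 8 chatbots x 6 fictional cases x 5 questions x 3 independent runs.
rng = np.random.default_rng(seed=42)
outputs = rng.integers(0, 2, size=(8, 6, 5, 3))  # binary (yes/no) answers

# 8 * 6 * 5 * 3 = 720 binary outputs, matching the abstract.
total_outputs = outputs.size

# One simple consistency measure: an item counts as consistent
# when all three runs give the same answer.
consistent = outputs.min(axis=-1) == outputs.max(axis=-1)
consistency_per_chatbot = consistent.mean(axis=(1, 2))  # one fraction per chatbot
```

With real data, `consistency_per_chatbot` would quantify each chatbot's run-to-run stability before looking at diagnostic accuracy.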
Curr Opin Ophthalmol
September 2025
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore, Singapore.
Purpose Of Review: Alongside the development of large language models (LLMs) and generative artificial intelligence (AI) applications across a diverse range of clinical uses in Ophthalmology, this review highlights the importance of evaluating LLM applications by discussing commonly adopted evaluation metrics.
Recent Findings: Generative AI applications have demonstrated encouraging performance in clinical applications of Ophthalmology. Beyond accuracy, evaluation in the form of quantitative and qualitative metrics facilitate a more nuanced assessment of LLM output responses.