Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Background: Mandibular angle osteotomy (MAO) is one of the most effective procedures for correcting a square facial contour. With the development of artificial intelligence (AI) technology, particularly in medicine, more patients are seeking answers to medical questions online. This study compared the performance of 2 AI platforms, ChatGPT-4o and DeepSeek, in answering questions about MAO.

Methods: Twenty frequently asked questions about MAO were selected and answered by ChatGPT-4o and DeepSeek. The responses from the 2 platforms were graded by 9 experienced craniomaxillofacial plastic surgeons from 2 different hospitals, who evaluated the relevance, accuracy, completeness, and readability of each response. The 20 questions were divided into 4 categories: general conception, surgery process, complications, and other topics. Statistical analysis, including the 2-sided t test and the Kruskal-Wallis test, was applied to compare metrics.
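The two-platform comparison described above can be sketched with a standard-library-only Welch t statistic. The score lists below are hypothetical placeholder ratings on the study's 5-point scale, not the actual data; the paper's Kruskal-Wallis test across question categories is omitted for brevity.

```python
import math
from statistics import mean, stdev

def sem(scores):
    """Standard error of the mean for a list of reviewer scores."""
    return stdev(scores) / math.sqrt(len(scores))

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))

# Hypothetical completeness ratings (1-5 Likert) from 9 reviewers
chatgpt = [4.6, 4.4, 4.5, 4.5, 4.4, 4.6, 4.5, 4.5, 4.4]
deepseek = [4.4, 4.3, 4.5, 4.4, 4.4, 4.5, 4.4, 4.4, 4.4]

print(f"ChatGPT-4o: {mean(chatgpt):.4f} +/- {sem(chatgpt):.4f}")
print(f"DeepSeek:   {mean(deepseek):.4f} +/- {sem(deepseek):.4f}")
print(f"Welch t = {welch_t(chatgpt, deepseek):.3f}")
```

The mean-plus-or-minus values printed here mirror the "mean ± SEM" format used in the Results; the t statistic would then be compared against a t distribution to obtain the reported 2-sided P values.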

Results: Both ChatGPT-4o and DeepSeek provided high-quality information about MAO. However, ChatGPT-4o gave more thorough answers (4.4945±0.03089 vs. 4.4315±0.02519, P=0.048), while DeepSeek gave answers that were easier to read (4.2960±0.04717 vs. 4.1965±0.03986, P=0.026). In addition, although ChatGPT-4o performed well across all question categories, DeepSeek performed weakly on questions about the surgical process of MAO.

Conclusions: Both platforms offered reliable information. Compared with DeepSeek, ChatGPT-4o provided more thorough responses that were better aligned with clinical practice. This study demonstrates the potential of AI platforms for patient education and medical information delivery in the craniomaxillofacial plastic surgery field.


Source: http://dx.doi.org/10.1097/SCS.0000000000011698

Publication Analysis

Top Keywords

chatgpt-4o deepseek (16); deepseek responses (8); mandibular angle (8); angle osteotomy (8); answering questions (8); craniomaxillofacial plastic (8); surgery process (8); deepseek (7); chatgpt-4o (6); questions (6)

Similar Publications

Background: Patients with T1 colorectal cancer (CRC) often show poor adherence to guideline-recommended treatment strategies after endoscopic resection. To address this challenge and improve clinical decision-making, this study aims to compare the accuracy of surgical management recommendations between large language models (LLMs) and clinicians.

Methods: This retrospective study enrolled 202 patients with T1 CRC who underwent endoscopic resection at three hospitals.


Purpose: The rise of artificial intelligence (AI)-based large language models (LLMs) has had a profound impact on medical education. Given the widespread use of multiple-choice questions (MCQs) in anatomy education, it is likely that such queries are commonly directed to AI tools. The current study compared the accuracy of different AI platforms in solving MCQs from various subtopics in anatomy.


Purpose: This study assessed the readability, reliability, and accuracy of patient information leaflets on Descemet Membrane Endothelial Keratoplasty (DMEK) generated by seven large language models (LLMs). The aim was to determine which LLM produced the most patient-friendly, comprehensible, and evidence-based leaflet, measured against a leaflet written by clinicians from a tertiary centre.

Methods: Each LLM was given the prompt, "Make a patient information leaflet on Descemet Membrane Endothelial Keratoplasty (DMEK) surgery.


Diagnostic performance of newly developed large language models in critical illness cases: A comparative study.

Int J Med Inform

December 2025

Department of Intensive Care Medicine, Affiliated Hospital of Southwest Jiaotong University, The Third People's Hospital of Chengdu, Chengdu, Sichuan, China.

Background: Large language models (LLMs) are increasingly used in clinical decision support, and newly developed models have demonstrated promising potential, yet their diagnostic performance for critically ill patients in intensive care unit (ICU) settings remains underexplored. This study evaluated the diagnostic accuracy, differential diagnosis quality, and response quality in critical illness cases of four newly developed LLMs.

Methods: In this cross-sectional comparative study, four newly developed LLMs-ChatGPT-4o, ChatGPT-o3, DeepSeek-V3, and DeepSeek-R1-were evaluated using 50 critical illness cases in ICU settings from published literature.
