Article Abstract

Background: Large language models (LLMs) are increasingly used in healthcare settings to provide patient education and answer medical inquiries. However, their reliability in delivering accurate, clear, and unbiased information remains uncertain. This study aims to evaluate the quality of responses generated by LLMs to common patient questions regarding facial plastic surgery.

Methods: A total of 60 patient-oriented questions related to facial plastic surgery will be selected from professional bodies, patient support groups, and social media platforms. These questions will be categorized into six main topics: fundamental knowledge, preoperative considerations, surgical procedures, procedural risks and postoperative complications, preparation and recovery, and miscellaneous concerns. Seven LLMs (ChatGPT 4o, Claude, Copilot, DeepSeek, Gemini, Grok, and OpenEvidence) will be tested by entering each question twice, each time through the "New Chat" feature, to assess response consistency. Responses will be evaluated by ten American board-certified plastic surgeons using a structured scoring rubric covering four criteria: accuracy, clarity, completeness, and appropriateness. A standardized scoring system will be employed, and inter-rater reliability will be measured to ensure consistency among evaluators.
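
To make the scale of the design in the Methods concrete, the following is a minimal Python sketch that enumerates the planned responses and counts the individual scores implied. The model names, topic names, criteria, and counts come from the abstract; the function names are hypothetical, and the rating count assumes that every rater scores every response on every criterion, which the abstract does not state.

```python
from itertools import product

# Values taken from the abstract.
MODELS = ["ChatGPT 4o", "Claude", "Copilot", "DeepSeek",
          "Gemini", "Grok", "OpenEvidence"]
TOPICS = [
    "fundamental knowledge",
    "preoperative considerations",
    "surgical procedures",
    "procedural risks and postoperative complications",
    "preparation and recovery",
    "miscellaneous concerns",
]
CRITERIA = ["accuracy", "clarity", "completeness", "appropriateness"]
N_QUESTIONS = 60   # patient-oriented questions
N_RUNS = 2         # each question entered twice, in a new chat
N_RATERS = 10      # board-certified plastic surgeons


def response_grid():
    """Hypothetical helper: every (model, question number, run) to collect."""
    questions = range(1, N_QUESTIONS + 1)
    runs = range(1, N_RUNS + 1)
    return list(product(MODELS, questions, runs))


def total_ratings(n_responses):
    """Hypothetical helper: number of individual scores.

    Assumption (not stated in the abstract): every rater scores
    every response on every criterion.
    """
    return n_responses * N_RATERS * len(CRITERIA)


if __name__ == "__main__":
    grid = response_grid()
    print(len(grid))                 # 7 models x 60 questions x 2 runs = 840
    print(total_ratings(len(grid)))  # 840 x 10 raters x 4 criteria = 33600
```

Under that full-crossing assumption, the protocol yields 840 responses and 33,600 individual scores; a design in which each rater scores only a subset of responses would reduce the second figure.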

Discussion: By systematically assessing the responses of multiple LLMs to patient inquiries on facial plastic surgery, this study will provide insights into their reliability and clinical applicability. Findings may help refine LLM-based tools for patient education and identify areas requiring improvement to ensure safe and effective AI-assisted communication in plastic surgery.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12373109
DOI: http://dx.doi.org/10.1097/SP9.0000000000000052

Similar Publications

Facial feminization surgery (FFS) reshapes masculine facial attributes to align with feminine norms, yet normative anthropometric data for Asian populations remain sparse. We therefore quantified sex-related 3-dimensional (3D) facial metrics in healthy Asian adults to delineate dimorphic benchmarks for surgical planning. We prospectively recruited 40 healthy Asian adults (20 males, 20 females; age 18 to 45 years, mean 28.

Background: Facial transplantation offers transformative solutions for patients with severe facial disfigurements. Minimizing ischemia time is critical for preserving tissue viability, and prioritizing facial allograft recovery during multi-organ procurement aims to optimize outcomes. This study evaluates whether prioritizing face allograft procurement affects the outcomes of non-vascularized composite allotransplantation (non-VCA) organ transplants.

Purpose: To objectively quantify, in East Asians and Caucasians, the width and distribution of the retro-orbicularis oculi and frontalis fat (ROOF) pad, subcutaneous fat, and orbicularis oculi muscle (OOM) at the superior orbital rim margin as well as 5 mm superior and inferior to this point.

Methods: Thirty adults were studied by high-resolution, surface coil MRI. In the quasi-sagittal image through the globe center, the ROOF, subcutaneous fat, and OOM thickness were measured anterior to the orbital septum, at 3 points: at the superior orbital rim, and 5 mm superior, and 5 mm inferior to the rim.
