Advancement of Generative Pre-trained Transformer Chatbots in Answering Clinical Questions in the Practical Rhinoplasty Guideline

Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Background: The Generative Pre-trained Transformer (GPT) series, which includes ChatGPT, is a family of artificial-intelligence large language models that produce human-like text dialogue. This study aimed to evaluate the performance of artificial intelligence (AI) chatbots in answering clinical questions based on practical rhinoplasty guidelines.

Methods: Clinical questions (CQs) developed from the guidelines were used as the question source. For each CQ, we asked GPT-4 and GPT-3.5 (ChatGPT), both developed by OpenAI, to provide an answer together with the Policy Level, Aggregate Evidence Quality, Level of Confidence in Evidence, and References. We then compared the performance of the two AI chatbots.
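
The study does not publish the code behind this querying step. A minimal sketch of how both models could be asked the same CQ is shown below, assuming the OpenAI Python SDK; the prompt wording, the example question, and the ask_chatbot helper are hypothetical illustrations, not the authors' actual protocol.

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def ask_chatbot(model: str, clinical_question: str) -> str:
    # Hypothetical helper: request the answer plus the guideline
    # metadata fields the study scored (Policy Level, Aggregate
    # Evidence Quality, Level of Confidence in Evidence, References).
    prompt = (
        f"Clinical question: {clinical_question}\n"
        "Answer the question, then state the Policy Level, Aggregate "
        "Evidence Quality, Level of Confidence in Evidence, and References."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Hypothetical example CQ; the guideline's actual CQs are not quoted here.
cq = "Is open or closed rhinoplasty preferred for dorsal hump reduction?"
answers = {m: ask_chatbot(m, cq) for m in ("gpt-4", "gpt-3.5-turbo")}

Querying both models with identical prompts keeps the two chatbots directly comparable, which mirrors the paired comparison the abstract reports.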

Results: A total of 10 questions were included in the final analysis, and the AI chatbots correctly answered 90.0% of these. GPT-4 demonstrated a lower accuracy rate than GPT-3.5 in answering CQs, although the difference was not statistically significant (86.0% vs. 94.0%; p = 0.05), whereas GPT-4 showed significantly higher accuracy for Level of Confidence in Evidence than GPT-3.5 (52.0% vs. 28.0%; p < 0.01). No statistically significant differences were observed in Policy Level, Aggregate Evidence Quality, or Reference Match. In addition, GPT-4 cited existing references at a significantly higher rate than GPT-3.5 (36.9% vs. 24.1%; p = 0.01).
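
The abstract reports p-values for these accuracy comparisons but does not name the statistical test used. As one way such proportions could be compared, the sketch below runs Fisher's exact test with SciPy; the counts of 43/50 and 47/50 are assumed purely to reproduce the reported 86.0% and 94.0%, since the abstract does not give the denominators.

from scipy.stats import fisher_exact

# Assumed counts, chosen only to match the reported percentages.
correct_gpt4, total_gpt4 = 43, 50    # 86.0% accuracy
correct_gpt35, total_gpt35 = 47, 50  # 94.0% accuracy

contingency = [
    [correct_gpt4, total_gpt4 - correct_gpt4],
    [correct_gpt35, total_gpt35 - correct_gpt35],
]
odds_ratio, p_value = fisher_exact(contingency)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")

A chi-square test (scipy.stats.chi2_contingency) would be a common alternative at larger sample sizes; either way, the result depends on the true denominators, which only the full text reports.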

Conclusions: The overall performance of GPT-4 was similar to that of GPT-3.5. However, GPT-4 provided existing references at a higher rate than GPT-3.5 and thus has the potential to provide more accurate references in professional fields, including rhinoplasty.

Level of Evidence V: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors (www.springer.com/00266).

Source: http://dx.doi.org/10.1007/s00266-024-04377-4

Publication Analysis

Top Keywords

clinical questions: 12
generative pre-trained: 8
pre-trained transformer: 8
chatbots answering: 8
answering clinical: 8
practical rhinoplasty: 8
artificial intelligence: 8
level confidence: 8
confidence evidence: 8
advancement generative: 4
