Background: Large language models (LLMs), such as ChatGPT, have demonstrated potential in synthesizing complex clinical information, yet concerns persist regarding their accuracy and reliability in specialized domains. This study addresses a gap in the literature by evaluating the accuracy and reliability of ChatGPT-4o in oral and maxillofacial traumatology.
Material And Methods: A total of 188 oral and maxillofacial trauma-related questions were selected from a comprehensive resource. Thirty questions were randomly chosen and submitted to ChatGPT-4o, with the session reset to "new chat" mode for every repetition to eliminate potential memory bias. Accuracy was scored on a 3-point Likert scale. Reliability was assessed with weighted kappa (κ) and the Intraclass Correlation Coefficient (ICC), and internal consistency was evaluated using both Cronbach's alpha (α) and McDonald's omega (ω).
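As a rough illustration of how reliability statistics of this kind are often computed, the sketch below uses scikit-learn and pingouin on hypothetical data. The two scoring runs, the 3-point scores, and the long/wide layouts are assumptions for demonstration only; they are not the study's actual data or analysis pipeline, and McDonald's omega (which typically requires a factor model) is omitted.

```python
# Minimal sketch (assumption: two hypothetical scoring runs of the same 30
# questions on a 3-point scale; not the study's actual data or pipeline).
import numpy as np
import pandas as pd
import pingouin as pg
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(42)
run1 = rng.integers(1, 4, size=30)  # 3-point Likert scores, run 1 (hypothetical)
run2 = rng.integers(1, 4, size=30)  # 3-point Likert scores, run 2 (hypothetical)

# Weighted kappa (linear weights) between the two runs
kappa = cohen_kappa_score(run1, run2, weights="linear")

# Intraclass correlation: long format with questions as targets, runs as raters
long = pd.DataFrame({
    "question": np.tile(np.arange(30), 2),
    "run": np.repeat(["run1", "run2"], 30),
    "score": np.concatenate([run1, run2]),
})
icc = pg.intraclass_corr(data=long, targets="question", raters="run", ratings="score")

# Cronbach's alpha over the wide (questions x runs) score matrix
alpha, alpha_ci = pg.cronbach_alpha(data=pd.DataFrame({"run1": run1, "run2": run2}))

print(f"weighted kappa = {kappa:.3f}, Cronbach's alpha = {alpha:.3f}")
print(icc[["Type", "ICC", "CI95%"]])
```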
Results: The accuracy rates for comprehensive and adequate responses were 38% (95% CI: 32.5% - 43.5%) and 58% (95% CI: 52.1% - 63.3%), respectively. Weighted kappa (κ = 0.469) and ICC (0.503) indicated moderate reliability. Internal consistency was excellent by Cronbach's alpha (α = 0.904) and good by McDonald's omega (ω = 0.860).
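For context, confidence intervals of this form can be reproduced with a standard proportion interval; the sketch below uses statsmodels. The denominator of 300 scored responses and the count of 114 "comprehensive" answers are assumed values chosen only to illustrate the calculation, as the abstract does not state the exact counts.

```python
# Illustrative only: n = 300 scored responses and 114 "comprehensive" ratings
# are assumed values, not figures taken from the paper.
from statsmodels.stats.proportion import proportion_confint

count, nobs = 114, 300  # assumed: 114 comprehensive answers out of 300 responses
low, high = proportion_confint(count, nobs, alpha=0.05, method="normal")
print(f"accuracy = {count / nobs:.0%}, 95% CI: {low:.1%} - {high:.1%}")
# prints: accuracy = 38%, 95% CI: 32.5% - 43.5% (same scale as the reported interval)
```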
Conclusions: ChatGPT-4o showed promise as an adjunct tool for providing supplementary educational content, verifying critical information, and supporting decision-making in oral and maxillofacial traumatology. Its current limitations warrant further research. Future enhancements in LLMs and prompt engineering may help optimize their clinical applicability and alignment with evidence-based standards.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395565 | PMC |
| http://dx.doi.org/10.4317/medoral.27229 | DOI Listing |