Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Background: Large language models (LLMs) have emerged as powerful tools capable of processing and generating human-like text. LLMs such as ChatGPT (OpenAI Incorporated, Mission District, San Francisco, United States), Google Bard (Alphabet Inc., CA, US), and Microsoft Bing (Microsoft Corporation, WA, US) have been applied across various domains, demonstrating their potential to assist in solving complex tasks and improving information accessibility. However, their application to solving case vignettes in physiology has not been explored. This study aimed to assess the performance of three LLMs, namely, ChatGPT (3.5; free research version), Google Bard (Experiment), and Microsoft Bing (Precise), in answering case vignettes in physiology.

Methods: This cross-sectional study was conducted in July 2023. A total of 77 case vignettes in physiology were prepared by two physiologists and validated by two other content experts. These cases were presented to each LLM, and their responses were collected. Two physiologists independently rated the answers provided by the LLMs for accuracy on a 0-4 scale according to the Structure of the Observed Learning Outcome (SOLO) taxonomy (pre-structural = 0, uni-structural = 1, multi-structural = 2, relational = 3, extended abstract = 4). Scores among the LLMs were compared by Friedman's test, and inter-observer agreement was checked by the intraclass correlation coefficient (ICC).

Results: Across the 77 cases, the overall scores for ChatGPT, Bing, and Bard were 3.19±0.3, 2.15±0.6, and 2.91±0.5, respectively (p<0.0001). Hence, ChatGPT 3.5 (free version) obtained the highest score, Bing (Precise) the lowest, and Bard (Experiment) fell between the two in terms of performance. The average ICC values for ChatGPT, Bing, and Bard were 0.858 (95% CI: 0.777 to 0.91, p<0.0001), 0.975 (95% CI: 0.961 to 0.984, p<0.0001), and 0.964 (95% CI: 0.944 to 0.977, p<0.0001), respectively.

Conclusion: ChatGPT outperformed Bard and Bing in answering case vignettes in physiology. Students and teachers may therefore take these differences into account when choosing an LLM for case-based learning in physiology. Further exploration of their capabilities is needed before adopting them in medical education and clinical decision support.
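The analysis pipeline described in the Methods (per-case SOLO ratings compared across three related samples with Friedman's test, and inter-rater agreement checked with the ICC) can be illustrated in a few lines of Python. The sketch below is not the authors' code: the scores are synthetic placeholders rather than study data, and the choice of scipy and pingouin as the statistics libraries is an assumption, since the abstract does not name the software used.

```python
# Minimal sketch of the abstract's statistical analysis, on synthetic data.
import numpy as np
import pandas as pd
from scipy.stats import friedmanchisquare
import pingouin as pg

rng = np.random.default_rng(0)
n_cases = 77  # number of case vignettes in the study

# Hypothetical per-case SOLO scores (0-4), one column per model.
scores = pd.DataFrame({
    "chatgpt": rng.integers(2, 5, n_cases),
    "bing": rng.integers(1, 4, n_cases),
    "bard": rng.integers(2, 4, n_cases),
})

# Friedman's test: nonparametric comparison of three related samples
# (each vignette is answered once by every model).
stat, p = friedmanchisquare(scores["chatgpt"], scores["bing"], scores["bard"])
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4g}")

# ICC for one model: two raters independently scoring the same 77 cases.
ratings = pd.DataFrame({
    "case": np.repeat(np.arange(n_cases), 2),
    "rater": np.tile(["A", "B"], n_cases),
    "score": rng.integers(0, 5, 2 * n_cases),
})
icc = pg.intraclass_corr(data=ratings, targets="case",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```

With the real data, the per-model ICC rows would correspond to the values reported in the Results (e.g., 0.858 for ChatGPT), and a significant Friedman statistic would motivate the pairwise ranking of the three models.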


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475852
DOI: http://dx.doi.org/10.7759/cureus.42972

Publication Analysis

Top Keywords

google bard (12)
vignettes physiology (12)
large language (8)
language models (8)
chatgpt bing (8)
case vignettes (8)
llms chatgpt (8)
microsoft bing (8)
llms (5)
performance large (4)

Similar Publications

Background Artificial intelligence (AI) increasingly impacts medicine and medical specialties, including nephrology. Technologies such as large language models (LLMs), decision-support AI, and machine learning-powered predictive analytics enhance clinical care. These AI-driven tools show great potential in areas such as predicting the risk of chronic kidney disease, managing dialysis, supporting kidney transplantation, and treating CKD and diabetes-related kidney issues.


Artificial intelligence (AI) is increasingly being utilized as an informational resource, with chatbots attracting users for their ability to generate instantaneous responses. This study evaluates the understandability, actionability, readability, quality, and misinformation in medical information provided by four prominent chatbots: Bard, ChatGPT 3.5, Claude 2.


Objective: Large language models (LLMs) have advanced rapidly, but their utility in pediatric surgery remains uncertain. This study assessed the performance of three AI models (DeepSeek, Microsoft Copilot (GPT-4), and Google Bard) on the European Pediatric Surgery In-Training Examination (EPSITE).

Methods: We evaluated model performance using 294 EPSITE questions from 2021 to 2023.


Objectives: Generative AI interfaces like ChatGPT offer a new way to access health information, but it is unclear whether the information they present is as credible as that from traditional search engines. This study aimed to compare the credibility of vaccination information across generative AI interfaces and traditional search engines.

Study Design: Cross-sectional content analysis and comparison.
