Background And Objective: Artificial intelligence is increasingly being integrated into many fields, and its use in healthcare has shown promise. The objective of this study was to analyze the performance of different versions of ChatGPT, a publicly available Large Language Model (LLM), on recent European Board of Urology (EBU) in-service assessment questions.
Design And Setting: We submitted the multiple-choice questions from the official EBU test books to ChatGPT-3.5 and ChatGPT-4 for the following exams: exam 1 (2017-2018), exam 2 (2019-2020), and exam 3 (2021-2022). An exam was considered passed with ≥60% correct answers.
Results: ChatGPT-4 provided significantly more correct answers than the prior version 3.5 in all exams (exam 1: ChatGPT-3.5 64.3% vs. ChatGPT-4 81.6%; exam 2: 64.5% vs. 80.5%; exam 3: 56% vs. 77%; p < 0.001 for each comparison). Exam 3 was the only exam that ChatGPT-3.5 did not pass. Within the individual subtopics, there were no significant differences in the proportion of correct answers given by ChatGPT-3.5. For ChatGPT-4, the percentage of correct answers in exam 3 was significantly lower in the subtopics Incontinence (exam 1: 81.6% vs. exam 3: 53.6%; p = 0.026) and Transplantation (exam 1: 77.8% vs. exam 3: 0%; p = 0.020).
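The pass/fail outcomes follow directly from the reported percentages and the ≥60% threshold. The short Python sketch below only re-derives them from the figures quoted in the Results; it is not the authors' analysis code, and the names `scores`, `passed`, and `failed` are hypothetical, chosen for illustration.

```python
# Percentages of correct answers as reported in the Results (not raw data).
PASS_THRESHOLD = 60.0  # an exam counts as passed with >= 60% correct answers

scores = {
    "exam 1 (2017-2018)": {"ChatGPT-3.5": 64.3, "ChatGPT-4": 81.6},
    "exam 2 (2019-2020)": {"ChatGPT-3.5": 64.5, "ChatGPT-4": 80.5},
    "exam 3 (2021-2022)": {"ChatGPT-3.5": 56.0, "ChatGPT-4": 77.0},
}


def passed(percent_correct, threshold=PASS_THRESHOLD):
    """Return True if the score meets the pass threshold."""
    return percent_correct >= threshold


# Every (exam, model) pair that falls below the threshold.
failed = [
    (exam, model)
    for exam, by_model in scores.items()
    for model, pct in by_model.items()
    if not passed(pct)
]

# Percentage-point gain of ChatGPT-4 over ChatGPT-3.5 per exam.
gains = {
    exam: round(by_model["ChatGPT-4"] - by_model["ChatGPT-3.5"], 1)
    for exam, by_model in scores.items()
}

for exam, by_model in scores.items():
    for model, pct in by_model.items():
        status = "pass" if passed(pct) else "fail"
        print(f"{exam}: {model} {pct}% -> {status}")
    print(f"{exam}: ChatGPT-4 gain = {gains[exam]} percentage points")
```

Only the ChatGPT-3.5 result for exam 3 falls below the threshold, matching the statement above. The sketch does not reproduce the significance tests, because the abstract reports neither the number of questions per exam nor the test used.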
Conclusion: Our findings indicate that ChatGPT, especially ChatGPT-4, is generally able to answer complex medical questions and might pass FEBU exams. Nevertheless, human validation of LLM answers remains indispensable, especially for healthcare issues.
Source (DOI): http://dx.doi.org/10.1007/s00345-024-05137-4
Int J Cardiovasc Imaging
September 2025
Klinikum Fürth, Friedrich-Alexander-University Erlangen-Nürnberg, Fürth, Germany.
Myocarditis is an inflammation of heart tissue. Cardiovascular magnetic resonance imaging (CMR) has emerged as an important non-invasive imaging tool for diagnosing myocarditis; however, interpretation remains a challenge for novice physicians. Advancements in machine learning (ML) models have further improved diagnostic accuracy, demonstrating good performance.
Acta Neurochir (Wien)
September 2025
Department of Neurosurgery, Istinye University, Istanbul, Turkey.
Background: Recent studies suggest that large language models (LLMs) such as ChatGPT are useful tools for medical students or residents when preparing for examinations. These studies, especially those conducted with multiple-choice questions, emphasize that the level of knowledge and response consistency of the LLMs are generally acceptable; however, further optimization is needed in areas such as case discussion, interpretation, and language proficiency. Therefore, this study aimed to evaluate the performance of six distinct LLMs for Turkish and English neurosurgery multiple-choice questions and assess their accuracy and consistency in a specialized medical context.
Eur J Clin Invest
September 2025
Department of Internal Medicine and Medical Specialties (DiMI), Università degli Studi di Genova, Genoa, Italy.
This pilot study evaluated the influence of medical background on the diagnostic quality of ChatGPT-4's responses in Internal Medicine. Third-year students, residents and specialists summarised five complex NEJM clinical cases before querying ChatGPT-4. Diagnostic ranking, assessed by independent experts, revealed that residents significantly outperformed students (OR 2.
Arch Osteoporos
September 2025
Department of Family Medicine, Chang-Gung Memorial Hospital, Linkou Branch, Taoyuan City, Taiwan.
Unlabelled: The study assesses the performance of AI models in evaluating postmenopausal osteoporosis. We found that ChatGPT-4o produced the most appropriate responses, highlighting the potential of AI to enhance clinical decision-making and improve patient care in osteoporosis management.
Purpose: The rise of artificial intelligence (AI) offers the potential for assisting clinical decisions.
Arch Environ Contam Toxicol
September 2025
Ecole Polytechnique Fédérale de Lausanne (EPFL), School of Architecture, Civil and Environmental Engineering, 1015, Lausanne, Switzerland.
Pollution from past industrial activities can remain unnoticed for years or even decades because the pollutant has only recently gained attention or been identified by measurements. Modeling the emission history of pollution is essential for estimating population exposure and apportioning potential liability among stakeholders. This paper proposes a novel approach for reconstructing the history of polychlorinated dibenzo-p-dioxin (PCDD) and polychlorinated dibenzofuran (PCDF) pollution from municipal solid waste incinerators (MSWIs) with unknown past emissions.