Objective: To evaluate, along multiple dimensions, the accuracy and parsing ability of GPT 4.0 on the Japanese medical practitioner qualification examination, and to investigate the accuracy and comprehensiveness of its responses on medical knowledge.
Methods: We evaluated the performance of GPT 4.0 on Japanese Medical Licensing Examination (JMLE) questions (2021-2023). Questions were categorized by difficulty and type, with distinctions between general and clinical parts, as well as between single-choice (MCQ1) and multiple-choice (MCQ2) questions. Difficulty levels were determined on the basis of correct rates provided by the JMLE Preparatory School. The accuracy and quality of the GPT 4.0 responses were analyzed via improved Global Quality Scale (GQS) scores, considering both the chosen options and the accompanying analysis. Descriptive statistics and Pearson chi-square tests were used to examine accuracy across exam years, question difficulty, type, and choice. GQS scores were compared across groups via the Mann-Whitney U or Kruskal-Wallis test.
Results: The correct response rate and parsing ability of GPT 4.0 on the JMLE questions reached the qualification level (80.4%). The accuracy of the GPT 4.0 responses differed significantly across both difficulty levels and option types. The GQS scores for the GPT 4.0 responses to all the JMLE questions varied according to year and choice type.
Conclusion: GPT 4.0 performs well in providing basic support in medical education and medical research, but it still needs to be trained on a large amount of medical data to improve the accuracy of its medical knowledge output. Further integration of ChatGPT with the medical field could open new opportunities for medicine.
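The Pearson chi-square comparison named in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names `pearson_chi_square` and `p_value_df1` are hypothetical, the counts are invented placeholders rather than study data, and no continuity correction is applied.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # For df = 1 the statistic is a squared standard normal variable.
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical placeholder counts (NOT from the study):
# rows = question type (MCQ1, MCQ2), columns = (correct, incorrect).
placeholder_counts = [[90, 10], [70, 30]]
stat, df = pearson_chi_square(placeholder_counts)
print(stat, df, p_value_df1(stat))
```

The closed-form p-value applies only to 2 x 2 tables (one degree of freedom); comparisons with more groups, such as three exam years, would need the general chi-square distribution from a statistics library.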
Source: http://dx.doi.org/10.1007/s11596-024-2932-9 (DOI)
JMIR Med Inform
August 2025
Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan, 81 3-3815-5411.
Background: Recent advances in large language models have highlighted the need for high-quality multilingual medical datasets. Although Japan is a global leader in computed tomography (CT) scanner deployment and use, the absence of large-scale Japanese radiology datasets has hindered the development of specialized language models for medical imaging analysis. Despite the emergence of multilingual models and language-specific adaptations, the development of Japanese-specific medical language models has been constrained by a lack of comprehensive datasets, particularly in radiology.
Stud Health Technol Inform
August 2025
Graduate School of Public Health, St. Luke's International University, Tokyo, Japan.
Medication errors significantly challenge healthcare, necessitating innovative analytical methods. This study explored generative pre-trained language models (LLMs) for Named Entity Recognition (NER) in Japanese medical incident reports. We assessed four LLMs-Llama-3-ELYZA, BioMistral-7B, GPT-4.
JMIR Med Educ
August 2025
Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, 321-0293, Japan, 81 282861111.
Background: The medical interview remains a cornerstone of clinical training. There is growing interest in applying generative artificial intelligence (AI) in medical education, including medical interview training. However, its utility in culturally and linguistically specific contexts, including Japanese, remains underexplored.
JMIR Med Educ
July 2025
Department of Surgery, Tohoku University Graduate School of Medicine, Sendai, Japan.
Background: Artificial intelligence and large language models (LLMs), particularly GPT-4 and GPT-4o, have demonstrated high correct-answer rates in medical examinations. GPT-4o has enhanced diagnostic capabilities, advanced image processing, and updated knowledge. Japanese surgeons face critical challenges, including a declining workforce, regional health care disparities, and work-hour-related challenges.
Radiol Phys Technol
September 2025
Department of Radiology, Shiga University of Medical Science Hospital, Otsu, Shiga, Japan.
Recent advances in large language models (LLMs) enable domain-specific question answering using external knowledge. However, addressing information that is not included in training data remains a challenge, particularly in nuclear medicine, where examination protocols are frequently updated and vary across institutions. In this study, we developed a retrieval-augmented generation (RAG) system using 40 internal manuals from a single Japanese hospital, each corresponding to a different examination in nuclear medicine.