Rationale and Objectives: This study evaluates the performance, cost, and processing time of OpenAI's reasoning large language models (LLMs), o1-preview and o1-mini, and of their base models, GPT-4o and GPT-4o-mini, on questions from the Japanese radiology board examination.
Materials and Methods: A total of 210 questions from the 2022 and 2023 official board examinations of the Japan Radiological Society were presented to each of the four LLMs. Performance was evaluated as the percentage of correctly answered questions in each of six predefined radiology subspecialties.
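As a minimal illustration of this scoring rule, the sketch below computes the percentage of correct answers per subspecialty for a single model. The record format (a subspecialty label paired with a correctness flag), the function name `accuracy_by_subspecialty`, and the toy labels are hypothetical assumptions for illustration only; they do not reflect the study's actual data, subspecialty names, or tooling.

```python
from collections import defaultdict


def accuracy_by_subspecialty(records):
    """Return percent correct per subspecialty for one model.

    records: iterable of (subspecialty, is_correct) pairs, one per question.
    This record layout is a hypothetical assumption, not the study's format.
    """
    totals = defaultdict(int)   # questions seen per subspecialty
    correct = defaultdict(int)  # correctly answered questions per subspecialty
    for subspecialty, is_correct in records:
        totals[subspecialty] += 1
        if is_correct:
            correct[subspecialty] += 1
    # Percentage correct = 100 * correct / total, computed per subspecialty.
    return {s: 100.0 * correct[s] / totals[s] for s in totals}


# Toy input only; these are NOT results from the study.
toy = [
    ("subspecialty_A", True),
    ("subspecialty_A", False),
    ("subspecialty_B", True),
    ("subspecialty_B", True),
]
print(accuracy_by_subspecialty(toy))  # {'subspecialty_A': 50.0, 'subspecialty_B': 100.0}
```

Running the same function once per model yields a per-subspecialty comparison across the four LLMs.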