Article Abstract

Background: The use of artificial intelligence platforms by medical residents as an educational resource is increasing. Within orthopaedic surgery, older Chat Generative Pre-trained Transformer (ChatGPT) models performed worse than resident physicians on practice examinations and rarely answered questions with images correctly. The newer ChatGPT-4o was designed to improve these deficiencies but has not been evaluated. This study analyzed (1) ChatGPT-4o's ability to correctly answer Orthopaedic In-Training Examination (OITE) questions and (2) the educational quality of the answer explanations that it presents to our orthopaedic surgery trainees.

Methods: The 2020 to 2022 OITEs were uploaded into ChatGPT-4o. Annual score reports were used to compare the chatbot's raw score with that of ACGME-accredited orthopaedic residents. ChatGPT-4o's answer explanations were then compared with those provided by the American Academy of Orthopaedic Surgeons (AAOS) and categorized based on (1) the chatbot's answer (correct/incorrect) and (2) the chatbot's answer explanation when compared with the explanation provided by AAOS subject-matter experts (classified as consistent, disparate, or nonexistent). Overall ChatGPT-4o response quality was then simplified into 3 groups. An "ideal" response combined a correct answer with a consistent explanation. "Inadequate" responses provided a correct answer but no explanation. "Unacceptable" responses provided an incorrect answer or disparate explanation.
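The three-group simplification described above can be read as a small decision rule. The following is a minimal sketch of that rule in Python, not the authors' actual grading procedure; the names `Explanation`, `Quality`, and `classify_response` are hypothetical and introduced only for illustration.

```python
from enum import Enum


class Explanation(Enum):
    """How the chatbot's explanation compares with the AAOS explanation."""
    CONSISTENT = "consistent"
    DISPARATE = "disparate"
    NONEXISTENT = "nonexistent"


class Quality(Enum):
    """Simplified response-quality groups named in the Methods."""
    IDEAL = "ideal"
    INADEQUATE = "inadequate"
    UNACCEPTABLE = "unacceptable"


def classify_response(answer_correct: bool, explanation: Explanation) -> Quality:
    """Map one graded response to a quality group (hypothetical helper).

    Assumption: an incorrect answer is unacceptable regardless of the
    explanation, which follows from "incorrect answer or disparate explanation".
    """
    # Unacceptable: incorrect answer OR disparate explanation.
    if not answer_correct or explanation is Explanation.DISPARATE:
        return Quality.UNACCEPTABLE
    # Inadequate: correct answer but no explanation given.
    if explanation is Explanation.NONEXISTENT:
        return Quality.INADEQUATE
    # Ideal: correct answer with a consistent explanation.
    return Quality.IDEAL
```

Checking the failure conditions first means that every combination of answer and explanation falls into exactly one group, so the three groups partition the full question set.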

Results: ChatGPT-4o scored 68.8%, 63.4%, and 70.1% on the 2020, 2021, and 2022 OITEs, respectively. These raw scores corresponded with those of ACGME-accredited postgraduate year-5 (PGY-5), PGY-2 to PGY-3, and PGY-4 resident physicians, respectively. Pediatrics and Spine were the only subspecialties in which ChatGPT-4o consistently performed better than a junior resident (≥PGY-3). The quality of responses provided by ChatGPT-4o was ideal, inadequate, or unacceptable in 58.7%, 6.9%, and 34.4% of questions, respectively. ChatGPT-4o scored significantly lower on media-related questions than on nonmedia questions (60.0% versus 73.1%, p < 0.001).

Conclusions: ChatGPT-4o performed inconsistently on the OITE. Moreover, the responses it provided trainees were not always ideal. Its limited performance on media-based orthopaedic surgery questions also persisted. The use of ChatGPT by resident physicians while studying orthopaedic surgery concepts remains unvalidated.

Level of Evidence: Level IV. See Instructions for Authors for a complete description of levels of evidence.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12417002
DOI: http://dx.doi.org/10.2106/JBJS.OA.25.00112

Publication Analysis

Top Keywords

orthopaedic surgery: 20
responses provided: 16
resident physicians: 12
chatgpt-4o: 9
orthopaedic: 8
answer: 8
answer explanations: 8
2022 oites: 8
chatbot's answer: 8
answer explanation: 8

Similar Publications

Osteoporotic hip fractures are a considerable cause of pain and disability particularly among the elderly. Osteoporosis causes loss of bone stability, which in turn leads to an increased risk of fractures especially in metaphyseal bone. Moreover, the body's capacity for healing is diminished, resulting in prolonged recovery times following these fractures.


Background: Pressure injuries are common, difficult to manage, and carry a high economic burden. They are challenging to physicians and a burden to society.

Case Report: An 89-year-old male, who had previously undergone internal fixation with screws and rods for a right intertrochanteric fracture, developed a deep circular open ulcer measuring 11 cm × 7.


Background: Diabetic foot ulcers (DFUs) are a major clinical challenge, particularly among patients with refractory ulcers, that often lead to severe complications such as infection, amputation, and high mortality. Innovations supported by strong clinical evidence have the potential to improve healing outcomes, enhance quality of life, and reduce the economic burden on individuals and health care systems.

Objective: To describe the design of the concurrent optical and magnetic stimulation (COMS) therapy Investigational Device Exemption (IDE) study for refractory DFUs (MAVERICKS) trial.


Functional recovery after total knee arthroplasty (TKA) varies widely among individuals, and traditional assessments often fail to detect subtle changes in real-world walking ability. Wearable sensors offer continuous and objective tracking of gait outside of clinical settings. In this prospective, longitudinal study, thirty-one patients undergoing unilateral TKA wore thigh-mounted accelerometers continuously from 2 weeks before surgery through 90 days postoperatively.


Purpose: This study aimed to investigate the relationship between tissue bridges and bladder and bowel outcomes in chronic cervical spinal cord injury (SCI).

Methods: Between July 2020 and January 2024, 44 patients with chronic cervical SCI were retrospectively included in this cross-sectional study at a specialized SCI center. Lesion severity was assessed by tissue bridges, lesion length, lesion width, and lesion area.
