Kellgren-Lawrence grading of knee osteoarthritis using deep learning: Diagnostic performance with external dataset and comparison with four readers.

Elias Vaattovaara, Egor Panfilov, Aleksei Tiulpin, Tuukka Niinimäki, Jaakko Niinimäki, Simo Saarakkala, Mika T Nevalainen

Osteoarthr Cartil Open

Research Unit of Health Sciences and Technology, University of Oulu, Oulu, Finland.

Published: June 2025

Objective: To evaluate the performance of a deep learning (DL) model on an external dataset in assessing radiographic knee osteoarthritis using Kellgren-Lawrence (KL) grades, compared with human readers of varying specialty and experience.

Materials and Methods: Two hundred eight anteroposterior conventional radiographs (CRs) of the knee were included in this retrospective study. Four readers (three radiologists, one orthopedic surgeon) assessed the KL grades, and a consensus grade was derived as the mean of these. The DL model was trained using all the CRs from the Multicenter Osteoarthritis Study (MOST), validated on the Osteoarthritis Initiative (OAI) dataset, and then tested on our external dataset. To assess the agreement between the graders, Cohen's quadratic kappa (k) with 95% confidence intervals was used. Diagnostic performance was measured using confusion matrices and receiver operating characteristic (ROC) analyses.
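As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's kappa with quadratic weights for ordinal grades. It is not the authors' code; the function name `quadratic_weighted_kappa` and the assumption that grades are integers from 0 to 4 are ours, and confidence intervals are not computed here.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_classes=5):
    """Cohen's kappa with quadratic weights for ordinal grades 0..n_classes-1.

    Hypothetical helper for illustration; inputs are equal-length lists of
    integer grades from two raters (or one rater and a model).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of grade pairs
    marg_a = Counter(rater_a)                  # marginal counts, rater A
    marg_b = Counter(rater_b)                  # marginal counts, rater B

    weighted_observed = 0.0  # weighted disagreement actually observed
    weighted_expected = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic weight: 0 on the diagonal, 1 for the largest gap.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[(i, j)] / n
            weighted_expected += w * marg_a[i] * marg_b[j] / (n * n)

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical grade")
    return 1.0 - weighted_observed / weighted_expected
```

Because the weights grow with the squared distance between grades, a KL1 vs. KL2 disagreement is penalized far less than a KL0 vs. KL4 disagreement, which suits an ordinal scale.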

Results: The multiclass (KL grades 0 to 4) diagnostic performance of the DL model varied across grades: sensitivities ranged from 0.372 to 1.000, specificities from 0.691 to 0.974, PPVs from 0.227 to 0.879, NPVs from 0.622 to 1.000, and AUCs from 0.786 to 0.983. The overall balanced accuracy was 0.693, AUC 0.886, and kappa 0.820. With dichotomous KL grading (i.e., KL0-1 vs. KL2-4), performance was better, with an overall balanced accuracy of 0.902 and an AUC of 0.967. Substantial agreement was found between each reader and the DL model: the inter-rater agreement was 0.737 [0.685-0.790] for the radiology resident, 0.761 [0.707-0.816] for the musculoskeletal radiology fellow, 0.802 [0.761-0.843] for the senior musculoskeletal radiologist, and 0.818 [0.775-0.860] for the orthopedic surgeon.
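The per-grade and dichotomous metrics reported above can be illustrated with a short sketch. It assumes one-vs-rest sensitivity and specificity per grade, and takes balanced accuracy to be the mean of per-class sensitivities, which is the common definition but not confirmed by the abstract. All function names are hypothetical and this is not the authors' evaluation code.

```python
def per_class_metrics(y_true, y_pred, n_classes=5):
    """One-vs-rest (sensitivity, specificity) for each grade 0..n_classes-1."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        # NaN marks a metric that is undefined because the class is absent.
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        metrics[c] = (sens, spec)
    return metrics


def balanced_accuracy(y_true, y_pred, n_classes=5):
    """Mean per-class sensitivity, skipping classes absent from y_true."""
    per_class = per_class_metrics(y_true, y_pred, n_classes)
    recalls = [sens for sens, _ in per_class.values() if sens == sens]
    return sum(recalls) / len(recalls)


def dichotomize(grades, threshold=2):
    """Map KL grades to 0 (KL0-1) or 1 (KL2-4)."""
    return [int(g >= threshold) for g in grades]
```

For example, `balanced_accuracy(dichotomize(y_true), dichotomize(y_pred), n_classes=2)` scores the KL0-1 vs. KL2-4 split. Errors between adjacent grades on the same side of the threshold no longer count, which is one reason dichotomous metrics tend to be higher than multiclass ones.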

Conclusion: In an external dataset, our DL model can grade knee osteoarthritis with diagnostic accuracy comparable to highly experienced human readers.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876873
DOI: http://dx.doi.org/10.1016/j.ocarto.2025.100580

Similar Publications

A generative adversarial network to improve integrated mode proton imaging resolution using paired proton-carbon data.

Med Phys

September 2025

Department of Medical Physics and Biomedical Engineering, University College London, London, UK.

Mikaël Simard, Ryan Fullarton, Lennart Volz, Christoph Schuy, Savanna Chung

Background: Integrated mode proton imaging is a clinically accessible method for proton radiographs (pRads), but its spatial resolution is limited by multiple Coulomb scattering (MCS). As the amplitude of MCS decreases with increasing particle charge, heavier ions such as carbon ions produce radiographs with better resolution (cRads). Improving image resolution of pRads may thus be achieved by transferring individual proton pencil beam images to the equivalent carbon ion data using a trained image translation network.

Identification and Exploration of Novel B Cell Infiltration-Related Biomarkers in Endometriosis.

Am J Reprod Immunol

September 2025

Department of Laboratory Animal Science, Kunming Medical University, Kunming, China.

Chunyang Zhao, Shuwei Zhang, Baosu Zhang, Hang Tian, Guojun Yan

Objective: To explore B cell infiltration-related genes in endometriosis (EM) and investigate their potential as diagnostic biomarkers.

Methods: Gene expression data from the GSE51981 dataset, containing 77 endometriosis and 34 control samples, were analyzed to detect differentially expressed genes (DEGs). The xCell algorithm was applied to estimate the infiltration levels of 64 immune and stromal cell types, focusing on B cells and naive B cells.

Deep Learning-Based Detection of Reticular Pseudodrusen in Age-Related Macular Degeneration.

Clin Exp Ophthalmol

September 2025

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, Victoria, Australia.

Himeesh Kumar, Yelena Bagdasarova, Scott Song, Doron G Hickey, Amy C Cohn

Background: Reticular pseudodrusen (RPD) signify a critical phenotype driving vision loss in age-related macular degeneration (AMD). This study sought to develop and externally test a deep learning (DL) model to detect RPD on optical coherence tomography (OCT) scans with expert-level performance.

Methods: RPD were manually segmented in 9800 OCT B-scans from individuals enrolled in a multicentre randomised trial.

Artificial Intelligence Predicts GBA1 Mutated Status in Parkinson's Disease Patients.

Mov Disord Clin Pract

September 2025

Neurology Unit, Neuromotor and Rehabilitation Department, Azienda USL-IRCCS di Reggio Emilia, Reggio Emilia, Italy.

Giulia Di Rauso, Alessandro Ghibellini, Sara Grisanti, Valentina Fioravanti, Edoardo Monfrini

Background: GBA1 variants are the major genetic risk factor for Parkinson's Disease (PD) and account for 5-30% of PD cases depending on the population and age at onset of the disease.

Objectives: The aim of this study was to assess whether Artificial Intelligence (AI) could predict GBA1-mutated genotype in PD (GBA1-PD). Particularly, the main objective was to identify a Machine Learning (ML) model capable of accurately providing a pre-test estimate of GBA1-mutated status, relying on the clinical and demographic variables with the highest predictive value.

Comparative performance of machine learning and conventional scoring systems for neuroprognostication in out-of-hospital cardiac arrest survivors.

J Formos Med Assoc

September 2025

Department of Emergency Medicine, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan; Department of Emergency Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan.

Chi-Hsin Chen, Edward Pei-Chuan Huang, Cheng-Yi Fan, Yi-Chien Kuo, Yi-Ju Ho

Background: Accurately predicting the neurological outcomes in out-of-hospital cardiac arrest (OHCA) survivors is crucial. Conventional prediction scores should be validated across different settings. Additionally, machine learning (ML) models may provide improved predictive performance.
