Curation of myeloma observational study MALIMAR using XNAT: solving the challenges posed by real-world data.

Simon J Doran, Theo Barfoot, Linda Wedlake, Jessica M Winfield, James Petts, Ben Glocker, Xingfeng Li, Martin Leach, Martin Kaiser, Tara D Barwick, Aristeidis Chaidos, Laura Satchwell, Neil Soneji, Khalil Elgendy, Alexander Sheeka, Kathryn Wallitt, Dow-Mu Koh, Christina Messiou, Andrea Rockall

Insights Imaging

Division of Cancer, Department of Surgery and Cancer, Imperial College London, London, UK.

Published: February 2024

Objectives: MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining "real-world" and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation.

Methods: Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods.
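
As an illustration of the pseudonymisation step, the sketch below maps a hospital identifier to a stable pseudonym using keyed hashing (HMAC-SHA256). The abstract does not say how MALIMAR pseudonymised its data, so the function name `pseudonymise`, the `SUBJ` prefix, the key and the sample identifiers are all assumptions made for this example.

```python
import hashlib
import hmac


def pseudonymise(patient_id, secret_key, prefix="SUBJ"):
    """Return a stable pseudonym for patient_id.

    Keyed hashing means the same identifier always yields the same
    pseudonym, while the mapping cannot be rebuilt without the key.
    The prefix and the 12-character truncation are illustrative choices.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return f"{prefix}_{digest.hexdigest()[:12]}"


# Hypothetical key and identifiers, for demonstration only.
key = b"replace-with-a-secret-held-by-the-data-controller"
first = pseudonymise("HOSP-0001", key)
repeat = pseudonymise("HOSP-0001", key)
other = pseudonymise("HOSP-0002", key)

print(first == repeat)  # the same subject maps to the same pseudonym
print(first != other)   # different subjects map to different pseudonyms
```

A deterministic mapping of this kind lets repeat imaging sessions from one subject be linked after upload without the repository ever holding the original identifier.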

Results: A total of 796 whole-body MR imaging sessions from 462 subjects were curated. A major change in scan protocol part way through the retrospective window meant that approximately 30% of available imaging sessions had properties that differed significantly from the remainder of the data. Issues were found with a vendor-supplied clinical algorithm for "composing" whole-body images from multiple imaging stations. Historic weaknesses in a digital video disk (DVD) research archive (already addressed by the mid-2010s) were highlighted by incomplete datasets, some of which could not be completely recovered. The final dataset contained 736 imaging sessions for 432 subjects. Software was written to clean and harmonise data. Implications for the subsequent machine learning activity are considered.
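
A protocol change of the kind described in the results can be surfaced programmatically by grouping sessions on a small set of acquisition parameters. The following is a minimal sketch under stated assumptions: the metadata fields (`scanner`, `slice_thickness_mm`, `n_stations`), the session labels and the ten-session sample are hypothetical and are not taken from the MALIMAR dataset.

```python
from collections import Counter


def protocol_fingerprint(session, keys=("scanner", "slice_thickness_mm", "n_stations")):
    """Reduce a session's metadata to a hashable tuple of protocol parameters."""
    return tuple(session[k] for k in keys)


def split_by_protocol(sessions):
    """Return the most common fingerprint, the labels of sessions that
    deviate from it, and the deviating fraction."""
    counts = Counter(protocol_fingerprint(s) for s in sessions)
    majority, _ = counts.most_common(1)[0]
    outliers = [s["label"] for s in sessions if protocol_fingerprint(s) != majority]
    return majority, outliers, len(outliers) / len(sessions)


# Hypothetical sample: seven sessions on one protocol, three on another.
sessions = [
    {"label": f"S{i:03d}", "scanner": "A", "slice_thickness_mm": 5.0, "n_stations": 4}
    for i in range(1, 8)
] + [
    {"label": f"S{i:03d}", "scanner": "A", "slice_thickness_mm": 6.0, "n_stations": 5}
    for i in range(8, 11)
]

majority, outliers, fraction = split_by_protocol(sessions)
print(majority, outliers, fraction)
```

Sessions flagged this way are candidates for harmonisation or for separate handling in downstream machine learning, rather than automatic exclusion.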

Conclusions: MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects.

Critical Relevance Statement: This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging.

Key Points:
• Heterogeneous data in the MALIMAR study required the development of novel curation strategies.
• Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated.
• Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple "image marts".
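
One simple form that programmatic QA could take is a completeness check: confirm that every imaging session contains the series a downstream pipeline expects. This is a sketch only. The required series names and the sample sessions are hypothetical, and in practice the per-session listing would come from the repository's API rather than a hard-coded dictionary.

```python
# Hypothetical set of series that every complete session should contain.
REQUIRED_SERIES = {"T1w_Dixon", "DWI_b50", "DWI_b900"}


def find_incomplete(sessions, required=REQUIRED_SERIES):
    """Map each incomplete session label to a sorted list of missing series.

    sessions: dict of session label -> set of series names present.
    Complete sessions are omitted from the result.
    """
    report = {}
    for label, present in sessions.items():
        missing = required - set(present)
        if missing:
            report[label] = sorted(missing)
    return report


# Hypothetical listing of three sessions.
listing = {
    "S001": {"T1w_Dixon", "DWI_b50", "DWI_b900"},
    "S002": {"T1w_Dixon", "DWI_b50"},
    "S003": {"DWI_b900"},
}

report = find_incomplete(listing)
print(report)
```

Running such a check on every upload, rather than once at the end of curation, would help catch incomplete datasets while the source archive can still be consulted.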

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10869673
DOI: http://dx.doi.org/10.1186/s13244-023-01591-7
