Simulation-derived best practices for clustering clinical data.

J Biomed Inform

Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Dr, Columbus, OH 43210, USA.

Published: June 2021


Article Abstract

Introduction: Clustering analyses in clinical contexts hold promise to improve the understanding of patient phenotype and disease course in chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type, clinical data.

Methods: We simulated clinical data to represent problems in clinical trials, cohort studies, and EHR data, including single-type datasets (binary, continuous, categorical) and 4 data mixtures. We tested 5 single distance metrics (Jaccard, Hamming, Gower, Manhattan, Euclidean) and 3 mixed distance metrics (DAISY, Supersom, and Mercator) with 3 clustering algorithms: hierarchical clustering (HC), k-medoids, and self-organizing maps (SOM). We validated the results quantitatively and visually using the Adjusted Rand Index (ARI) and silhouette width (SW). We then applied the best-performing methods to two real-world data sets: (1) 21 features collected on 247 patients with chronic lymphocytic leukemia, and (2) 40 features collected on 6000 patients admitted to an intensive care unit.
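The workflow described in the Methods (simulate mixed-type data, compute a dissimilarity matrix, cluster, then validate by ARI and SW) can be sketched roughly as follows. This is only an illustrative Python sketch, not the authors' code: the helper `gower_matrix` is a hypothetical hand-rolled Gower-style dissimilarity (it is not the DAISY, Supersom, or Mercator implementation), and the sample sizes, feature counts, effect sizes, and average linkage are all assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score, silhouette_score


def gower_matrix(cont, cat):
    """Hypothetical helper: Gower-style dissimilarity for mixed data.

    Continuous columns contribute a range-scaled absolute difference,
    categorical columns contribute a 0/1 mismatch, and the per-feature
    contributions are averaged with equal weights.
    """
    ranges = cont.max(axis=0) - cont.min(axis=0)
    ranges[ranges == 0] = 1.0  # guard against constant columns
    d_cont = np.abs(cont[:, None, :] - cont[None, :, :]) / ranges
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    total = d_cont.sum(axis=2) + d_cat.sum(axis=2)
    return total / (cont.shape[1] + cat.shape[1])


# Simulate two patient groups (all sizes and effect sizes are assumed).
rng = np.random.default_rng(0)
n_per_group = 50
truth = np.repeat([0, 1], n_per_group)

# Three continuous features whose means differ between groups.
cont = rng.normal(loc=truth[:, None] * 3.0, scale=1.0, size=(2 * n_per_group, 3))

# Four binary features whose prevalence differs between groups.
prevalence = np.where(truth[:, None] == 1, 0.9, 0.1)
cat = (rng.random((2 * n_per_group, 4)) < prevalence).astype(int)

# Dissimilarity matrix, then hierarchical clustering cut into 2 clusters.
D = gower_matrix(cont, cat)
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")

# External validation (ARI, needs the simulated truth) and
# internal validation (mean silhouette width on the same dissimilarities).
ari = adjusted_rand_score(truth, labels)
sw = silhouette_score(D, labels, metric="precomputed")
print(f"ARI = {ari:.3f}, mean silhouette width = {sw:.3f}")
```

ARI is only available here because the simulated group labels are known. On real-world data such as the two clinical data sets, only internal measures like silhouette width can be computed.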

Results: HC outperformed k-medoids and SOM by ARI across data types. DAISY produced the highest mean ARI for mixed data types for all mixtures except unbalanced mixtures dominated by continuous data. Compared to other methods, DAISY with HC uncovered superior, separable clusters in both real-world data sets.

Discussion: Selecting an appropriate mixed-type metric allows the investigator to obtain optimal separation of patient clusters and make maximum use of their data. Superior metrics for mixed-type data handle multiple data types using multiple, type-focused distances. Better subclassification of disease opens avenues for targeted treatments, precision medicine, clinical decision support, and improved patient outcomes.
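One common way to combine type-focused distances is Gower's general dissimilarity, shown below as an illustration. The abstract does not state the exact formulas used by DAISY, Supersom, or Mercator, so the notation here (weights $w_k$, indicator $\delta_{ijk}$, range $R_k$) is an assumption based on the standard textbook form rather than on the paper itself.

```latex
d(i,j) = \frac{\sum_{k=1}^{p} w_k \,\delta_{ijk}\, d_{ijk}}
              {\sum_{k=1}^{p} w_k \,\delta_{ijk}},
\qquad
d_{ijk} =
\begin{cases}
  \dfrac{|x_{ik} - x_{jk}|}{R_k} & \text{if feature } k \text{ is continuous},\\[2ex]
  \mathbb{1}[x_{ik} \neq x_{jk}] & \text{if feature } k \text{ is categorical or binary}.
\end{cases}
```

Here $p$ is the number of features, $R_k$ is the observed range of feature $k$, and $\delta_{ijk}$ is 0 when feature $k$ is missing for patient $i$ or $j$ and 1 otherwise. Each feature therefore contributes a value in $[0, 1]$ regardless of its type.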

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9017600
DOI: http://dx.doi.org/10.1016/j.jbi.2021.103788
