Handling skewness and directional tails in model-based clustering.

Stat Pap (Berl)

Department of Mathematics and Statistics, MacEwan University, Edmonton, Alberta, Canada.

Published: July 2025


Citations

20

Article Abstract

Model-based clustering is a powerful approach used in data analysis to unveil underlying patterns or groups within a data set. However, when clusters exhibit skewness, heavy tails, or both, classifying data points becomes more challenging. In this study, we introduce two models that apply two component-wise transformations of the observed data within a mixture of multiple scaled contaminated normal (MSCN) distributions. MSCN distributions are designed to allow a different tail behavior in each dimension and to detect outliers directionally, along the principal components. Using the transformed MSCN distributions as components of a mixture, we obtain model-based clustering techniques that allow for 1) flexible cluster shapes in terms of skewness and kurtosis and 2) component-wise and directional outlier detection. Using simulated and real data sets, we assess the efficacy of the proposed techniques by comparing them with model-based clustering methods that perform global or component-wise outlier detection. This comparative analysis aims to show in which practical clustering scenarios the proposed MSCN-based approaches are advantageous.
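
To make the idea of direction-specific tails concrete, the following is a minimal sketch of an untransformed MSCN density and the per-direction outlier probability it implies. It assumes the form commonly given in the MSCN literature, in which the scatter matrix is decomposed as Gamma diag(lam) Gamma' and each principal direction h has its own two-part normal mixture with proportion of typical points alpha_h and inflation factor eta_h > 1. The function names and parameter names are illustrative assumptions, not taken from the article, and the transformations and mixture fitting proposed by the authors are not shown.

```python
import numpy as np


def norm_pdf(y, var):
    """Univariate normal density with mean 0 and variance `var`."""
    return np.exp(-0.5 * y**2 / var) / np.sqrt(2.0 * np.pi * var)


def _direction_parts(x, mu, Gamma, lam, alpha, eta):
    """Per-direction 'typical' and 'inflated' density contributions.

    Assumed parameterization (hypothetical names):
      mu    : (d,)   location vector
      Gamma : (d, d) orthogonal matrix whose columns are principal directions
      lam   : (d,)   variances along those directions
      alpha : (d,)   proportion of typical points in each direction
      eta   : (d,)   variance inflation factor (> 1) in each direction
    """
    # Scores along the principal directions: y[:, h] = gamma_h' (x - mu)
    y = (np.atleast_2d(x) - mu) @ Gamma
    good = alpha * norm_pdf(y, lam)
    bad = (1.0 - alpha) * norm_pdf(y, eta * lam)
    return good, bad


def mscn_pdf(x, mu, Gamma, lam, alpha, eta):
    """MSCN density: product over directions of a contaminated normal."""
    good, bad = _direction_parts(x, mu, Gamma, lam, alpha, eta)
    return np.prod(good + bad, axis=1)


def directional_outlier_prob(x, mu, Gamma, lam, alpha, eta):
    """Posterior probability that a point is atypical in each direction."""
    good, bad = _direction_parts(x, mu, Gamma, lam, alpha, eta)
    return bad / (good + bad)


if __name__ == "__main__":
    mu = np.zeros(2)
    Gamma = np.eye(2)
    lam = np.array([1.0, 1.0])
    alpha = np.array([0.9, 0.9])
    eta = np.array([10.0, 10.0])
    point = np.array([[6.0, 0.0]])  # extreme along direction 1 only
    print(directional_outlier_prob(point, mu, Gamma, lam, alpha, eta))
```

Under these assumptions, a point that is extreme along only one principal direction receives a high outlier probability in that direction and a low one in the others, which is the sense in which the detection is directional rather than global.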

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12226708
DOI: http://dx.doi.org/10.1007/s00362-025-01723-9

