A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data.

Xi Peng, Huajin Tang, Lei Zhang, Zhang Yi, Shijie Xiao

IEEE Trans Neural Netw Learn Syst

Published: December 2016

Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph that describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and ℓ2-norm-based representation, and have achieved state-of-the-art performance. However, these methods suffer from two limitations. First, their time complexities are at least proportional to the cube of the data size, which makes them inefficient for large-scale problems. Second, they cannot cope with out-of-sample data, that is, data not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem through sampling, clustering, coding, and classifying. Furthermore, we estimate the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering a large-scale data set.
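
The four steps named in the abstract (sampling, clustering, coding, classifying) can be illustrated with a minimal NumPy sketch. The abstract does not give the algorithmic details, so everything concrete here is an assumption made for illustration: the ridge (ℓ2-penalized) coding in `ridge_codes`, the affinity |C| + |C|^T, the normalized-Laplacian spectral step, the minimum-residual classification rule, and all function and parameter names (`cluster_large_scale`, `n_in`, `lam`). It is not the authors' implementation.

```python
import numpy as np

def ridge_codes(D, Y, lam=1e-3):
    """Code each column of Y over dictionary D with an l2 (ridge) penalty."""
    G = D.T @ D
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), D.T @ Y)

def kmeans(U, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def spectral_cluster(W, k):
    """Cluster a symmetric affinity matrix W via the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    U = vecs[:, :k]                        # k smallest eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)

def cluster_large_scale(X, k, n_in, lam=1e-3, seed=0):
    """X holds one data point per column; returns one label per column."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # 1) Sampling: pick a small in-sample subset uniformly at random.
    in_idx = rng.choice(n, size=n_in, replace=False)
    out_idx = np.setdiff1d(np.arange(n), in_idx)
    D = X[:, in_idx]
    # 2) Clustering: build the similarity graph on in-sample data only.
    C = ridge_codes(D, D, lam)
    np.fill_diagonal(C, 0.0)               # drop self-representation
    W = np.abs(C) + np.abs(C).T
    in_labels = spectral_cluster(W, k)
    # 3) Coding: represent each out-of-sample point over the in-sample data.
    Y = X[:, out_idx]
    codes = ridge_codes(D, Y, lam)
    # 4) Classifying: assign to the cluster with the smallest residual.
    resid = np.stack([
        np.linalg.norm(Y - D[:, in_labels == j] @ codes[in_labels == j], axis=0)
        for j in range(k)
    ])
    labels = np.empty(n, dtype=int)
    labels[in_idx] = in_labels
    labels[out_idx] = resid.argmin(axis=0)
    return labels

# Toy data (assumed): two orthogonal 2-D subspaces in R^10, 100 points each.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))
truth = np.repeat([0, 1], 100)
X = np.hstack([Q[:, :2] @ rng.standard_normal((2, 100)),
               Q[:, 2:4] @ rng.standard_normal((2, 100))])
pred = cluster_large_scale(X, k=2, n_in=40)
# Cluster labels are arbitrary, so compare up to a swap of the two labels.
agreement = max(np.mean(pred == truth), np.mean(pred == 1 - truth))
```

In this sketch the cubic-cost spectral step runs only on the `n_in` sampled points, while each remaining point costs one linear solve against the in-sample data, which is the sense in which the large-scale problem is reduced to an out-of-sample one.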

DOI: http://dx.doi.org/10.1109/TNNLS.2015.2490080

Similar Publications

Cluster synchronization via graph Laplacian eigenvectors.

Chaos

September 2025

Department of Mathematics and Statistics, University of Vermont, Burlington, Vermont 05405, USA.

Tobias Timofeyev, Alice Patania

Almost equitable partitions (AEPs) have been linked to cluster synchronization in oscillatory systems, highlighting the importance of structure in collective network dynamics. We provide a general spectral framework that formalizes this connection, showing how eigenvectors associated with AEPs span a subspace of the Laplacian spectrum that governs partition-induced synchronization behavior. This offers a principled reduction of network dynamics, allowing clustered states to be understood in terms of quotient graph projections.

Clustering Single-Cell RNA-Seq Data with Low-Rank Matrix Factorization and Local Graph Regularization.

Interdiscip Sci

September 2025

School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, 430205, China.

Yue Yu, Wei Zhang, Xiaoying Zheng, Juan Shen, Yuanyuan Li

Single-cell RNA sequencing (scRNA-seq) offers significant opportunities to reveal cellular heterogeneity and diversity. Accurate cell type identification is critical for downstream analyses and understanding the mechanisms of heterogeneity. However, challenges arise from the high dimensionality, sparsity, and noise of scRNA-seq data.

One-step bipartite graph cut: A normalized formulation and its application to scalable subspace clustering.

Neural Netw

August 2025

School of Computer Science and Engineering, Sun Yat-sen University, China.

Si-Guo Fang, Dong Huang, Chang-Dong Wang, Jian-Huang Lai

The bipartite graph structure has shown its promising ability in facilitating the subspace clustering and spectral clustering algorithms for large-scale datasets. To avoid the post-processing via k-means during the bipartite graph partitioning, the constrained Laplacian rank (CLR) is often utilized for constraining the number of connected components (i.e.

Mode and Ridge Estimation in Euclidean and Directional Product Spaces: A Mean Shift Approach.

J Comput Graph Stat

July 2025

Department of Statistics, University of Washington.

Yikun Zhang, Yen-Chi Chen

The set of local modes and density ridge lines are important summary characteristics of the data-generating distribution. In this work, we focus on estimating local modes and density ridges from point cloud data in a product space combining two or more Euclidean and/or directional metric spaces. Specifically, our approach extends the (subspace constrained) mean shift algorithm to such product spaces, addressing potential challenges in the generalization process.

scPEGSSC: Proximity Enhanced Graph Convolutional Sparse Subspace Clustering Method for scRNA-seq Data.

IEEE Trans Comput Biol Bioinform

June 2025

Jingli Wu, Xiaopeng Wei, Gaoshi Li, Jiafei Liu, Chang He

The identification of cell types by clustering single-cell RNA sequencing (scRNA-seq) data is a fundamental step in the downstream analysis of single-cell data. However, great challenges remain owing to the inherent characteristics of scRNA-seq data, including high dimensionality, high noise, and high sparsity. In this study, we propose a proximity enhanced graph convolutional sparse subspace clustering method, scPEGSSC, for scRNA-seq data.
