Pyramid contrastive learning for clustering.

Neural Netw

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China. Electronic address:

Published: May 2025


Article Abstract

With its ability to jointly perform representation learning and clustering via deep neural networks, deep clustering has gained significant attention in recent years. Despite considerable progress, most previous deep clustering methods still suffer from three critical limitations. First, they tend to attach a distribution-based clustering loss to the neural network, which often overlooks sample-wise contrastiveness for discriminative representation learning. Second, they generally use the features learned at a single layer for the clustering process and do not explore multiple layers for joint multi-layer (multi-stage) learning. Third, they typically use a convolutional neural network (CNN) for clustering images, which focuses on local information but cannot capture global dependencies well. To tackle these issues, this paper presents a new deep clustering method called pyramid contrastive learning for clustering (PCLC), which incorporates a pyramidal contrastive architecture to jointly enforce contrastive learning and clustering at multiple network layers (or stages). Specifically, for an input image, two types of augmentations are first performed to generate two parallel augmented views. To bridge the gap between the CNN (for capturing local information) and the Transformer (for reflecting global dependencies), a mixed CNN-Transformer based encoder is used as the backbone, whose CNN-Transformer blocks are divided into four stages, thus giving rise to a pyramid of multi-stage feature representations. Thereafter, multiple stages of twin contrastive learning are conducted simultaneously at both the instance level and the cluster level, and the final clustering is obtained by optimizing these objectives. Extensive experiments on multiple challenging image datasets demonstrate the superior clustering performance of PCLC over state-of-the-art methods.
The source code is available at https://github.com/Zachary-Chow/PCLC.
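
The multi-stage twin contrastive objective summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the repository above is the reference for that). It assumes an NT-Xent-style loss with cosine similarity, hypothetical temperature values, and equal weighting of all stages; the function names are illustrative only. The augmentations and the CNN-Transformer encoder are omitted, so each stage is represented by precomputed features and soft cluster assignments for the two views, and any regularizer on cluster sizes is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent-style loss; row i of view_a and row i of view_b are a positive pair."""
    n = len(view_a)
    rows = view_a + view_b               # 2n vectors in total
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)          # matching row in the other view
        logits = [cosine(rows[i], rows[k]) / temperature
                  for k in range(2 * n) if k != i]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - cosine(rows[i], rows[pos]) / temperature
    return total / (2 * n)


def transpose(matrix):
    """Turn an N x K assignment matrix into K cluster vectors of length N."""
    return [list(column) for column in zip(*matrix)]


def twin_loss(feat_a, feat_b, prob_a, prob_b, t_inst=0.5, t_clu=1.0):
    """Instance-level term on feature rows plus cluster-level term on assignment columns."""
    instance_term = nt_xent(feat_a, feat_b, t_inst)
    cluster_term = nt_xent(transpose(prob_a), transpose(prob_b), t_clu)
    return instance_term + cluster_term


def pyramid_loss(stages):
    """Sum the twin loss over stages (equal stage weights are an assumption)."""
    return sum(twin_loss(*stage) for stage in stages)
```

Here each element of `stages` is a tuple `(feat_a, feat_b, prob_a, prob_b)` for one stage of the pyramid. In a trainable version these quantities would come from differentiable projection heads attached to each stage, and the loss would be minimized with a framework that supports automatic differentiation.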

Source
http://dx.doi.org/10.1016/j.neunet.2025.107217
