Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

RGB-D action data inherently carry extra depth information that improves action-recognition performance compared with RGB data, and many works represent the RGB-D data as a third-order tensor with a spatiotemporal structure and seek a lower-dimensional subspace. However, these methods face two main challenges. First, the subspace dimension is usually fixed manually, which may not describe the samples well in the subspace. Second, preserving local information by finding the intra-class and inter-class neighbors on a manifold is highly time-consuming. In this paper, we learn a tensor subspace, whose dimension is determined automatically by low-rank learning, for RGB-D action recognition. Specifically, the tensor samples are factorized by Tucker decomposition to obtain three projection matrices (PMs); a nuclear-norm penalty on each PM is solved in closed form to obtain the tensor ranks, which serve as the tensor subspace dimensions. In addition, we extract discriminant and local information from a manifold using a graph constraint. This graph inherently preserves local structure, which is faster than the previous approach of computing both the intra-class and inter-class neighbors of every sample. We evaluate the proposed method on four widely used RGB-D action datasets, namely MSRDailyActivity3D, MSRActionPairs, MSRActionPairs skeleton, and UTKinect-Action3D, and the experimental results show that it achieves higher accuracy and efficiency.
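The Tucker-based pipeline in the abstract can be illustrated with a minimal sketch: a higher-order SVD (HOSVD) of a third-order tensor in which each mode's rank is chosen automatically from its unfolding spectrum. This energy-based truncation is a simple stand-in for the paper's closed-form nuclear-norm rank learning; the function name and the `energy` threshold are illustrative assumptions, not from the paper.

```python
import numpy as np

def hosvd_with_rank_selection(X, energy=0.95):
    """HOSVD of a 3rd-order tensor X with per-mode ranks picked
    automatically from each mode-unfolding's singular spectrum
    (a simplified proxy for nuclear-norm-based rank learning)."""
    factors, ranks = [], []
    for mode in range(3):
        # Unfold X along `mode`: move that axis first, flatten the rest.
        unfold = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
        U, s, _ = np.linalg.svd(unfold, full_matrices=False)
        # Smallest rank capturing `energy` of the squared spectrum.
        cum = np.cumsum(s**2) / np.sum(s**2)
        r = int(np.searchsorted(cum, energy) + 1)
        factors.append(U[:, :r])
        ranks.append(r)
    # Core tensor: project X onto each mode's factor subspace.
    G = X
    for mode, U in enumerate(factors):
        G = np.moveaxis(np.tensordot(U.T, np.moveaxis(G, mode, 0), axes=1), 0, mode)
    return G, factors, ranks
```

The returned `ranks` play the role of the learned tensor-subspace dimensions: for an exactly low-rank tensor the truncation recovers the true multilinear rank, and the core `G` with the three factor matrices reconstructs the input.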


Source: http://dx.doi.org/10.1109/TIP.2016.2589320

Publication Analysis

Top Keywords

rgb-d action: 16
tensor subspace: 12
action recognition: 12
learning rgb-d: 8
intra-class inter-class: 8
inter-class neighbors: 8
subspace dimension: 8
proposed method: 8
subspace: 6
rgb-d: 5

Similar Publications

Contrastive learning has shown remarkable success in the domain of skeleton-based action recognition. However, the design of data transformations, which is crucial for effective contrastive learning, remains a challenging aspect in the context of skeleton-based action recognition. The difficulty lies in creating data transformations that capture rich motion patterns while ensuring that the transformed data retains the same semantic information.


Human Motion Intention Recognition (HMIR) plays a vital role in advancing medical rehabilitation and assistive technologies by enabling the early detection of pain-indicative actions such as sneezing, coughing, or back discomfort. However, existing systems struggle with recognizing such subtle movements due to complex postural variations and environmental noise. This paper presents a novel multi-modal framework that integrates RGB and depth data to extract high-resolution spatial-temporal and anatomical features for accurate HMIR.


Multistage fall detection framework via 3D pose sequences and TCN integration.

Sci Rep

July 2025

School of Basic Medical Sciences, Shandong Second Medical University, Weifang, 261053, Shandong, China.

An accurate yet computationally efficient fall detection system for sports activities is a significant and challenging task. To address this, we propose a novel multi-stage fall detection framework that integrates 3D pose sequences with temporal convolutional modeling. The framework first performs 2D human pose estimation to extract and enhance multi-scale spatial features.


Human Activity Recognition (HAR) plays a pivotal role in video understanding, with applications ranging from surveillance to virtual reality. Skeletal data has emerged as a robust modality for HAR, overcoming challenges such as noisy backgrounds and lighting variations. However, current Graph Convolutional Network (GCNN)-based methods for skeletal activity recognition face two key limitations: (1) they fail to capture dynamic changes in node affinities induced by movements, and (2) they overlook the interplay between spatial and temporal information critical for recognizing complex actions.


Robust indoor robot navigation typically demands either costly sensors or extensive training data. We propose a cost-effective RGB-D navigation pipeline that couples feature-based relative pose estimation with a lightweight multi-layer-perceptron (MLP) policy. RGB-D keyframes extracted from human-driven traversals form nodes of a topological map; edges are added when visual similarity and geometric-kinematic constraints are jointly satisfied.
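The topological map this snippet describes can be sketched minimally: RGB-D keyframes become nodes, and an edge is added only when both a visual-similarity test and a geometric constraint pass. The class name, cosine-similarity test, and threshold values below are illustrative assumptions, not details from the paper.

```python
import numpy as np

class TopoMap:
    """Sketch of a topological map: keyframe nodes, with an edge added
    only when visual similarity AND a geometric constraint both hold."""
    def __init__(self, sim_thresh=0.9, dist_thresh=1.5):
        self.sim_thresh = sim_thresh    # minimum cosine similarity (assumed)
        self.dist_thresh = dist_thresh  # maximum pose distance, meters (assumed)
        self.features, self.poses, self.edges = [], [], []

    def add_keyframe(self, feature, pose):
        """Insert a keyframe descriptor + pose; link it to compatible nodes."""
        feature = np.asarray(feature, float)
        feature = feature / np.linalg.norm(feature)  # unit-norm descriptor
        pose = np.asarray(pose, float)
        idx = len(self.features)
        for j, (f, p) in enumerate(zip(self.features, self.poses)):
            sim = float(feature @ f)                # cosine similarity
            dist = float(np.linalg.norm(pose - p))  # Euclidean pose gap
            if sim >= self.sim_thresh and dist <= self.dist_thresh:
                self.edges.append((j, idx))
        self.features.append(feature)
        self.poses.append(pose)
        return idx
```

A visually similar but spatially distant keyframe gets no edge, which is the point of jointly requiring both constraints: appearance alone would wrongly link look-alike places.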
