Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Accurately modeling detailed interactions between human/hand and object is an appealing yet challenging task. Current multi-view capture systems can only reconstruct multiple subjects into a single, unified mesh, which fails to model the state of each instance individually during interactions. To address this, previous methods use template-based representations to track the human/hand and the object. However, the quality of their reconstructions is limited by the descriptive capability of the templates, so these methods inherently struggle with geometric details, pressing deformations, and invisible contact surfaces. In this work, we propose an end-to-end Instance-aware Human-Object Interactions recovery (Ins-HOI) framework by introducing an instance-level occupancy field representation. However, real-captured data is presented as a holistic mesh and cannot provide instance-level supervision. To address this, we further propose a complementary training strategy that leverages synthetic data to introduce instance-level shape priors, enabling the disentanglement of occupancy fields for different instances. Specifically, synthetic data, created by randomly combining individual scans of humans/hands and objects, guides the network to learn a coarse prior over instances. Meanwhile, real-captured data helps in learning the overall geometry and restricting interpenetration in contact areas. As demonstrated in experiments, our method Ins-HOI supports instance-level reconstruction and provides reasonable and realistic invisible contact surfaces even in cases of extremely close interaction. To facilitate research on this task, we collect a large-scale, high-fidelity 3D scan dataset, including 5.2k high-quality scans of real-world human-chair and hand-object interactions. The code and data will be made publicly available for research purposes.
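The abstract describes two core ideas: per-instance occupancy fields whose union should match the holistic real-captured mesh, and a penalty that restricts interpenetration in contact areas. As a minimal NumPy sketch of those two ideas only (function names are hypothetical; the actual Ins-HOI uses learned neural occupancy fields, not toy arrays):

```python
import numpy as np

def holistic_occupancy(instance_occs):
    # Union (pointwise max) of per-instance occupancy fields approximates
    # the single merged mesh that a multi-view capture system reconstructs,
    # which is the only supervision real-captured data can provide.
    return np.max(instance_occs, axis=0)

def interpenetration_penalty(occ_a, occ_b, thresh=0.5):
    # Penalize query points that both instances claim as "inside":
    # such points imply the human/hand and the object overlap in space.
    both_inside = (occ_a > thresh) & (occ_b > thresh)
    return float(np.sum(np.minimum(occ_a, occ_b) * both_inside))

# Toy occupancy values at 5 query points for two instances.
human = np.array([0.9, 0.8, 0.6, 0.1, 0.0])
chair = np.array([0.0, 0.1, 0.7, 0.9, 0.8])

print(holistic_occupancy(np.stack([human, chair])))  # [0.9 0.8 0.7 0.9 0.8]
print(interpenetration_penalty(human, chair))        # 0.6 (one overlapping point)
```

The union term ties the disentangled fields back to the holistic real scan, while the penalty discourages the two fields from claiming the same space near contact, which is what lets the network hallucinate plausible invisible contact surfaces.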


Source: http://dx.doi.org/10.1109/TPAMI.2025.3588268

Publication Analysis

Top Keywords

human-object interactions: 8
interactions recovery: 8
human/hand object: 8
invisible contact: 8
contact surfaces: 8
real-captured data: 8
synthetic data: 8
interactions: 5
data: 5
ins-hoi instance: 4

Similar Publications

To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved.


Introduction: Surface neatness is a fundamental yet underexplored determinant of the aesthetic evaluation of everyday objects. While prior research has typically examined individual surface features - such as gloss, shine, dirt, or scratches - in isolation, the holistic impact of surface neatness has received little systematic attention.

Methods: In this study, participants viewed images of objects from five categories (household items, tools, personal use items, stationery, and kitchen utensils), each presented in three surface conditions: untidy (displaying mechanical and hygienic defects), neutral (without visible defects), and neat (exhibiting gloss and cleanliness).


Human-object interaction (HOI) detection tackles the problem of joint localization and classification of HOIs. Recent HOI detection methods are mainly based on transformer networks, where the explicit priors at the object level (e.g.


Human Activity Recognition (HAR) systems aim to understand human behavior and assign a label to each action, attracting significant attention in computer vision due to their wide range of applications. HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals. Each modality provides unique and complementary information suited to different application scenarios.

