Accurately modeling detailed interactions between a human/hand and an object is an appealing yet challenging task. Current multi-view capture systems can only reconstruct multiple subjects as a single, unified mesh, which fails to model the state of each instance individually during interactions. To address this, previous methods use template-based representations to track the human/hand and the object. However, the quality of these reconstructions is limited by the descriptive capability of the templates, so such methods inherently struggle with geometric details, pressing deformations, and invisible contact surfaces. In this work, we propose an end-to-end Instance-aware Human-Object Interactions recovery (Ins-HOI) framework by introducing an instance-level occupancy field representation. However, real-captured data is presented as a holistic mesh and cannot provide instance-level supervision. To address this, we further propose a complementary training strategy that leverages synthetic data to introduce instance-level shape priors, enabling the disentanglement of occupancy fields for different instances. Specifically, synthetic data, created by randomly combining individual scans of humans/hands and objects, guides the network to learn a coarse prior over instances, while real-captured data helps in learning the overall geometry and restricting interpenetration in contact areas. As demonstrated in experiments, Ins-HOI supports instance-level reconstruction and produces reasonable, realistic invisible contact surfaces even in cases of extremely close interaction. To facilitate research on this task, we collect a large-scale, high-fidelity 3D scan dataset comprising 5.2k high-quality scans of real-world human-chair and hand-object interactions. The code and data will be made public for research purposes.
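The abstract's core idea, separate per-instance occupancy fields that are supervised jointly (real scans constrain only their union, while an overlap penalty discourages interpenetration at contacts), can be sketched in a few lines. This is a minimal illustration of the training signals described above, not the paper's actual implementation; the function names and the 0.5 inside/outside convention are assumptions for illustration.

```python
import numpy as np

def scene_occupancy(occ_a, occ_b):
    """Union of two instance occupancy fields at sampled 3D points.

    A real-captured holistic mesh can only supervise this union, since
    the scan does not separate the instances.
    """
    return np.maximum(occ_a, occ_b)

def interpenetration_penalty(occ_a, occ_b):
    """Penalize points that both instances claim as interior.

    When occ_a + occ_b exceeds 1, the fields overlap; the clipped excess,
    averaged over sample points, is a simple interpenetration loss term.
    """
    overlap = np.clip(occ_a + occ_b - 1.0, 0.0, None)
    return float(overlap.mean())

# Two sampled points: the first lies in a contact region both fields
# claim (overlap), the second is outside both instances.
occ_human = np.array([0.9, 0.2])
occ_object = np.array([0.8, 0.1])
union = scene_occupancy(occ_human, occ_object)        # [0.9, 0.2]
penalty = interpenetration_penalty(occ_human, occ_object)  # 0.35
```

In this sketch, synthetic composites (where per-instance labels exist) would supervise `occ_human` and `occ_object` directly, while real scans supervise only `union`; the penalty term restricts overlap where no direct supervision is available.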
DOI: http://dx.doi.org/10.1109/TPAMI.2025.3588268
IEEE Trans Neural Netw Learn Syst
July 2025
To enable robots to use tools, the initial step is teaching them to employ dexterous gestures to touch precisely the areas where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains an unresolved problem.
Front Psychol
July 2025
Institute for Cognitive Neuroscience, HSE University, Moscow, Russia.
Introduction: Surface neatness is a fundamental yet underexplored determinant of the aesthetic evaluation of everyday objects. While prior research has typically examined individual surface features - such as gloss, shine, dirt, or scratches - in isolation, the holistic impact of surface neatness has received little systematic attention.
Methods: In this study, participants viewed images of objects from five categories (household items, tools, personal use items, stationery, and kitchen utensils), each presented in three surface conditions: untidy (displaying mechanical and hygienic defects), neutral (without visible defects), and neat (exhibiting gloss and cleanliness).
IEEE Trans Cybern
September 2025
Human-object interaction (HOI) detection tackles the problem of joint localization and classification of HOIs. Recent HOI detection methods are mainly based on transformer networks, where the explicit priors at the object level (e.g.
Sensors (Basel)
June 2025
School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan.
Human Activity Recognition (HAR) systems aim to understand human behavior and assign a label to each action, attracting significant attention in computer vision due to their wide range of applications. HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals. Each modality provides unique and complementary information suited to different application scenarios.
IEEE Trans Pattern Anal Mach Intell
July 2025