Prevailing vision-language models have made remarkable progress in 3D scene understanding, but they are trained in a closed-set setting with full labels. The major bottleneck for current 3D scene recognition in robotics is that these models cannot recognize novel classes beyond the training categories, which limits them in diverse real-world applications such as robot manipulation and navigation. Meanwhile, state-of-the-art 3D scene understanding approaches require a large number of high-quality labels to train neural networks and perform well only in a fully supervised manner. A framework is therefore urgently needed that applies to both 3D point cloud segmentation and detection, particularly when labels are scarce. This work presents a generalized and straightforward framework for 3D scene understanding when labeled scenes are limited. To extract knowledge about novel categories from pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy that distills meaningful information from large-scale vision-language models and benefits open-vocabulary scene understanding tasks. To leverage boundary information, we propose a novel boundary-aware energy-based loss that benefits from region-level boundary predictions. To encourage latent instance discrimination while guaranteeing efficiency, we propose an unsupervised region-level semantic contrastive learning scheme for point clouds, which uses confident predictions of the neural network to discriminate the intermediate feature embeddings at multiple stages.
In the limited reconstruction case, our proposed approach, termed WS3D++, ranks 1st on the large-scale ScanNet benchmark on both semantic segmentation and instance segmentation. WS3D++ also achieves state-of-the-art data-efficient learning performance on the other large-scale real-scene indoor and outdoor datasets, S3DIS and SemanticKITTI. Extensive experiments on both indoor and outdoor scenes demonstrate the effectiveness of our approach in both data-efficient learning and open-world few-shot learning.
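The region-level contrastive idea named in the abstract can be illustrated with a small, generic sketch. This is not the WS3D++ implementation: it is a plain InfoNCE-style loss over region embeddings, and the function name, the confidence threshold, and the temperature are all hypothetical choices made for illustration. It assumes each region comes with an embedding, a pseudo-label predicted by the network, and a confidence score, and that only confident regions take part in the loss.

```python
import math

def _normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def region_contrastive_loss(embeddings, pseudo_labels, confidences,
                            threshold=0.9, temperature=0.1):
    """Hypothetical InfoNCE-style loss over confident region embeddings.

    Regions sharing a pseudo-label are treated as positives; all other
    confident regions act as negatives. Regions below `threshold` are ignored.
    """
    kept = [(_normalize(e), label)
            for e, label, conf in zip(embeddings, pseudo_labels, confidences)
            if conf >= threshold]
    losses = []
    for i, (z_i, label_i) in enumerate(kept):
        others = [j for j in range(len(kept)) if j != i]
        positives = [j for j in others if kept[j][1] == label_i]
        if not positives:
            continue  # no positive pair for this anchor
        logits = {j: _dot(z_i, kept[j][0]) / temperature for j in others}
        log_denominator = _logsumexp(list(logits.values()))
        for j in positives:
            losses.append(log_denominator - logits[j])
    return sum(losses) / len(losses) if losses else 0.0
```

With embeddings that form two tight clusters, pseudo-labels that agree with the clusters yield a loss near zero, while pseudo-labels that cut across the clusters yield a much larger loss. That is the behaviour a contrastive objective of this kind is meant to reward.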
Download full-text PDF | Source |
---|---|
http://dx.doi.org/10.1109/TPAMI.2025.3566593 | DOI Listing |
Trauma Violence Abuse
September 2025
Ghent University, Ghent, Belgium.
This study presents a scoping review and crime script analysis of the modus operandi of online romance scammers. Online romance scams are a form of fraud in which perpetrators fabricate online romantic relationships with victims, aiming to emotionally manipulate and, ultimately, financially exploit them. The review aims to synthesize existing research on how scammers operate and to develop a comprehensive crime script that can guide prevention and policy efforts.
Acad Psychiatry
September 2025
University of Toronto, Toronto, Ontario, Canada.
Objective: A deep understanding of patients in psychiatry requires an ability to appreciate and describe the biopsychosocial determinants of health. Great works of theatre portray a nuanced observation of the human condition, but these have not been formally evaluated in psychiatric literature as teaching tools. The purpose of this study was to explore Shakespeare's King Lear as an educational intervention in supporting formulation skills training in geriatric psychiatry residency.
Iperception
September 2025
Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, The Netherlands.
Some occluders evoke the compelling impression that the space behind them is empty. Stage magicians use this illusion of absence to produce objects out of thin air. The generic view principle predicts that the illusion of absence should increase with decreasing occluder size.
IEEE Trans Neural Netw Learn Syst
September 2025
A key challenge in robot manipulation is making a model accurately understand and follow natural language instructions while performing actions consistent with world knowledge. This mainly involves reasoning about fuzzy human instructions and adhering to physical knowledge. Therefore, the embodied intelligence agent must have the ability to model world knowledge from training data.
IET Syst Biol
September 2025
School of Computer and Information Technology, Xinyang Normal University, Xinyang, China.
Accurate polyp segmentation is crucial for computer-aided diagnosis and early detection of colorectal cancer. Whereas feature pyramid network (FPN) and its variants are widely used in polyp segmentation, inherent limitations existing in FPN include: (1) repeated upsampling degrades fine details, reducing small polyp segmentation accuracy and (2) naive feature fusion (e.g.