Citations

20

Article Abstract

Recent progress in 3-D scene understanding has explored 3D visual grounding (3DVG), which localizes a target object from a language description. However, existing methods consider only the dependency between the entire sentence and the target object, ignoring fine-grained relationships between contexts and nontarget objects. In this article, we extend 3DVG to a more fine-grained task, called 3D phrase-aware grounding (3DPAG). The 3DPAG task aims to localize the target objects in a 3-D scene by explicitly identifying all phrase-related objects and then reasoning over the contextual phrases. To tackle this problem, we manually labeled about 227 K phrase-level annotations using a self-developed platform, from 88 K sentences of widely used 3DVG datasets, i.e., Natural Reference in 3-D (Nr3D), Spatial Reference in 3-D (Sr3D), and ScanRefer. With these datasets, previous 3DVG methods can be extended to the fine-grained phrase-aware scenario. This is achieved through the proposed phrase-object alignment (POA) optimization and phrase-specific pretraining (PSP), which also boost conventional 3DVG performance. Extensive results confirm significant improvements: the previous state-of-the-art method achieves overall accuracy gains of 3.9%, 3.5%, and 4.6% on Nr3D, Sr3D, and ScanRefer, respectively. Our datasets and platform are released at https://github.com/CurryYuan/PhraseRefer.

Source
http://dx.doi.org/10.1109/TNNLS.2025.3571959

Publication Analysis

Top Keywords

visual grounding: 12
3-d scene: 8
localize target: 8
target object: 8
reference 3-d: 8
sr3d scanrefer: 8
fine-grained: 4
fine-grained 3-d: 4
3-d visual: 4
grounding: 4

Similar Publications

Optically Controlled Memristor Enabling Synergistic Sensing-Memory-Computing for Neuromorphic Vision Systems.

Adv Mater

September 2025

Key Laboratory of Brain-Like Neuromorphic Devices and Systems of Hebei Province, College of Electronic and Information Engineering, Hebei University, Baoding, 071002, China.

Neuromorphic visual devices hold considerable promise for integration into neuromorphic vision systems that combine sensing, memory, and computing. This potential arises from their synergistic benefits in optical signal detection and neuro-inspired computational processes. However, current devices face challenges such as insufficient light/dark resistance ratios, mismatched transient photo-response, and volatile retention characteristics, limiting their adaptability to complex artificial vision systems.

Purpose: The Michigan Screening and Intervention for Glaucoma and Eye Health through Telemedicine Program (MI-SIGHT) was developed to facilitate access to glaucoma and eye disease screening and improve attendance at recommended follow-up in underserved communities. MI-SIGHT offered free eye disease screenings, low-cost glasses, and, for those who screened positive for glaucoma, personalized education and language-concordant coaching grounded in motivational interviewing. The primary aims of this study were 1) to explore barriers to eye care among Latine participants with limited English proficiency (LEP) who screened positive for glaucoma, 2) to understand whether and how the MI-SIGHT program facilitated access to care, and 3) to understand participant experience in MI-SIGHT to inform the development of future interventions.

Mass spectrometry imaging (MSI) is a label-free technique that enables the visualization of the spatial distribution of thousands of ions within biosamples. Data denoising is a computational strategy aimed at enhancing MSI data quality, providing an effective alternative to experimental methods. However, due to the complex noise patterns inherent in MSI data and the difficulty of obtaining noise-free ground truth, achieving reliable denoised images remains challenging.

Purpose: Residual speech sound disorder (RSSD) is a high-prevalence condition that can limit children's academic and social participation, with negative consequences for overall well-being. Previous studies have described visual biofeedback as a promising option for RSSD, but results have been inconclusive due to study design limitations and small sample sizes.

Method: In a preregistered randomized controlled trial, 108 children aged 9-15 years with RSSD affecting American English /ɹ/ were randomly assigned to receive treatment incorporating visual biofeedback (subdivided into ultrasound and visual-acoustic types) or a comparison condition of motor-based treatment consistent with current best practices in speech therapy.

Generalized visual grounding tasks, including Generalized Referring Expression Comprehension (GREC) and Segmentation (GRES), extend the classical visual grounding paradigm by accommodating multi-target and non-target scenarios. Specifically, GREC focuses on accurately identifying all referential objects at the coarse bounding box level, while GRES aims to achieve fine-grained pixel-level perception. However, existing approaches typically treat these tasks independently, overlooking the benefits of jointly training GREC and GRES to ensure consistent multi-granularity predictions and streamline the overall process.
