Recent progress in 3-D scene understanding has explored 3D visual grounding (3DVG), which localizes a target object from a language description. However, existing methods consider only the dependency between the entire sentence and the target object, ignoring fine-grained relationships between contextual phrases and nontarget objects. In this article, we extend 3DVG to a more fine-grained task, called 3D phrase-aware grounding (3DPAG). The 3DPAG task aims to localize the target objects in a 3-D scene by explicitly identifying all phrase-related objects and then reasoning over the contextual phrases. To tackle this problem, we manually labeled about 227 K phrase-level annotations, using a self-developed platform, from 88 K sentences of widely used 3DVG datasets, i.e., Natural Reference in 3-D (Nr3D), Spatial Reference in 3-D (Sr3D), and ScanRefer. With these datasets, previous 3DVG methods can be extended to the fine-grained phrase-aware scenario. This is achieved through the proposed phrase-object alignment (POA) optimization and phrase-specific pretraining (PSP), which also boost conventional 3DVG performance. Extensive results confirm significant improvements: the previous state-of-the-art method achieves 3.9%, 3.5%, and 4.6% overall accuracy gains on Nr3D, Sr3D, and ScanRefer, respectively. Our datasets and platform are released at https://github.com/CurryYuan/PhraseRefer.
DOI: http://dx.doi.org/10.1109/TNNLS.2025.3571959
Adv Mater
September 2025
Key Laboratory of Brain-Like Neuromorphic Devices and Systems of Hebei Province, College of Electronic and Information Engineering, Hebei University, Baoding, 071002, China.
Neuromorphic visual devices hold considerable promise for integration into neuromorphic vision systems that combine sensing, memory, and computing. This potential arises from their synergistic benefits in optical signal detection and neuro-inspired computational processes. However, current devices face challenges such as insufficient light/dark resistance ratios, mismatched transient photo-response, and volatile retention characteristics, limiting their adaptability to complex artificial vision systems.
AJO Int
October 2025
Department of Ophthalmology & Visual Sciences, University of Michigan Medical School, 1000 Wall Street, Ann Arbor, MI, 48105, USA.
Purpose: The Michigan Screening and Intervention for Glaucoma and Eye Health through Telemedicine Program (MI-SIGHT) was developed to facilitate access to glaucoma and eye disease screening and improve attendance at recommended follow-up in underserved communities. MI-SIGHT offered free eye disease screenings, low-cost glasses, and, for those who screened positive for glaucoma, personalized education and language-concordant coaching grounded in motivational interviewing. The primary aims of this study were 1) to explore barriers to eye care among Latine participants with limited English proficiency (LEP) who screened positive for glaucoma, 2) to understand whether and how the MI-SIGHT program facilitated access to care, and 3) to understand participant experience in MI-SIGHT to inform the development of future interventions.
Anal Chem
September 2025
State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong SAR 999077, China.
Mass spectrometry imaging (MSI) is a label-free technique that enables the visualization of the spatial distribution of thousands of ions within biosamples. Data denoising is the computational strategy aimed at enhancing the MSI data quality, providing an effective alternative to experimental methods. However, due to the complex noise pattern inherent in MSI data and the difficulty in obtaining ground truth from noise-free data, achieving reliable denoised images remains challenging.
J Speech Lang Hear Res
September 2025
Department of Communication Sciences & Disorders, Montclair State University, Bloomfield, NJ.
Purpose: Residual speech sound disorder (RSSD) is a high-prevalence condition that can limit children's academic and social participation, with negative consequences for overall well-being. Previous studies have described visual biofeedback as a promising option for RSSD, but results have been inconclusive due to study design limitations and small sample sizes.
Method: In a preregistered randomized controlled trial, 108 children aged 9-15 years with RSSD affecting American English /ɹ/ were randomly assigned to receive treatment incorporating visual biofeedback (subdivided into ultrasound and visual-acoustic types) or a comparison condition of motor-based treatment consistent with current best practices in speech therapy.
IEEE Trans Pattern Anal Mach Intell
September 2025
Generalized visual grounding tasks, including Generalized Referring Expression Comprehension (GREC) and Segmentation (GRES), extend the classical visual grounding paradigm by accommodating multi-target and non-target scenarios. Specifically, GREC focuses on accurately identifying all referential objects at the coarse bounding box level, while GRES aims to achieve fine-grained pixel-level perception. However, existing approaches typically treat these tasks independently, overlooking the benefits of jointly training GREC and GRES to ensure consistent multi-granularity predictions and streamline the overall process.