Lipreading, the task of recognizing speech based on visual cues from lip movements, typically requires a substantial amount of labeled training data to achieve optimal performance. However, this task is highly sensitive to variations among speakers, often resulting in significantly degraded recognition accuracy for unseen speakers. In this work, we introduce a novel framework, multi-scale masked temporal fusion with Dirichlet uncertainty estimation (MsDUNE), designed to mitigate the feature distribution disparities across different speakers. The proposed framework leverages a Dirichlet distribution to parameterize the latent space of a single feature branch, which is then quantitatively assessed through evidence and belief masses. Furthermore, MsDUNE calibrates multi-scale feature distributions by accounting for the mutual influence of feature beliefs between two branches, thereby enhancing the generalization capability of the lipreading model. We validate our approach through extensive experiments conducted on two widely recognized benchmarks, LRW-ID and AV Letters, as well as a self-collected lipreading dataset, CVSR100. The experimental results highlight the state-of-the-art performance of our method, particularly in scenarios involving unseen or overlapping speakers.
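The evidence and belief masses mentioned in the abstract can be illustrated with the standard subjective-logic mapping used in evidential deep learning. This is a minimal sketch, not the authors' implementation: the abstract does not give the formulas, so the relations alpha_k = e_k + 1, b_k = e_k / S and u = K / S (with S the sum of all alpha_k) are assumed here, and the function name `dirichlet_opinion` is hypothetical.

```python
from typing import List, Tuple


def dirichlet_opinion(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative per-class evidence to belief masses and an uncertainty mass.

    Assumed (standard subjective-logic) formulation, not confirmed by the abstract:
        alpha_k = e_k + 1,  S = sum_k alpha_k,  b_k = e_k / S,  u = K / S
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")

    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]       # Dirichlet concentration parameters
    strength = sum(alpha)                     # Dirichlet strength S
    belief = [e / strength for e in evidence] # belief mass per class
    uncertainty = num_classes / strength      # mass left unassigned to any class
    return belief, uncertainty


# Hypothetical evidence for a 3-class problem from one feature branch.
belief, uncertainty = dirichlet_opinion([8.0, 1.0, 1.0])
print(belief, uncertainty)
```

Under these assumptions the belief masses and the uncertainty mass always sum to one, so a branch that produces no evidence is assigned full uncertainty.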
DOI: http://dx.doi.org/10.1016/j.neunet.2025.107783
IEEE Trans Pattern Anal Mach Intell
September 2025
Camouflaged Object Segmentation (COS) faces significant challenges due to the scarcity of annotated data, where meticulous pixel-level annotation is both labor-intensive and costly, primarily due to the intricate object-background boundaries. Addressing the core question, "Can COS be effectively achieved in a zero-shot manner without manual annotations for any camouflaged object?", we propose an affirmative solution. We analyze the learned attention patterns for camouflaged objects and introduce a robust zero-shot COS framework.
Acad Radiol
September 2025
In-Service Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan (H.-C.K., S.-J.P.); Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei, Taiwan (S.-J.P.). Electronic address: sjpeng2
Rationale And Objectives: Computed tomography (CT) remains the primary modality for assessing renal tumors; however, tumor identification and segmentation rely heavily on manual interpretation by clinicians, which is time-consuming and subject to inter-observer variability. The heterogeneity of tumor appearance and indistinct margins further complicate accurate delineation, impacting histopathological classification, treatment planning, and prognostic assessment. There is a pressing clinical need for an automated segmentation tool to enhance diagnostic workflows and support clinical decision-making with results that are reliable, accurate, and reproducible.
Int J Comput Assist Radiol Surg
September 2025
Department of Oncology, University of Cambridge, Cambridge, United Kingdom.
Purpose: High-grade serous ovarian carcinoma (HGSOC) is characterised by significant spatial and temporal heterogeneity, often presenting at an advanced metastatic stage. One of the most common treatment approaches involves neoadjuvant chemotherapy (NACT), followed by surgery. However, the multi-scale complexity of HGSOC poses a major challenge in evaluating response to NACT.
Sensors (Basel)
August 2025
College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an 710021, China.
In GNSS-deprived settings, such as indoor and underground environments, research on simultaneous localization and mapping (SLAM) technology remains a focal point. Addressing the influence of dynamic variables on positional precision and constructing a persistent map comprising solely static elements are pivotal objectives in visual SLAM for dynamic scenes. This paper introduces optical flow motion segmentation-based SLAM (OS-SLAM), a dynamic environment SLAM system that incorporates optical flow motion segmentation for enhanced robustness.
Sensors (Basel)
August 2025
College of Electronics and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China.
To address the challenging problem of multi-scale inshore-offshore ship detection in synthetic aperture radar (SAR) remote sensing images, we propose a novel deep learning-based automatic ship detection method within the framework of compositional learning. The proposed method is supported by three pillars: context-guided region proposal, prototype-based model pretraining, and multi-model ensemble learning. To reduce the false alarms induced by discrete ground clutter, prior knowledge of the harbour's layout is exploited to generate land masks for terrain delimitation.