Article Abstract

Social images are often associated with rich but noisy tags contributed by the community. Although social tags can provide valuable semantic training information for image retrieval, existing studies fail to filter noise effectively by exploiting the cross-modal correlation between image content and tags. Current cross-modal vision-and-language representation learning methods, which selectively attend to the relevant parts of the image and text, show a promising direction. However, they are not suitable for social image retrieval because (1) they deal with natural text sequences, where the relationships between words can be easily captured by language models for cross-modal relevance estimation, whereas tags are isolated and noisy; and (2) they take an (image, text) pair as input and consequently cannot be employed directly for unimodal social image retrieval. This paper tackles the challenge of using cross-modal interactions to learn precise representations for unimodal retrieval. The proposed framework, dubbed CGVR (Cross-modal Guided Visual Representation), extracts accurate semantic representations of images from noisy tags and transfers this ability to an image-only hashing subnetwork through a carefully designed training scheme. To capture correlated semantics and filter noise, it embeds a priori common-sense relationships among tags into the attention computation for joint awareness of textual and visual context. Experiments show that CGVR achieves improvements of approximately 8.82 and 5.45 points in MAP over the state of the art on two widely used social image benchmarks. CGVR can serve as a new baseline for the image retrieval community. The code is provided at https://github.com/zhaowanqing/CGVR.
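The MAP figures quoted in the abstract refer to mean average precision. As a rough illustration of how that metric is commonly computed for hashing-based retrieval, the sketch below ranks database items by Hamming distance to each query code and averages the precision at every relevant hit. It is not taken from the CGVR repository: the function names, the integer encoding of binary codes, and the relevance rule (two images are relevant if they share at least one label) are all assumptions made for this example.

```python
from typing import List, Sequence, Set


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as Python ints."""
    return bin(a ^ b).count("1")


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    query_codes: Sequence[int],
    query_labels: Sequence[Set[str]],
    db_codes: Sequence[int],
    db_labels: Sequence[Set[str]],
) -> float:
    """MAP over all queries, ranking the database by Hamming distance.

    Assumption: a database image counts as relevant to a query when the
    two share at least one label.
    """
    ap_values: List[float] = []
    for q_code, q_labels in zip(query_codes, query_labels):
        # sorted() is stable, so ties keep their database order.
        order = sorted(range(len(db_codes)),
                       key=lambda i: hamming(q_code, db_codes[i]))
        relevance = [bool(q_labels & db_labels[i]) for i in order]
        ap_values.append(average_precision(relevance))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Toy data (hypothetical): three database images with 4-bit codes.
db_codes = [0b0000, 0b0011, 0b1111]
db_labels = [{"cat"}, {"dog"}, {"cat", "sky"}]
query_codes = [0b0001, 0b1110]
query_labels = [{"cat"}, {"dog"}]

map_score = mean_average_precision(query_codes, query_labels,
                                   db_codes, db_labels)
print(f"MAP = {map_score:.4f}")
```

In this toy setup the first query finds relevant items at ranks 1 and 3 and the second finds its only relevant item at rank 3, so the two average precisions are 5/6 and 1/3. Published benchmarks often truncate the ranking to the top-k results, which this sketch does not do.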


Source
http://dx.doi.org/10.1109/TPAMI.2024.3519112

Publication Analysis

Top Keywords

image retrieval: 20
social image: 16
image: 9
cross-modal guided: 8
guided visual: 8
visual representation: 8
representation learning: 8
noisy tags: 8
filter noises: 8
image text: 8

Similar Publications

Aims And Objective: The field of medical statistics has experienced significant advancements driven by integrating innovative statistical methodologies. This study aims to conduct a comprehensive analysis to explore current trends, influential research areas, and future directions in medical statistics.

Methods: This paper maps the evolution of statistical methods used in medical research based on 4,919 relevant publications retrieved from the Web of Science.


Objectives: The objective of this study was to evaluate the occurrence of voltage-gated potassium channel (VGKC) antibodies and the pattern of MRI changes in cats with complex partial seizures with orofacial involvement (CPSOFI), as well as to investigate whether there are factors influencing survival that could be used as prognostic markers in those cats.

Methods: Cats with CPSOFI were identified retrospectively. The following data were retrieved from the hospital database: signalment, age at first seizure and presentation, the presence of antibodies against VGKC (leucine-rich glioma inactivating factor 1 (LGI1), contactin-associated protein 2 (CASPR2)) and cerebrospinal fluid (CSF) analysis findings.


Implementation of Fully Automated AI-Integrated System for Body Composition Assessment on Computed Tomography for Opportunistic Sarcopenia Screening: Multicenter Prospective Study.

JMIR Form Res

September 2025

Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Asan Medical Center, Seoul, 05505, Republic of Korea.

Background: Opportunistic computed tomography (CT) screening for the evaluation of sarcopenia and myosteatosis has been gaining emphasis. A fully automated artificial intelligence (AI)-integrated system for body composition assessment on CT scans is a prerequisite for effective opportunistic screening. However, no study has evaluated the implementation of fully automated AI systems for opportunistic screening in real-world clinical practice for routine health check-ups.


Background: Early diagnosis can significantly improve the survival rate of pancreatic ductal adenocarcinoma (PDAC), but because early symptoms are insidious and non-specific, most patients are no longer suitable for surgery by the time they are diagnosed. Traditional imaging techniques and an increasing number of non-imaging diagnostic methods have been used for the early diagnosis of pancreatic cancer (PC) through deep learning (DL).

Objective: This review summarizes deep learning-based diagnostic methods for pancreatic cancer and looks ahead to future directions for deep learning in the early diagnosis of pancreatic cancer.


Multivariate pattern analysis (MVPA) methods are a versatile tool to retrieve information from neurophysiological data obtained with functional magnetic resonance imaging (fMRI) techniques. Since fMRI is based on measuring the hemodynamic response following neural activation, the spatial specificity of the fMRI signal is inherently limited by contributions of macrovascular compartments that drain the signal from the actual location of neural activation, making it challenging to image cortical structures at the spatial scale of cortical columns and layers. By relying on information from multiple voxels, MVPA has shown promising results in retrieving information encoded in fine-grained spatial patterns.
