Cross-modal Guided Visual Representation Learning for Social Image Retrieval.

Ziyu Guan, Wanqing Zhao, Hongmin Liu, Yuta Nakashima, Noboru Babaguchi, Xiaofei He

IEEE Trans Pattern Anal Mach Intell

Published: December 2024

Social images are often associated with rich but noisy tags contributed by the community. Although social tags can provide valuable semantic training information for image retrieval, existing studies fail to filter noise effectively by exploiting the cross-modal correlation between image content and tags. Current cross-modal vision-and-language representation learning methods, which selectively attend to the relevant parts of the image and text, show a promising direction. However, they are not suitable for social image retrieval for two reasons: (1) they deal with natural text sequences, where the relationships between words can be easily captured by language models for cross-modal relevance estimation, whereas tags are isolated and noisy; (2) they take an (image, text) pair as input, and consequently cannot be employed directly for unimodal social image retrieval. This paper tackles the challenge of using cross-modal interactions to learn precise representations for unimodal retrieval. The proposed framework, dubbed CGVR (Cross-modal Guided Visual Representation), extracts accurate semantic representations of images from noisy tags and transfers this ability to an image-only hashing subnetwork through a carefully designed training scheme. To capture correlated semantics and filter noise, it embeds a priori common-sense relationships among tags into the attention computation for joint awareness of textual and visual context. Experiments show that CGVR achieves improvements of approximately 8.82 and 5.45 points in MAP over the state of the art on two widely used social image benchmarks. CGVR can serve as a new baseline for the image retrieval community. The code is provided at https://github.com/zhaowanqing/CGVR.
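The results above are reported in MAP for a hashing-based retriever. As a minimal sketch of how such a score is commonly computed (not taken from the CGVR code, whose evaluation protocol is not described here), the following ranks database items by Hamming distance between binary codes and averages the per-query average precision. The function names, the toy codes, and the rule that an item is relevant when it shares at least one label with the query are all assumptions made for illustration.

```python
from typing import List, Sequence, Set


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(relevant: List[bool]) -> float:
    """AP of one ranked list; relevant[k] says whether the item at rank k+1 is relevant."""
    hits = 0
    total = 0.0
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(
    query_codes: List[Sequence[int]],
    query_labels: List[Set[str]],
    db_codes: List[Sequence[int]],
    db_labels: List[Set[str]],
) -> float:
    """MAP over all queries, ranking the database by Hamming distance.

    Assumption: a database item is relevant if it shares at least one
    label with the query. Ties in distance keep database order, because
    Python's sorted() is stable.
    """
    aps = []
    for q_code, q_lab in zip(query_codes, query_labels):
        order = sorted(
            range(len(db_codes)),
            key=lambda i: hamming(q_code, db_codes[i]),
        )
        relevant = [bool(q_lab & db_labels[i]) for i in order]
        aps.append(average_precision(relevant))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical 4-bit codes and labels, for illustration only.
    db_codes = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
    db_labels = [{"cat"}, {"dog"}, {"dog"}]
    q_codes = [[0, 0, 0, 0], [1, 1, 1, 1]]
    q_labels = [{"dog"}, {"dog"}]
    print(mean_average_precision(q_codes, q_labels, db_codes, db_labels))
```

Under this convention, a gain reported in MAP "points" would correspond to a difference on the 0 to 100 scale, that is, 100 times the value returned here.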

DOI: http://dx.doi.org/10.1109/TPAMI.2024.3519112
