Dual aggregation based joint-modal similarity hashing for cross-modal retrieval.

Neural Netw

Shanghai Maritime University, Shanghai, 201306, China.

Published: September 2025



Article Abstract

Cross-modal hashing aims to leverage hashing functions to map multimodal data into a unified low-dimensional space, enabling efficient cross-modal retrieval. In particular, unsupervised cross-modal hashing methods have attracted significant attention because they do not require external label information. However, unsupervised cross-modal hashing still faces several pressing issues: (1) how to facilitate semantic alignment between modalities, and (2) how to effectively capture the intrinsic relationships within the data, thereby constructing a more reliable affinity matrix to assist hash code learning. In this paper, Dual Aggregation-Based Joint-modal Similarity Hashing (DAJSH) is proposed to overcome these challenges. To enhance cross-modal semantic alignment, we employ a Transformer encoder to fuse image and text features and introduce a contrastive loss to optimize cross-modal consistency. Additionally, to construct a more reliable affinity matrix for hash code learning, we propose a dual-aggregation affinity matrix construction scheme. This scheme integrates intra-modal cosine similarity and Euclidean distance while incorporating cross-modal similarity, thereby maximally preserving cross-modal semantic information. Experimental results demonstrate that our method achieves performance improvements of 1.9%-5.1%, 0.9%-5.8% and 0.6%-2.6% over state-of-the-art approaches on the MIR Flickr, NUS-WIDE and MS COCO benchmark datasets, respectively.
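The dual-aggregation idea described in the abstract can be illustrated with a minimal sketch. The code below is a hypothetical reconstruction, not the authors' implementation: it assumes image and text features are row-aligned matrices, combines intra-modal cosine similarity with a normalized Euclidean-distance similarity, and then mixes the two modalities with a weighting parameter. The function name and the weights `alpha` and `beta` are illustrative assumptions.

```python
import numpy as np

def dual_aggregation_affinity(img_feats, txt_feats, alpha=0.5, beta=0.5):
    """Sketch of a dual-aggregation affinity matrix (illustrative only).

    img_feats, txt_feats: (n, d) feature matrices with aligned rows.
    alpha: weight between cosine similarity and distance-based similarity.
    beta:  weight between the image and text affinity matrices.
    """
    def intra_modal(F):
        # Cosine similarity between all pairs of rows.
        Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
        cos_sim = Fn @ Fn.T
        # Pairwise Euclidean distances, mapped to a similarity in [0, 1].
        dist = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
        dist_sim = 1.0 - dist / dist.max()
        # Aggregate the two intra-modal views.
        return alpha * cos_sim + (1 - alpha) * dist_sim

    S_img = intra_modal(img_feats)
    S_txt = intra_modal(txt_feats)
    # Cross-modal aggregation into a joint affinity matrix.
    return beta * S_img + (1 - beta) * S_txt
```

In a hashing pipeline, a joint affinity matrix like this would typically supervise the inner products of the learned hash codes; the exact loss and normalization in DAJSH may differ from this sketch.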


Source
http://dx.doi.org/10.1016/j.neunet.2025.108069


Similar Publications


Due to its low storage requirements and high retrieval efficiency, hashing-based retrieval has shown great potential and has been widely applied in information retrieval. However, retrieval tasks in real-world applications usually need to handle data from various domains, leading to unsatisfactory performance from existing hashing-based methods, as most of them assume that the retrieval pool and the query set are similar. Moreover, most existing works overlook the self-representation in cross-modal data, which contains modality-specific semantic information.


Deformable image registration (DIR) is critical for accurate clinical diagnosis and effective treatment planning. However, patient movement, significant intensity differences, and large breathing deformations hinder accurate anatomical alignment in multi-modal image registration. These factors exacerbate the entanglement of anatomical and modality-specific style information, thereby severely limiting the performance of multi-modal registration.


Cross-Modal Hashing (CMH) has become a powerful technique for large-scale cross-modal retrieval, offering benefits like fast computation and efficient storage. However, most CMH models struggle to adapt to streaming multimodal data in real-time once deployed. Although recent online CMH studies have made progress in this area, they often overlook two key challenges: 1) learning effectively from streaming partial-modal multimodal data, and 2) avoiding the high costs associated with frequent hash function re-training and large-scale updates to database hash codes.


[Cross modal medical image online hash retrieval based on online semantic similarity].

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi

April 2025

School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China.

Online hashing methods are receiving increasing attention in cross-modal medical image retrieval research. However, existing online methods often lack the ability to maintain semantic correlation between new and existing data during learning. To this end, we proposed an online semantic similarity cross-modal hashing (OSCMH) learning framework to incrementally learn compact binary hash codes for streaming medical data.
