Cross-modal hashing uses hashing functions to map multimodal data into a unified low-dimensional space, enabling efficient cross-modal retrieval. In particular, unsupervised cross-modal hashing methods attract significant attention because they require no external label information. However, two pressing issues remain in unsupervised cross-modal hashing: (1) how to facilitate semantic alignment between modalities, and (2) how to effectively capture the intrinsic relationships among data and thereby construct a more reliable affinity matrix to assist hash code learning. In this paper, Dual Aggregation-Based Joint-modal Similarity Hashing (DAJSH) is proposed to overcome these challenges. To enhance cross-modal semantic alignment, we employ a Transformer encoder to fuse image and text features and introduce a contrastive loss to optimize cross-modal consistency. To build a more reliable affinity matrix for hash code learning, we propose a dual-aggregation affinity matrix construction scheme that integrates intra-modal cosine similarity and Euclidean distance while incorporating cross-modal similarity, thereby maximally preserving cross-modal semantic information. Experimental results demonstrate that our method improves over state-of-the-art approaches by 1.9%–5.1%, 0.9%–5.8%, and 0.6%–2.6% on the MIR Flickr, NUS-WIDE, and MS COCO benchmark datasets, respectively.
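The abstract only summarizes the dual-aggregation scheme at a high level, so the sketch below shows one plausible reading of it: intra-modal cosine similarity and Euclidean distance are fused per modality and then aggregated with a cross-modal cosine term. The function names, the equal intra-modal weighting, and the `alpha`/`beta` aggregation weights are illustrative assumptions rather than the paper's exact formulation, and both feature sets are assumed to already share a common embedding dimension (e.g., after the Transformer fusion encoder).

```python
import numpy as np

def intra_modal_affinity(feats: np.ndarray) -> np.ndarray:
    """Fuse cosine similarity and (rescaled) Euclidean distance within one modality."""
    # Cosine similarity between all pairs of samples.
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    s_cos = normed @ normed.T
    # Pairwise Euclidean distance, mapped into a [0, 1] similarity.
    sq = np.sum(feats ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    dist = np.sqrt(np.clip(d2, 0.0, None))
    s_euc = 1.0 - dist / (dist.max() + 1e-12)
    # Equal weighting of the two views is an illustrative assumption.
    return 0.5 * s_cos + 0.5 * s_euc

def dual_aggregation_affinity(img_feats, txt_feats, alpha=0.4, beta=0.4):
    """Aggregate both intra-modal affinities with a cross-modal cosine term."""
    s_img = intra_modal_affinity(img_feats)
    s_txt = intra_modal_affinity(txt_feats)
    # Cross-modal cosine similarity between paired image/text instances;
    # assumes both modalities live in the same feature dimension.
    img_n = img_feats / (np.linalg.norm(img_feats, axis=1, keepdims=True) + 1e-12)
    txt_n = txt_feats / (np.linalg.norm(txt_feats, axis=1, keepdims=True) + 1e-12)
    s_cross = img_n @ txt_n.T
    # Hypothetical weighted aggregation of the three similarity views.
    return alpha * s_img + beta * s_txt + (1.0 - alpha - beta) * s_cross

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = dual_aggregation_affinity(rng.normal(size=(8, 64)), rng.normal(size=(8, 64)))
    print(S.shape)  # (8, 8) affinity matrix over the mini-batch
```

A hash function would then be trained so that Hamming similarities between binary codes reproduce this aggregated affinity; the exact loss and any symmetrization or normalization of the matrix are left to the paper.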
DOI: http://dx.doi.org/10.1016/j.neunet.2025.108069
Neural Netw
September 2025
Shanghai Maritime University, Shanghai, 201306, China.
IEEE Trans Image Process
January 2025
Due to their low storage requirements and high retrieval efficiency, hashing-based retrieval methods have shown great potential and have been widely applied in information retrieval. However, retrieval tasks in real-world applications usually have to handle data from various domains, which leads to unsatisfactory performance of existing hashing-based methods, as most of them assume that the retrieval pool and the querying set are similar. Moreover, most existing works overlook the self-representation in cross-modal data that contains modality-specific semantic information.
IEEE J Biomed Health Inform
July 2025
Deformable image registration (DIR) is critical for accurate clinical diagnosis and effective treatment planning. However, patient movement, significant intensity differences, and large breathing deformations hinder accurate anatomical alignment in multi-modal image registration. These factors exacerbate the entanglement of anatomical and modality-specific style information, thereby severely limiting the performance of multi-modal registration.
Cross-Modal Hashing (CMH) has become a powerful technique for large-scale cross-modal retrieval, offering benefits like fast computation and efficient storage. However, most CMH models struggle to adapt to streaming multimodal data in real-time once deployed. Although recent online CMH studies have made progress in this area, they often overlook two key challenges: 1) learning effectively from streaming partial-modal multimodal data, and 2) avoiding the high costs associated with frequent hash function re-training and large-scale updates to database hash codes.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi
April 2025
School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China.
Online hashing methods are receiving increasing attention in cross-modal medical image retrieval research. However, existing online methods often lack the ability to maintain semantic correlation between new and existing data. To this end, we proposed an online semantic similarity cross-modal hashing (OSCMH) learning framework to incrementally learn compact binary hash codes for streaming medical data.