Publications by authors named "Jinhui Tang"

In hashing-based long-tailed image retrieval, the dominance of data-rich head classes often hinders the learning of effective hash codes for data-poor tail classes due to inherent long-tailed bias. Interestingly, this bias also contains valuable prior knowledge, as it reveals inter-class dependencies that can benefit hash learning. However, previous methods have not thoroughly analyzed these tangled negative and positive effects of long-tailed bias from a causal inference perspective.


Edge sensor devices generate vast amounts of user data, but centralized processing poses privacy risks. Federated Learning addresses this by decentralizing training. However, applying Federated Learning directly to skeleton videos fails to preserve motion dynamics and suffers from client heterogeneity bias.


Mannitol is a valuable sugar alcohol, extensively used across various industries. Cyanobacteria show potential as future platforms for mannitol production, utilizing CO2 and solar energy directly. The proof-of-concept has been demonstrated by introducing a two-step pathway in cyanobacteria that converts fructose-6-phosphate to mannitol-1-phosphate and subsequently to mannitol.


Multi-modal learning aims to enhance performance by unifying models from various modalities, but it often faces the "modality imbalance" problem in real data: a bias towards dominant modalities that neglects the others and limits overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism to adjust the parameter updates of each modality.
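As an illustration of such a modal-level control mechanism, the sketch below scales each modality's gradient step by how under-optimized it is relative to the others. The loss-ratio heuristic, the clipping bounds, and all function names are assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def modality_coefficients(losses):
    """Per-modality update coefficients: modalities with lower loss
    (dominant ones) get their updates damped, weaker ones boosted."""
    losses = np.asarray(losses, dtype=float)
    ratios = losses / losses.mean()      # >1 means under-optimized
    return np.clip(ratios, 0.5, 2.0)     # bound the modulation

def modulated_step(params, grads, losses, lr=0.1):
    """One gradient step per modality, scaled by its coefficient."""
    coeffs = modality_coefficients(losses)
    return [p - lr * c * g for p, g, c in zip(params, grads, coeffs)]
```

With two modalities at losses 1.0 and 3.0, the first (dominant) modality's step is halved while the lagging one is boosted, nudging both towards a joint optimum.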


How to effectively explore spatial and temporal information is important for video deblurring. In contrast to existing methods that directly align adjacent frames without discrimination, we develop a deep discriminative spatial and temporal network to facilitate the spatial and temporal feature exploration for better video deblurring. We first develop a channel-wise gated dynamic network to adaptively explore the spatial information.
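A channel-wise gate of the kind mentioned above can be sketched as follows. This is a minimal NumPy illustration; the squeeze-then-gate design and all names are assumptions, not the paper's actual network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_gate(feat, w, b):
    """Channel-wise gating: squeeze the spatial dims, predict a
    per-channel gate in (0, 1), and rescale the feature map.
    feat: (C, H, W); w: (C, C); b: (C,)."""
    squeeze = feat.mean(axis=(1, 2))      # (C,) global average pool
    gate = sigmoid(w @ squeeze + b)       # (C,) per-channel weights
    return feat * gate[:, None, None]     # broadcast over H, W
```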


Recent years have witnessed significant advances in image deraining due to the progress of effective image priors and deep learning models. As each deraining approach has individual settings (e.g.


Effective visual representation is crucial for image captioning task. Among the existing methods, the grid-based visual encoding methods take fragmented features extracted from the entire image as input, lacking the fine-grained semantic information focused on salient objects. To address this issue, we propose an effective method, namely Multi-Level Semantic-Aware Transformer (MLSAT) for image captioning, to simultaneously focus on contextual details and high-level semantic information centered on salient objects.


In this paper, we propose a novel visual relation detection task, named Group Visual Relation Detection (GVRD), for detecting visual relations whose subjects and/or objects are groups (GVRs), inspired by the observation that groups are common in image semantic representation. GVRD can be deemed an evolution of the existing visual relation detection task, which restricts both subjects and objects of visual relations to individuals. To address GVRD, we propose a Simultaneous Group Relation Prediction (SGRP) method that can simultaneously predict groups and predicates.


Medical image segmentation demands the aggregation of global and local feature representations, posing a challenge for current methodologies in handling both long-range and short-range feature interactions. Recently, vision mamba (ViM) models have emerged as promising solutions for addressing model complexities by excelling in long-range feature iterations with linear complexity. However, existing ViM approaches overlook the importance of preserving short-range local dependencies by directly flattening spatial tokens and are constrained by fixed scanning patterns that limit the capture of dynamic spatial context information.
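The fixed scanning patterns referred to above simply define different orders for flattening a 2-D token grid into a 1-D sequence. A toy enumeration (illustrative only, not the ViM variants used in the paper):

```python
import numpy as np

def scan_orders(h, w):
    """Enumerate several 1-D scanning orders over an h x w token
    grid, as used to flatten spatial tokens for a sequence model."""
    idx = np.arange(h * w).reshape(h, w)
    return {
        "row_major": idx.reshape(-1),
        "row_major_rev": idx.reshape(-1)[::-1],
        "col_major": idx.T.reshape(-1),
        "col_major_rev": idx.T.reshape(-1)[::-1],
    }
```

Note how row-major scanning places horizontally adjacent tokens next to each other in the sequence but separates vertical neighbours by `w` positions, which is exactly why a single fixed pattern loses short-range local dependencies.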


RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between RGB and Thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, while designed for cross-modality feature interactions, may not have adequately considered the robustness against noise originating from defective modalities, thereby leading to suboptimal performance in complex scenarios.


Vision-language retrieval aims to search for similar instances in one modality based on queries from another modality. The primary objective is to learn cross-modal matching representations in a latent common space. Actually, the assumption underlying cross-modal matching is modal balance, where each modality contains sufficient information to represent the others.


Concrete is the most widely used and highest-volume basic material in the world today. Enhancing its toughness, including tensile strength and deformation resistance, can boost structural load-bearing capacity, minimize cracking, and decrease the amount of concrete and steel required in engineering projects. These advancements are crucial for the safety, durability, energy efficiency, and emission reduction of structural engineering.


In this paper, we propose the Vision-Audio-Language Omni-peRception pretraining model (VALOR) for multimodal understanding and generation. Unlike widely-studied vision-language pretraining models, VALOR jointly models the relationships among vision, audio, and language in an end-to-end manner. It consists of three separate encoders for single modality representations and a decoder for multimodal conditional text generation.


Fine-grained visual classification aims to classify similar sub-categories with the challenges of large variations within the same sub-category and high visual similarities between different sub-categories. Recently, methods that extract semantic parts of the discriminative regions have attracted increasing attention. However, most existing methods extract the part features via rectangular bounding boxes by object detection module or attention mechanism, which makes it difficult to capture the rich shape information of objects.


Biological materials built on hierarchically ordered architectures inspire advanced composites that combine mutually exclusive mechanical properties, but efficient topology optimization and large-scale manufacturing remain challenging. Herein, this work proposes a scalable bottom-up approach to fabricate a novel nacre-like cement-resin composite with a gradient brick-and-mortar (BM) structure, and demonstrates a machine learning-assisted method to optimize the gradient structure. The fabricated gradient composite exhibits an extraordinary combination of high flexural strength, toughness, and impact resistance.


Knowledge distillation-based anomaly detection (KDAD) methods rely on the teacher-student paradigm to detect and segment anomalous regions by contrasting the unique features extracted by both networks. However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network's representations and 2) the features of the teacher network serve solely as a "reference standard" and are not fully leveraged. Toward this end, we depart from the established paradigm and instead propose an innovative approach called asymmetric distillation postsegmentation (ADPS).
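The teacher-student contrast underlying KDAD methods can be illustrated by scoring each spatial location with the discrepancy between teacher and student features: where the student fails to replicate the teacher, the region is likely anomalous. This is a generic sketch of the paradigm, not ADPS itself:

```python
import numpy as np

def anomaly_map(teacher_feat, student_feat):
    """Per-location anomaly score as the cosine distance between
    teacher and student feature vectors. feats: (C, H, W)."""
    t = teacher_feat / (np.linalg.norm(teacher_feat, axis=0, keepdims=True) + 1e-8)
    s = student_feat / (np.linalg.norm(student_feat, axis=0, keepdims=True) + 1e-8)
    return 1.0 - (t * s).sum(axis=0)   # (H, W); higher = more anomalous
```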

Article Synopsis
  • A rare case of diffuse large B-cell lymphoma (DLBCL) involved both the peripheral and central nervous systems, confirmed through pathology with early symptoms of dyspnoea and hyperventilation.
  • The patient experienced fatigue, limb pain, and worsening breathlessness, leading to ventilator support after ineffective initial treatment for what was suspected to be Guillain-Barré syndrome.
  • Diagnosis was complicated by non-specific early signs; after chemotherapy the patient improved briefly but later died of pneumonia, underscoring the poor prognosis associated with nervous system involvement in DLBCL.

How to effectively explore the colors of exemplars and propagate them to colorize each frame is vital for exemplar-based video colorization. In this article, we present BiSTNet, which explores the colors of exemplars and utilizes them for video colorization through bidirectional temporal feature fusion guided by a semantic image prior. We first establish the semantic correspondence between each frame and the exemplars in deep feature space to explore color information from exemplars.
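The semantic correspondence step can be sketched as a nearest-neighbour search in deep feature space: each frame location is matched to the most similar exemplar location, whose color can then be propagated. This is illustrative only; BiSTNet's actual matching is more elaborate:

```python
import numpy as np

def semantic_correspondence(frame_feat, exemplar_feat):
    """For every frame position, find the exemplar position with the
    highest cosine similarity in deep feature space.
    feats: (C, N) with N flattened spatial positions."""
    f = frame_feat / (np.linalg.norm(frame_feat, axis=0, keepdims=True) + 1e-8)
    e = exemplar_feat / (np.linalg.norm(exemplar_feat, axis=0, keepdims=True) + 1e-8)
    sim = f.T @ e                # (N_frame, N_exemplar)
    return sim.argmax(axis=1)    # best-matching exemplar index per position
```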


The image-level label has prevailed in weakly supervised semantic segmentation tasks due to its easy availability. Since image-level labels can only indicate the existence or absence of specific categories of objects, visualization-based techniques have been widely adopted to provide object location clues. Considering class activation maps (CAMs) can only locate the most discriminative part of objects, recent approaches usually adopt an expansion strategy to enlarge the activation area for more integral object localization.
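The class activation maps mentioned above are computed by weighting the final convolutional feature maps with the classifier weights of the target class, which is why they highlight only the most discriminative part. A minimal sketch (the normalization and names are illustrative assumptions):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Vanilla CAM: weight the final conv feature maps (C, H, W) by
    the classifier weights (num_classes, C) of the target class,
    then ReLU and normalize to [0, 1]."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```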


Recognizing actions performed on unseen objects, known as Compositional Action Recognition (CAR), has attracted increasing attention in recent years. The main challenge is to overcome the distribution shift of "action-objects" pairs between the training and testing sets. Previous works for CAR usually introduce extra information (e.


Visual grounding (VG) aims to locate a specific target in an image based on a given language query. The discriminative information from context is important for distinguishing the target from other objects, particularly for the targets that have the same category as others. However, most previous methods underestimate such information.


Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for stereo matching of high accuracy and efficiency. In this article, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume.
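A toy 1-D version of the idea behind ACV, in which correlation clues between left and shifted right features produce attention weights that modulate a concatenation volume. All details here are simplifications for illustration, not the paper's construction:

```python
import numpy as np

def attention_concat_volume(left, right, max_disp):
    """Toy attention concatenation volume for 1-D stereo features.
    left, right: (C, W). For each disparity d, the correlation
    between left and shifted right features yields an attention
    weight that rescales the concatenated feature pair."""
    C, W = left.shape
    volume = np.zeros((max_disp, 2 * C, W))
    for d in range(max_disp):
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, : W - d]
        corr = (left * shifted).sum(axis=0) / np.sqrt(C)  # correlation clue
        attn = 1.0 / (1.0 + np.exp(-corr))                # sigmoid attention
        volume[d] = np.concatenate([left, shifted], axis=0) * attn
    return volume
```

Positions where the correlation is high keep strong concatenated features, while uncorrelated (likely mismatched) positions are suppressed, mirroring how attention weights filter redundant information.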


Alite dissolution plays a crucial role in cement hydration. However, quantitative investigations into alite powder dissolution are limited, especially regarding the influence of chemical admixtures. This study investigates the impact of particle size, temperature, saturation level, and mixing speed on the dissolution rate of alite powder, accounting for the real-time evolution of specific surface area during dissolution.


This article proposes a new hashing framework named relational consistency induced self-supervised hashing (RCSH) for large-scale image retrieval. To capture the potential semantic structure of data, RCSH explores the relational consistency between data samples in different spaces, which learns reliable data relationships in the latent feature space and then preserves the learned relationships in the Hamming space. The data relationships are uncovered by learning a set of prototypes that group similar data samples in the latent feature space.
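The relational consistency idea can be sketched as matching pairwise cosine similarities between the latent feature space and (relaxed) hash codes in [-1, 1]. This is a generic illustration of similarity preservation; RCSH's prototype learning is not reproduced here:

```python
import numpy as np

def relation_matrix(x):
    """Pairwise cosine similarities of row vectors."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    return x @ x.T

def consistency_loss(features, codes):
    """Mean squared gap between feature-space relations and the
    relations of relaxed hash codes; driving this to zero preserves
    the learned relationships in the Hamming space."""
    return float(np.mean((relation_matrix(features) - relation_matrix(codes)) ** 2))
```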


Text-Image Person Re-identification (TIReID) aims to retrieve the image corresponding to a given text query from a pool of candidate images. Existing methods employ prior knowledge from single-modality pre-training to facilitate learning, but lack multi-modal correspondence information. Vision-Language Pre-training, such as CLIP (Contrastive Language-Image Pretraining), can address this limitation.
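Once text and images are embedded in a CLIP-style shared space, retrieval reduces to ranking candidates by cosine similarity. A minimal sketch with precomputed embeddings (the encoder calls are omitted and all names are assumptions):

```python
import numpy as np

def retrieve(text_emb, image_embs, top_k=1):
    """Rank candidate images by cosine similarity with the text
    query embedding and return the indices of the top-k matches.
    text_emb: (D,); image_embs: (N, D)."""
    t = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    imgs = image_embs / (np.linalg.norm(image_embs, axis=1, keepdims=True) + 1e-8)
    scores = imgs @ t
    return np.argsort(-scores)[:top_k]
```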
