Publications by Xiaochun Cao

CAN: Cascade Augmentations Against Noise for Image Restoration.

Yanyang Yan, Siyuan Yao, Wenqi Ren, Rui Zhang, Qi Guo, Xiaochun Cao

IEEE Trans Image Process

January 2025

Image restoration aims to recover the latent clean image from a degraded counterpart. In general, the prevailing state-of-the-art image restoration methods concentrate on solving only a specific degradation type according to the task, e.g.

Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding.

Ke Zou, Yang Bai, Bo Liu, Yidi Chen, Zhihao Chen, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

August 2025

Medical phrase grounding is crucial for identifying relevant regions in medical images based on phrase queries, facilitating accurate image analysis and diagnosis. However, current methods rely on manual extraction of key phrases from medical reports, reducing efficiency and increasing the workload for clinicians. Additionally, the lack of model confidence estimation limits clinical trust and usability.

MAP: Masked Adversarial Perturbation for Boosting Black-Box Attack Transferability.

Kaige Li, Maoxian Wan, Qichuan Geng, Weimin Shi, Xiaochun Cao

IEEE Trans Image Process

January 2025

The transferability of adversarial examples is vital for black-box attacks, as it enables the adversary to deceive the target model without knowing its internals. Despite numerous methods focusing on transferability, they still struggle with transferring across models with distinct architectural components (e.g.

CAT+: Investigating and Enhancing Audio-visual Understanding in Large Language Models.

Qilang Ye, Zitong Yu, Rui Shao, Yawen Cui, Xiangui Kang, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

June 2025

Multimodal Large Language Models (MLLMs) have gained significant attention due to their rich internal implicit knowledge for cross-modal learning. Although advances in bringing audio-visuals into LLMs have resulted in boosts for a variety of Audio-Visual Question Answering (AVQA) tasks, they still face two crucial challenges: 1) audio-visual ambiguity, and 2) audio-visual hallucination. Existing MLLMs can respond to audio-visual content, yet sometimes fail to describe specific objects due to the ambiguity or hallucination of responses.

Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack.

Xiaojun Jia, Sensen Gao, Qing Guo, Simeng Qin, Ke Ma, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

June 2025

Vision-language pre-training (VLP) models excel at interpreting both images and text but remain vulnerable to multimodal adversarial examples (AEs). Advancing the generation of transferable AEs, which succeed across unseen models, is key to developing more robust and practical VLP models. Previous approaches augment image-text pairs to enhance diversity within the adversarial example generation process, aiming to improve transferability by expanding the contrast space of image-text features.

Reliable and Balanced Transfer Learning for Generalized Multimodal Face Anti-Spoofing.

Xun Lin, Ajian Liu, Zitong Yu, Rizhao Cai, Shuai Wang, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

September 2025

Face Anti-Spoofing (FAS) is essential for securing face recognition systems against presentation attacks. Recent advances in sensor technology and multimodal learning have enabled the development of multimodal FAS systems. However, existing methods often struggle to generalize to unseen attacks and diverse environments due to two key challenges: (1) Modality unreliability, where sensors such as depth and infrared suffer from severe domain shifts, impairing the reliability of cross-modal fusion; and (2) Modality imbalance, where over-reliance on a dominant modality weakens the model's robustness against attacks that affect other modalities.

UncTrack: Reliable Visual Object Tracking With Uncertainty-Aware Prototype Memory Network.

Siyuan Yao, Yang Guo, Yanyang Yan, Wenqi Ren, Xiaochun Cao

IEEE Trans Image Process

January 2025

Transformer-based trackers have achieved promising success and become the dominant tracking paradigm because of their accuracy and efficiency. Despite the substantial progress, most of the existing approaches handle object tracking as a deterministic coordinate regression problem, while the target localization uncertainty has been largely overlooked, which hampers trackers' ability to maintain reliable target state prediction in challenging scenarios. To address this issue, we propose UncTrack, a novel uncertainty-aware transformer-based tracker that predicts the target localization uncertainty and incorporates this uncertainty information for accurate target state inference.

Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection.

Ruoyu Chen, Hua Zhang, Jingzhi Li, Li Liu, Zhen Huang, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

August 2025

The objective of few-shot object detection (FSOD) is to detect novel objects with few training samples. The key challenge is constructing a generalized feature space for novel categories with limited data, leveraging the base category space to adapt the detection model. Most fine-tuning methods address this by pre-training on base categories and fine-tuning on novel ones.

LangLoc: Language-Driven Localization via Formatted Spatial Description Generation.

Weimin Shi, Changhao Chen, Kaige Li, Yuan Xiong, Xiaochun Cao

IEEE Trans Image Process

March 2025

Existing localization methods commonly employ vision to perceive the scene and achieve localization in GNSS-denied areas, yet they often struggle in environments with complex lighting conditions, dynamic objects, or privacy-preserving areas. Humans possess the ability to describe various scenes using natural language, effectively inferring their location by leveraging the rich semantic information in these descriptions. Harnessing language presents a potential solution for robust localization.

Optimal Graph Learning-Based Label Propagation for Cross-Domain Image Classification.

Wei Wang, Mengzhu Wang, Chao Huang, Cong Wang, Jie Mu, Xiaochun Cao

IEEE Trans Image Process

March 2025

Label propagation (LP) is a popular semi-supervised learning technique that propagates labels from a training dataset to a test one using a similarity graph, assuming that nearby samples should have similar labels. However, the recent cross-domain problem assumes that training (source domain) and test data sets (target domain) follow different distributions, which may unexpectedly degrade the performance of LP due to small similarity weights connecting the two domains. To address this problem, we propose optimal graph learning-based label propagation (OGL2P), which optimizes one cross-domain graph and two intra-domain graphs to connect the two domains and preserve domain-specific structures, respectively.

AUCPro: AUC-Oriented Provable Robustness Learning.

Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

June 2025

Current studies of provable robustness for deep neural networks (DNNs) usually assume that the class distribution is balanced overall. However, in real-world applications, especially for safety-sensitive systems, the class distribution often exhibits a long-tailed property. It is well known that the Area Under the ROC Curve (AUC) is a more appropriate metric for long-tailed learning problems.

MOVE: Effective and Harmless Ownership Verification via Embedded External Features.

Yiming Li, Linghui Zhu, Xiaojun Jia, Yang Bai, Yong Jiang, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

June 2025

Currently, deep neural networks (DNNs) are widely adopted in different applications. Despite their commercial value, training a well-performing DNN is resource-consuming. Accordingly, a well-trained model is valuable intellectual property for its owner.

Deep Label Propagation with Nuclear Norm Maximization for Visual Domain Adaptation.

Wei Wang, Hanyang Li, Cong Wang, Chao Huang, Zhengming Ding, Xiaochun Cao

IEEE Trans Image Process

January 2025

Domain adaptation aims to leverage abundant label information from a source domain to an unlabeled target domain with two different distributions. Existing methods usually rely on a classifier to generate high-quality pseudo-labels for the target domain, facilitating the learning of discriminative features. Label propagation (LP), as an effective classifier, propagates labels from the source domain to the target domain by designing a smooth function over a similarity graph, which represents structural relationships among data points in feature space.

Distributionally Location-Aware Transferable Adversarial Patches for Facial Images.

Xingxing Wei, Shouwei Ruan, Yinpeng Dong, Hang Su, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

April 2025

Adversarial patches are an important form of adversarial attack in the physical world. To improve the naturalness and aggressiveness of existing adversarial patches, location-aware patches are proposed, where the patch's location on the target object is integrated into the optimization process to perform attacks. Although this approach is effective, efficiently finding the optimal location for placing the patches is challenging, especially under black-box attack settings.

Harnessing Multi-modal Large Language Models for Measuring and Interpreting Color Differences.

Zhihua Wang, Yu Long, Qiuping Jiang, Chao Huang, Xiaochun Cao

IEEE Trans Image Process

January 2025

The accurate measurement of perceptual color differences (CDs) between two images plays an important role in modern smartphone photography. Although traditional CD metrics provide numerical scores to quantify color variations, they often lack the ability to offer intuitive insights or explanations that reflect the factors behind these differences in a way that aligns with human perception and reasoning. Here, we present CD-Reasoning, an innovative method designed not merely to compute numerical CD scores but also to provide a detailed rationale for the observed CDs between images.

Invisible DNN Watermarking Against Model Extraction Attack.

Zuping Xi, Zuomin Qu, Wei Lu, Xiangyang Luo, Xiaochun Cao

IEEE Trans Cybern

December 2024

Deep neural network (DNN) models are widely used in various fields, such as pattern recognition and natural language processing, and provide considerable commercial value to their owners. Embedding a digital watermark in the model allows the legitimate owner to detect unauthorized use of the model. However, the existing DNN watermarking methods are vulnerable to model extraction attacks since the watermark task and the original model task are independent.

Promises and perils of using Transformer-based models for SE research.

Yan Xiao, Xinyue Zuo, Xiaoyue Lu, Jin Song Dong, Xiaochun Cao

Neural Netw

April 2025

Many Transformer-based pre-trained models for code have been developed and applied to code-related tasks. In this paper, we analyze 519 papers published on this topic during 2017-2023, examine the suitability of model architectures for different tasks, summarize their resource consumption, and look at the generalization ability of models on different datasets. We examine three representative pre-trained models for code: CodeBERT, CodeGPT, and CodeT5, and conduct experiments on the four topmost targeted software engineering tasks from the literature: Bug Fixing, Bug Detection, Code Summarization, and Code Search.

Explicitly-Decoupled Text Transfer With Minimized Background Reconstruction for Scene Text Editing.

Jianqun Zhou, Pengwen Dai, Yang Li, Manjiang Hu, Xiaochun Cao

IEEE Trans Image Process

October 2024

Scene text editing aims to replace the source text with the target text while preserving the original background. Its practical applications span various domains, such as data generation and privacy protection, highlighting its increasing importance in recent years. In this study, we propose a novel Scene Text Editing network with Explicitly-decoupled text transfer and Minimized background reconstruction, called STEEM.

Hierarchical Graph Interaction Transformer With Dynamic Token Clustering for Camouflaged Object Detection.

Siyuan Yao, Hao Sun, Tian-Zhu Xiang, Xiao Wang, Xiaochun Cao

IEEE Trans Image Process

October 2024

Camouflaged object detection (COD) aims to identify the objects that seamlessly blend into the surrounding backgrounds. Due to the intrinsic similarity between the camouflaged objects and the background region, it is extremely challenging to precisely distinguish the camouflaged objects by existing approaches. In this paper, we propose a hierarchical graph interaction network termed HGINet for camouflaged object detection, which is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.

MLFA: Toward Realistic Test Time Adaptive Object Detection by Multi-Level Feature Alignment.

Yabo Liu, Jinghua Wang, Chao Huang, Yiling Wu, Yong Xu, Xiaochun Cao

IEEE Trans Image Process

October 2024

Object detection methods have achieved remarkable performances when the training and testing data satisfy the assumption of i.i.d.

INformer: Inertial-Based Fusion Transformer for Camera Shake Deblurring.

Wenqi Ren, Linrui Wu, Yanyang Yan, Shengyao Xu, Feng Huang, Xiaochun Cao

IEEE Trans Image Process

October 2024

Inertial measurement units (IMU) in the capturing device can record the motion information of the device, with gyroscopes measuring angular velocity and accelerometers measuring acceleration. However, conventional deblurring methods seldom incorporate IMU data, and existing approaches that utilize IMU information often face challenges in fully leveraging this valuable data, resulting in noise issues from the sensors. To address these issues, in this paper, we propose a multi-stage deblurring network named INformer, which combines inertial information with the Transformer architecture.

Revitalizing Convolutional Network for Image Restoration.

Yuning Cui, Wenqi Ren, Xiaochun Cao, Alois Knoll

IEEE Trans Pattern Anal Mach Intell

December 2024

Image restoration aims to reconstruct a high-quality image from its corrupted version, playing essential roles in many scenarios. Recent years have witnessed a paradigm shift in image restoration from convolutional neural networks (CNNs) to Transformer-based models due to their powerful ability to model long-range pixel interactions. In this paper, we explore the potential of CNNs for image restoration and show that the proposed simple convolutional network architecture, termed ConvIR, can perform on par with or better than the Transformer counterparts.

Sequential Manipulation Against Rank Aggregation: Theory and Algorithm.

Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

December 2024

Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical.

Improved Diversity-Promoting Collaborative Metric Learning for Recommendation.

Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

December 2024

Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests.

Representing Noisy Image Without Denoising.

Shuren Qi, Yushu Zhang, Chao Wang, Tao Xiang, Xiaochun Cao

IEEE Trans Pattern Anal Mach Intell

October 2024

A long-standing topic in artificial intelligence is the effective recognition of patterns from noisy images. In this regard, the recent data-driven paradigm considers 1) improving the representation robustness by adding noisy samples in training phase (i.e.
