Publications by authors named "Lizhuang Ma"

Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. Yet, both RGB and 3D data are crucial for anomaly detection, and datasets are seldom completely clean in practical scenarios. To address these challenges, this paper delves into RGB-3D multi-modal noisy anomaly detection and proposes a novel noise-resistant M3DM-NR framework that leverages the strong multi-modal discriminative capabilities of CLIP.
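As a rough illustration of the kind of feature matching such anomaly detectors build on (this is a generic memory-bank nearest-neighbor sketch, not the M3DM-NR algorithm itself; all names are hypothetical):

```python
import numpy as np

def anomaly_scores(bank: np.ndarray, patches: np.ndarray) -> np.ndarray:
    """Score each test patch feature by its distance to the nearest
    feature in a memory bank of normal patches (higher = more anomalous)."""
    # Pairwise Euclidean distances: shape (n_test, n_bank)
    d = np.linalg.norm(patches[:, None, :] - bank[None, :, :], axis=-1)
    return d.min(axis=1)

# Toy example: a bank of "normal" features near the origin,
# one test patch close to the bank and one far away.
rng = np.random.default_rng(0)
bank = rng.normal(0.0, 0.1, size=(100, 8))
patches = np.stack([np.zeros(8), np.full(8, 5.0)])
scores = anomaly_scores(bank, patches)
assert scores[1] > scores[0]  # the distant patch scores as more anomalous
```

In multi-modal settings the same scoring would run over fused RGB and 3D features rather than a single feature space.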

Generalization under distribution shifts has been a great challenge in computer vision. The prevailing practice of directly employing one-hot labels as the training targets in domain generalization (DG) can lead to gradient conflicts, making it insufficient for capturing the intrinsic class characteristics and hard to increase intra-class variation. Besides, existing DG methods mostly overlook the distinct contributions of the source (seen) domains, resulting in uneven learning from these domains.

Facial micro-expression recognition (MER) is a challenging problem, due to transient and subtle micro-expression (ME) actions. Most existing methods depend on hand-crafted features, key frames like onset, apex, and offset frames, or deep networks limited by small-scale and low-diversity datasets. In this paper, we propose an end-to-end micro-action-aware deep learning framework with advantages from transformer, graph convolution, and vanilla convolution.

Real-world image super-resolution (RISR) has received increased focus for improving the quality of SR images under unknown complex degradation. Existing methods rely on the heavy SR models to enhance low-resolution (LR) images of different degradation levels, which significantly restricts their practical deployments on resource-limited devices. In this paper, we propose a novel Dynamic Channel Splitting scheme for efficient Real-world Image Super-Resolution, termed DCS-RISR.

Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to the limited availability of labeled night-image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on day-night image pairs to guide adaptation, but this approach hampers dataset construction and restricts generalization across night scenes in different datasets.

Due to the unsatisfactory performance of supervised methods on unpaired real-world scans, point cloud completion via cross-domain adaptation has recently drawn growing attention. Nevertheless, previous approaches focus only on alleviating the distribution shift through domain alignment, resulting in massive information loss of real-world domain data. To tackle this issue, we propose a dual mixup-induced consistency regularization that integrates both the source and target domains to improve robustness and generalization capability.
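The mixup-induced consistency idea can be sketched generically: predictions on a mixed input are encouraged to match the mix of the individual predictions. This is an illustrative toy, not the paper's dual-mixup formulation:

```python
import numpy as np

def mixup_points(p: np.ndarray, q: np.ndarray, lam: float) -> np.ndarray:
    """Linearly interpolate two point clouds of equal size (N, 3)."""
    return lam * p + (1.0 - lam) * q

def consistency_loss(f, p, q, lam):
    """Penalise the model f when its prediction on the mixed input
    differs from the mix of its predictions on the two inputs."""
    mixed_pred = f(mixup_points(p, q, lam))
    target = lam * f(p) + (1.0 - lam) * f(q)
    return float(np.mean((mixed_pred - target) ** 2))

# A linear "model" satisfies the consistency exactly, so the loss is ~0;
# for a real network the loss is nonzero and acts as a regularizer.
W = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 2.0], [1.0, 0.0, 1.0]])
f = lambda x: x @ W
rng = np.random.default_rng(1)
p, q = rng.normal(size=(64, 3)), rng.normal(size=(64, 3))
assert consistency_loss(f, p, q, 0.3) < 1e-12
```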

Information Bottleneck (IB) provides an information-theoretic principle for multi-view learning by revealing the various components contained in each view. This highlights the necessity of capturing their distinct roles to achieve view-invariant and predictive representations, which remains under-explored due to the technical intractability of modeling and organizing innumerable mutual information (MI) terms. Recent studies show that sufficiency and consistency play such key roles in multi-view representation learning, and could be preserved via a variational distillation framework.
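For reference, one common form of the classical IB objective trades compression of the input X against prediction of the target Y through the representation Z:

```latex
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(Z; X) \;-\; \beta \, I(Z; Y)
```

Multi-view extensions introduce one such trade-off per view, which is why the number of MI terms grows quickly and motivates the distillation treatment described above.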

In contrast to the traditional avatar creation pipeline which is a costly process, contemporary generative approaches directly learn the data distribution from photographs. While plenty of works extend unconditional generative models and achieve some levels of controllability, it is still challenging to ensure multi-view consistency, especially in large poses. In this work, we propose a network that generates 3D-aware portraits while being controllable according to semantic parameters regarding pose, identity, expression and illumination.

Hidden features in neural networks usually fail to learn informative representations for 3D segmentation, as supervision is given only on the output prediction; this can be solved by omni-scale supervision on intermediate layers. In this paper, we bring the first omni-scale supervision method to 3D segmentation via the proposed gradual Receptive Field Component Reasoning (RFCR), where target Receptive Field Component Codes (RFCCs) are designed to record the categories within the receptive fields of hidden units in the encoder. The target RFCCs then supervise the decoder to gradually infer the RFCCs in a coarse-to-fine category reasoning manner, and finally obtain the semantic labels.
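A toy sketch of how a target RFCC could be assembled — a multi-hot code marking every category present inside a unit's receptive field — on a 1-D label sequence. The reduction to 1-D is ours for illustration; the paper operates on 3D point clouds:

```python
import numpy as np

def target_rfcc(labels: np.ndarray, window: int, n_classes: int) -> np.ndarray:
    """For a 1-D toy 'point sequence', build a multi-hot code per unit
    marking every category present inside its receptive field."""
    n = labels.shape[0]
    codes = np.zeros((n, n_classes), dtype=np.int64)
    for i in range(n):
        # Receptive field = a symmetric window of radius `window`
        lo, hi = max(0, i - window), min(n, i + window + 1)
        codes[i, np.unique(labels[lo:hi])] = 1
    return codes

labels = np.array([0, 0, 1, 1, 2])
codes = target_rfcc(labels, window=1, n_classes=3)
# Unit 2 sees labels {0, 1} inside its radius-1 field.
assert codes[2].tolist() == [1, 1, 0]
```

Coarser layers would use larger windows, so the codes naturally go from many-hot (coarse) to one-hot (fine), matching the coarse-to-fine reasoning described above.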

Night-Time Scene Parsing (NTSP) is essential to many vision applications, especially for autonomous driving. Most of the existing methods are proposed for day-time scene parsing. They rely on modeling pixel intensity-based spatial contextual cues under even illumination.

Domain adaptation aims to bridge the domain shifts between the source and the target domain. These shifts may span different dimensions such as fog, rainfall, etc. However, recent methods typically do not consider explicit prior knowledge about the domain shifts on a specific dimension, thus leading to less desired adaptation performance.

U-Nets have achieved tremendous success in medical image segmentation. Nevertheless, they may have limitations in global (long-range) contextual interactions and edge-detail preservation. In contrast, the Transformer module has an excellent ability to capture long-range dependencies by leveraging the self-attention mechanism in the encoder.
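A minimal single-head scaled dot-product self-attention, the mechanism by which a Transformer encoder lets every token attend to every other token. This is a generic sketch, not the specific module of any particular segmentation architecture:

```python
import numpy as np

def self_attention(x: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
    """Single-head scaled dot-product self-attention over a sequence
    of token embeddings x with shape (seq_len, d_model)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # (seq, seq) pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # rows are softmax weights
    return attn @ v                               # each token mixes all tokens

rng = np.random.default_rng(2)
x = rng.normal(size=(16, 32))                     # 16 tokens, 32-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(32, 32)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
assert out.shape == (16, 32)
```

Because every output token is a weighted sum over all input tokens, the receptive field is global from the first layer, which is exactly the long-range property the convolutional U-Net lacks.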

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection, while demonstrating performance as good as previous complex hand-crafted detectors. However, their performance on Video Object Detection (VOD) has not been well explored. In this paper, we present TransVOD, the first end-to-end video object detection system based on simple yet effective spatial-temporal Transformer architectures.

Coronavirus Disease 2019 (Covid-19) swept the world in early 2020, placing global health under threat. Automated lung infection detection using chest X-ray images has great potential for enhancing the traditional Covid-19 treatment strategy. However, there are several challenges in detecting infected regions from chest X-ray images, including significant variance in infected features with similar spatial characteristics, and multi-scale variations in the texture, shape, and size of infected regions.

Mirror detection is challenging because the visual appearance of mirrors changes depending on that of their surroundings. As existing mirror detection methods are mainly based on extracting contextual contrast and relational similarity between mirror and non-mirror regions, they may fail to identify a mirror region if these assumptions are violated. Inspired by a recent study applying a CNN to help distinguish whether an image is flipped or not based on the visual chirality property, in this paper we rethink this image-level visual chirality property and reformulate it as a learnable pixel-level cue for mirror detection.

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand-new point-set learning framework, PRIN (Point-wise Rotation Invariant Network), focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density Aware Adaptive Sampling to deal with distorted point distributions in spherical space.

Brain functional connectivity (FC) derived from resting-state functional magnetic resonance imaging (rs-fMRI) has been widely employed to study neuropsychiatric disorders such as autism spectrum disorder (ASD). Existing studies usually suffer from (1) significant data heterogeneity caused by different scanners or studied populations in multiple sites, (2) curse of dimensionality caused by millions of voxels in each fMRI scan and a very limited number (tens or hundreds) of training samples, and (3) poor interpretability, which hinders the identification of reproducible disease biomarkers. To this end, we propose a Multi-site Clustering and Nested Feature Extraction (MC-NFE) method for fMRI-based ASD detection.

Although huge progress has been made on scene analysis in recent years, most existing works assume the input images to be in day-time with good lighting conditions. In this work, we aim to address the night-time scene parsing (NTSP) problem, which has two main challenges: 1) labeled night-time data are scarce, and 2) over- and under-exposures may co-occur in the input night-time images and are not explicitly modeled in existing pipelines. To tackle the scarcity of night-time data, we collect a novel labeled dataset, named NightCity, of 4,297 real night-time images with ground truth pixel-level semantic annotations.

Facial expression transfer between two unpaired images is a challenging problem, as fine-grained expression is typically tangled with other facial attributes. Most existing methods treat expression transfer as an application of expression manipulation, and use predicted global expression, landmarks or action units (AUs) as a guidance. However, the prediction may be inaccurate, which limits the performance of transferring fine-grained expression.

Pixel-level 2D object semantic understanding is an important topic in computer vision and could help machines deeply understand objects (e.g., functionality and affordance) in our daily life.

We propose a robust normal estimation method for both point clouds and meshes using a low rank matrix approximation algorithm. First, we compute a local isotropic structure for each point and find its similar, non-local structures that we organize into a matrix. We then show that a low rank matrix approximation algorithm can robustly estimate normals for both point clouds and meshes.
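The classical PCA step such pipelines refine — taking the normal as the least-variance direction of a local neighborhood — can be sketched as follows. The non-local, low-rank aggregation that makes the paper's method robust is not reproduced here:

```python
import numpy as np

def pca_normal(neighbors: np.ndarray) -> np.ndarray:
    """Estimate the normal of a local patch as the eigenvector of the
    neighborhood covariance with the smallest eigenvalue."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, 0]                     # direction of least variance

# Points sampled (noisily) from the plane z = 0: the normal should be ~±z.
rng = np.random.default_rng(3)
pts = np.column_stack([rng.normal(size=(200, 2)),
                       rng.normal(0.0, 1e-3, size=200)])
n = pca_normal(pts)
assert abs(n[2]) > 0.99
```

This per-patch estimate degrades near sharp edges and under heavy noise, which is the failure mode the low-rank, non-local formulation is designed to fix.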

In this article, we propose a multiview self-representation model for nonlinear subspace clustering. By assuming that heterogeneous features lie within the union of multiple linear subspaces, recent multiview subspace learning methods aim to capture the complementary and consensus information from multiple views to boost performance. However, in real-world applications, data features usually reside in multiple nonlinear subspaces, leading to undesirable results.
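The self-representation principle these methods build on — each sample reconstructed as a combination of the others, with the coefficient matrix serving as a clustering affinity — can be sketched with a ridge-regularized least-squares solve. This is a single-view, linear toy, not the paper's multiview nonlinear model:

```python
import numpy as np

def self_representation(X: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """Solve min_C ||X - XC||^2 + lam ||C||^2 in closed form, so each
    column (sample) of X is rebuilt from the others.
    X has shape (d, n): columns are samples."""
    n = X.shape[1]
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(n), G)

# Two orthogonal 1-D subspaces in R^3: within-cluster coefficients
# dominate the affinity matrix |C| + |C|^T, revealing the clusters.
u, v = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
X = np.column_stack([u * t for t in (1, 2, 3)] + [v * t for t in (1, 2, 3)])
C = self_representation(X)
A = np.abs(C) + np.abs(C).T
assert A[:3, :3].sum() > A[:3, 3:].sum()
```

In practice the affinity A is fed to spectral clustering; kernelized or multiview variants replace the Gram matrix G to handle nonlinear subspaces and multiple feature types.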

Noonan syndrome (NS) is a common autosomal dominant/recessive disorder. No large-scale study of NS has been conducted in China, the world's most populous country. Next-generation sequencing (NGS) was used to identify pathogenic variants in patients who exhibited NS-related phenotypes.

Background: Human-computer interaction (HCI) is an important feature of augmented reality (AR) technology. Naturalness is the inevitable trend of HCI. Apart from language, gesture is the most natural and frequently used auxiliary interaction mode in daily interactions.

To let doctors carry out coronary artery diagnosis and preoperative planning in a more intuitive and natural way, and to improve the training effect for interns, an augmented reality system for coronary artery diagnosis planning and training (ARS-CADPT) is designed and realized in this paper. First, a 3D reconstruction algorithm based on computed tomographic (CT) images is proposed to model the coronary artery vessels (CAV). Second, algorithms for static gesture recognition and for dynamic gesture spotting and recognition are presented to realize the real-time and friendly human-computer interaction (HCI) that characterizes ARS-CADPT.
