Publications by authors named "Yanyun Qu"

Parameter-efficient tuning (PET) methods have demonstrated impressive performance and promising prospects for training large models, yet they still face a common problem: the trade-off between learning new content and preserving old knowledge, which leads to zero-shot generalization collapse and cross-modal hallucination. In this paper, we reformulate Adapter, LoRA, Prefix-tuning, and Prompt-tuning from the perspective of gradient projection, and are the first to propose a unified framework called Parameter Efficient Gradient Projection (PEGP). We introduce orthogonal gradient projection into the different PET paradigms and theoretically demonstrate that the orthogonality condition on the gradient can effectively resist forgetting, even for large-scale models.
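
As an illustration of the orthogonal-gradient-projection idea the abstract describes (not the authors' released code), the sketch below builds a basis for the subspace spanned by old-task features and removes the component of a new-task gradient lying in that subspace; the function names, energy threshold, and toy data are all assumptions.

```python
import numpy as np

def orthogonal_projection_basis(feats_old, energy=0.999):
    """Orthonormal basis (columns) of the subspace spanned by old-task features.

    feats_old: (n_samples, dim) features that a PET module has seen on old tasks.
    """
    _, s, vt = np.linalg.svd(feats_old, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cum, energy)) + 1   # keep directions covering most energy
    return vt[:k].T                             # (dim, k)

def project_gradient(grad, basis):
    """Remove the gradient component lying inside the old-task feature subspace,
    so a gradient step leaves responses to old-task inputs (nearly) unchanged."""
    return grad - basis @ (basis.T @ grad)

# toy check: the projected gradient is (numerically) orthogonal to old features
rng = np.random.default_rng(0)
feats_old = rng.normal(size=(128, 8)) @ rng.normal(size=(8, 64))  # rank-8 features in 64-D
g = rng.normal(size=64)
g_proj = project_gradient(g, orthogonal_projection_basis(feats_old))
print(np.abs(feats_old @ g_proj).max())   # ~0 up to numerical error
```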

Anomaly localization (AL) is an indispensable and challenging task in manufacturing. Recently, diffusion models have been widely used to localize anomalies through discrepancies between the original and reconstructed representations, based on the hypothesis that diffusion models treat anomalies as noise and reconstruct them into normal representations. However, anomalies usually deviate from the standard Gaussian prior, and because of their powerful generalization ability, diffusion models cannot reliably reconstruct anomalous regions as normal patterns.
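
A minimal NumPy sketch of the reconstruction-discrepancy principle the abstract builds on: an anomaly map is the smoothed per-pixel difference between an image and its reconstruction. The `reconstruct` callable merely stands in for a pretrained diffusion model, and the smoothing and toy data are illustrative assumptions.

```python
import numpy as np

def anomaly_map(image, reconstruct, blur_sigma=2.0):
    """Anomaly map = smoothed per-pixel discrepancy between an image and its
    reconstruction (`reconstruct` is a placeholder for a pretrained diffusion
    model run through a noise-and-denoise cycle).

    image: (H, W, C) float array in [0, 1]."""
    recon = reconstruct(image)
    diff = np.abs(image - recon).mean(axis=-1)          # channel-averaged discrepancy
    # separable Gaussian smoothing so single-pixel noise does not dominate
    radius = int(3 * blur_sigma)
    xs = np.arange(-radius, radius + 1)
    g = np.exp(-xs ** 2 / (2 * blur_sigma ** 2))
    g /= g.sum()
    diff = np.apply_along_axis(lambda row: np.convolve(row, g, mode="same"), 1, diff)
    diff = np.apply_along_axis(lambda col: np.convolve(col, g, mode="same"), 0, diff)
    return diff

# toy usage: a bright square is flagged because the stand-in "model" returns a
# clean background as its reconstruction
clean = np.zeros((64, 64, 3))
test = clean.copy()
test[20:30, 20:30] = 1.0
print(anomaly_map(test, reconstruct=lambda x: clean).max())   # well above 0
```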

Humans can quickly learn new concepts from limited experience without forgetting previously acquired knowledge. In machine learning, this ability is referred to as few-shot class-incremental learning (FSCIL). Although some methods try to solve this problem by devoting similar effort to preventing forgetting and promoting learning, we find that existing techniques do not give enough importance to the new categories, whose training samples are rather rare.

Weakly supervised point cloud semantic segmentation is an increasingly active topic, because fully supervised learning requires well-labeled point clouds and entails high annotation costs. Existing weakly supervised methods either need meticulously designed data augmentation for self-supervised learning or ignore the negative effects of learning from noisy pseudo-labels. In this article, by designing cross-cloud structures at different granularities, we propose a cross-cloud consistency method for weakly supervised point cloud semantic segmentation, formulated as an expectation-maximization (EM) framework.
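
As a hedged sketch of the EM-style pseudo-labeling and cross-view consistency this abstract alludes to (not the paper's actual cross-cloud design), the snippet below alternates between fitting a classifier on trusted points (M-step) and admitting pseudo-labels only where two jittered copies of the features agree confidently (E-step); the classifier, thresholds, and toy data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def em_pseudo_labeling(feats, labels, labeled_mask, rounds=3,
                       jitter=0.05, conf_thresh=0.9, seed=0):
    """EM-style self-training with a two-view consistency check.

    feats: (N, D) per-point features; labels: (N,) labels trusted only where
    labeled_mask is True (weak supervision). Each round fits a classifier on
    trusted points (M-step) and admits a pseudo-label only where two jittered
    copies of the features agree with high confidence (E-step)."""
    rng = np.random.default_rng(seed)
    trusted, y = labeled_mask.copy(), labels.copy()
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=500).fit(feats[trusted], y[trusted])
        p1 = clf.predict_proba(feats + jitter * rng.normal(size=feats.shape))
        p2 = clf.predict_proba(feats + jitter * rng.normal(size=feats.shape))
        pred1, pred2 = clf.classes_[p1.argmax(1)], clf.classes_[p2.argmax(1)]
        confident = np.minimum(p1.max(1), p2.max(1)) > conf_thresh
        new = (pred1 == pred2) & confident & ~trusted
        y[new], trusted = pred1[new], trusted | new
    return y, trusted

# toy usage: two Gaussian blobs with only 1% of the points labeled
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (500, 3)), rng.normal(4, 1, (500, 3))])
y_true = np.repeat([0, 1], 500)
mask = np.zeros(1000, dtype=bool)
mask[::100] = True
y_hat, _ = em_pseudo_labeling(X, np.where(mask, y_true, -1), mask)
print((y_hat[~mask] == y_true[~mask]).mean())   # close to 1.0
```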

Combining LiDAR points and images for robust semantic segmentation has shown great potential. However, the heterogeneity between the two modalities (e.g.

Weakly supervised point cloud semantic segmentation methods, which require 1% or fewer labels while aiming to achieve almost the same performance as fully supervised approaches, have recently attracted extensive research attention. A typical solution in this setting is to use self-training or pseudo-labeling to mine supervision from the point cloud itself, while ignoring the critical information available from images. In fact, cameras are widely present in LiDAR scenarios, and this complementary information appears highly important for 3D applications.

The Information Bottleneck (IB) provides an information-theoretic principle for multi-view learning by revealing the various components contained in each view. This highlights the necessity of capturing their distinct roles to achieve view-invariant and predictive representations, but the problem remains under-explored due to the technical intractability of modeling and organizing innumerable mutual information (MI) terms. Recent studies show that sufficiency and consistency play such key roles in multi-view representation learning and can be preserved via a variational distillation framework.
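
To make the mutual-information terms mentioned above concrete, here is a small plug-in MI estimator for discrete variables and a toy two-view example; it only illustrates the quantities at stake (sufficiency I(view; y), consistency I(view1; view2)), not the paper's variational distillation framework.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for discrete 1-D arrays x and y."""
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (xi, yi), 1.0)              # joint counts
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

# toy illustration: two noisy views of a label y
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
view1 = np.where(rng.random(5000) < 0.9, y, 1 - y)   # noisy copy of y
view2 = np.where(rng.random(5000) < 0.8, y, 1 - y)   # noisier copy of y
print("I(view1; y)     =", round(mutual_information(view1, y), 3))
print("I(view2; y)     =", round(mutual_information(view2, y), 3))
print("I(view1; view2) =", round(mutual_information(view1, view2), 3))
```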

Recently, with the development of intelligent manufacturing, the demand for surface defect inspection has been increasing. Deep learning has achieved promising results in defect inspection. However, due to the scarcity of defect data and the difficulty of pixel-wise annotation, existing supervised defect inspection methods perform too poorly to be deployed in practice.

Hidden features in neural networks usually fail to learn informative representations for 3D segmentation, because supervision is given only on the output prediction; this can be addressed by omni-scale supervision on the intermediate layers. In this paper, we introduce the first omni-scale supervision method for 3D segmentation via the proposed gradual Receptive Field Component Reasoning (RFCR), in which target Receptive Field Component Codes (RFCCs) are designed to record the categories present within the receptive fields of hidden units in the encoder. The target RFCCs then supervise the decoder to gradually infer the RFCCs in a coarse-to-fine category reasoning manner and finally obtain the semantic labels.
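
A small sketch of what a target RFCC could look like under the description above: a multi-hot code recording the categories that fall inside each hidden unit's receptive field, with the receptive field approximated here by a grid cell; this identification, and all names, are assumptions rather than the paper's implementation.

```python
import numpy as np

def target_rfcc(point_labels, cell_index, num_classes):
    """Target Receptive Field Component Codes: a multi-hot code per cell
    recording which categories appear inside that cell (here a hidden unit's
    receptive field is crudely identified with a grid cell).

    point_labels: (N,) semantic label per point.
    cell_index:   (N,) index of the cell each point falls in."""
    num_cells = int(cell_index.max()) + 1
    codes = np.zeros((num_cells, num_classes), dtype=np.float32)
    codes[cell_index, point_labels] = 1.0     # union of categories per cell
    return codes

# toy usage: 6 points in 2 cells, 3 classes
labels = np.array([0, 0, 2, 1, 1, 1])
cells  = np.array([0, 0, 0, 1, 1, 1])
print(target_rfcc(labels, cells, num_classes=3))
# [[1. 0. 1.]   cell 0 contains classes 0 and 2
#  [0. 1. 0.]]  cell 1 contains only class 1
```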

Image super-resolution (SR) methods usually synthesize degraded low-resolution images with a predefined degradation model for training. Existing SR methods inevitably perform poorly when the true degradation does not follow the predefined model, especially in real-world scenarios. To tackle this robustness issue, we propose a cascaded degradation-aware blind super-resolution network (CDASRN), which not only eliminates the influence of noise on blur kernel estimation but also estimates spatially varying blur kernels.
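
For context, a minimal sketch of the classical degradation model the abstract refers to, with a crudely spatially varying blur (different Gaussian widths on the two halves of the image); it only illustrates the kind of degradation CDASRN is meant to handle, not the network itself, and the parameter values are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(hr, scale=2, sigmas=(0.8, 2.4), noise_std=0.01, seed=0):
    """Classical SR degradation y = downsample(blur(x)) + noise, where the blur
    width differs between the left and right halves of the image to mimic a
    spatially varying kernel.

    hr: (H, W) grayscale array in [0, 1]."""
    rng = np.random.default_rng(seed)
    blurred = hr.copy()
    w = hr.shape[1]
    blurred[:, :w // 2] = gaussian_filter(hr, sigmas[0])[:, :w // 2]  # mildly blurred half
    blurred[:, w // 2:] = gaussian_filter(hr, sigmas[1])[:, w // 2:]  # heavily blurred half
    lr = blurred[::scale, ::scale]                                    # simple decimation
    return np.clip(lr + noise_std * rng.normal(size=lr.shape), 0.0, 1.0)
```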

Multiview clustering via binary representation has attracted intensive attention due to its effectiveness in handling large-scale multi-view data. However, this kind of clustering approach usually ignores important potential high-order correlations in discrete representation learning. In this article, we propose a novel all-in collaborative multiview binary representation for clustering (AC-MVBC) framework, in which the multiview collaborative binary representation and the clustering structure are learned jointly.
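
As a rough, hedged illustration of clustering from fused binary codes (not the learned collaborative codes of AC-MVBC), the sketch below derives per-sample codes from averaged random projections of the views and clusters them with a Hamming-space k-means; every design choice here is an assumption.

```python
import numpy as np

def multiview_binary_codes(views, n_bits=16, seed=0):
    """Fuse several feature views into one binary code per sample by averaging
    random linear projections of the standardized views and taking the sign.
    A stand-in for learned collaborative codes, purely for illustration."""
    rng = np.random.default_rng(seed)
    fused = 0.0
    for v in views:
        v = (v - v.mean(0)) / (v.std(0) + 1e-8)           # per-view standardization
        fused = fused + v @ rng.normal(size=(v.shape[1], n_bits))
    return (fused > 0).astype(np.uint8)                   # (N, n_bits) in {0, 1}

def hamming_kmeans(codes, k, iters=20, seed=0):
    """k-means in Hamming space: binary centroids updated by majority vote."""
    rng = np.random.default_rng(seed)
    centroids = codes[rng.choice(len(codes), k, replace=False)].astype(np.uint8)
    for _ in range(iters):
        d = (codes[:, None, :] != centroids[None, :, :]).sum(-1)   # Hamming distances
        assign = d.argmin(1)
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = (codes[assign == c].mean(0) > 0.5).astype(np.uint8)
    return assign
```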

Deep learning has made unprecedented progress in image restoration (IR), where the residual block (RB) is widely used and contributes substantially to the promising performance. However, massively stacked RBs incur burdensome memory and computation costs. To tackle this issue, we aim to design an economical structure that adaptively connects pair-wise RBs, thereby enhancing the model's representation ability.
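
A toy PyTorch sketch of one way to connect pair-wise residual blocks adaptively: a learnable scalar gate mixes each block's output with its input. This is an illustrative stand-in, not the structure proposed in the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class GatedPairwiseRBs(nn.Module):
    """A chain of residual blocks with learnable scalar gates that adaptively
    mix each block's output with its input; a crude stand-in for an adaptive
    pair-wise RB connection scheme."""
    def __init__(self, channels, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList([ResidualBlock(channels) for _ in range(n_blocks)])
        self.gates = nn.Parameter(torch.zeros(n_blocks))   # one gate per pair

    def forward(self, x):
        prev = x
        for block, g in zip(self.blocks, self.gates):
            cur = block(prev)
            a = torch.sigmoid(g)                  # learned mixing weight in (0, 1)
            prev = a * cur + (1 - a) * prev       # adaptive pair-wise connection
        return prev

# toy usage
y = GatedPairwiseRBs(channels=8)(torch.randn(1, 8, 32, 32))
print(y.shape)  # torch.Size([1, 8, 32, 32])
```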

In this article, we propose a multiview self-representation model for nonlinear subspace clustering. By assuming that heterogeneous features lie within the union of multiple linear subspaces, recent multiview subspace learning methods aim to capture the complementary and consensus information across views to boost performance. However, in real-world applications, data features usually reside in multiple nonlinear subspaces, leading to undesirable results.
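
To make the kernel view of nonlinear self-representation concrete, the sketch below uses the closed form C = (K + lam*I)^(-1) K for RBF-kernel self-representation per view and averages the resulting affinities before spectral clustering; the joint multiview learning of the paper is deliberately simplified away, and all parameter choices are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def kernel_self_representation(X, gamma=1.0, lam=0.1):
    """Self-representation in a nonlinear (RBF-kernel) feature space.

    Minimizing ||phi(X) - phi(X) C||_F^2 + lam ||C||_F^2 has the closed form
    C = (K + lam I)^{-1} K, where K is the kernel Gram matrix."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    C = np.linalg.solve(K + lam * np.eye(len(X)), K)
    return np.abs(C) + np.abs(C).T                     # symmetric affinity

def multiview_nonlinear_clustering(views, n_clusters, gamma=1.0, lam=0.1):
    """Average per-view kernel self-representation affinities, then cluster.
    A simplified stand-in: the paper learns the views jointly rather than
    averaging them post hoc."""
    W = sum(kernel_self_representation(V, gamma, lam) for V in views) / len(views)
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=0).fit_predict(W)
```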

Existing enhancement methods are empirically expected to help high-level downstream computer vision tasks; however, this is observed not always to be the case in practice. We focus on object and face detection under poor visibility caused by bad weather (haze, rain) and low-light conditions. To provide a more thorough examination and a fair comparison, we introduce three benchmark sets collected in real-world hazy, rainy, and low-light conditions, respectively, with annotated objects/faces.

Purpose: Image-based breast lesion detection is a powerful clinical diagnosis technology. In recent years, deep learning architectures have achieved considerable success in medical image analysis; however, they typically require large-scale training samples. In mammography images, breast lesions are inconspicuous, multiscale, and have blurred edges.

In this paper, we address the multiview nonlinear subspace representation problem. Traditional multiview subspace learning methods assume that the heterogeneous features of the data lie within the union of multiple linear subspaces. However, in many real-world applications the data features actually reside in multiple nonlinear subspaces rather than linear ones, resulting in unsatisfactory clustering performance.

We investigate the scalable image classification problem with a large number of categories. Hierarchical visual data structures are helpful for improving the efficiency and performance of large-scale multi-class classification. We propose a novel image classification method based on learning hierarchical inter-class structures.

We propose a robust tracking algorithm based on local sparse coding with discriminative dictionary learning and a new keypoint matching scheme. The algorithm consists of two parts: local sparse coding with an online-updated discriminative dictionary for tracking (the SOD part), and keypoint matching refinement for enhancing tracking performance (the KP part). In the SOD part, local image patches of the target object and the background are represented by their sparse codes over an over-complete discriminative dictionary.
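
A hedged sketch of how local patches might be scored with sparse codes over a discriminative (target plus background) dictionary; the scoring rule and the use of scikit-learn's OMP-based `sparse_encode` are assumptions for illustration, not the paper's SOD formulation.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def patch_confidence(patches, target_atoms, background_atoms, n_nonzero=5):
    """Score image patches by sparse coding over a discriminative dictionary.

    patches:          (N, D) vectorized local patches.
    target_atoms:     (Kt, D) dictionary atoms learned from the target object.
    background_atoms: (Kb, D) dictionary atoms learned from the background.
    A patch scores high when its sparse code concentrates on target atoms."""
    dictionary = np.vstack([target_atoms, background_atoms])
    dictionary = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)  # unit atoms
    codes = sparse_encode(patches, dictionary, algorithm="omp",
                          n_nonzero_coefs=n_nonzero)
    t_energy = np.abs(codes[:, :len(target_atoms)]).sum(1)
    b_energy = np.abs(codes[:, len(target_atoms):]).sum(1)
    return (t_energy - b_energy) / (t_energy + b_energy + 1e-8)
```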
