NNG-Mix: Improving Semi-Supervised Anomaly Detection With Pseudo-Anomaly Generation.

Hao Dong, Gaetan Frusque, Yue Zhao, Eleni Chatzi, Olga Fink

IEEE Trans Neural Netw Learn Syst

Published: June 2025

Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised AD. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this article, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named nearest neighbor Gaussian mix-up (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised AD algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP datasets in ADBench. Our source code is available at https://github.com/donghao51/NNG-Mix.
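The abstract names the technique but does not give its steps, so the following is only a minimal sketch of what a nearest-neighbor Gaussian mix-up could look like, not the authors' implementation (see the linked repository for that). It assumes that each pseudo-anomaly is built by picking a labeled anomaly, choosing one of its k nearest neighbors from the pool of labeled and unlabeled samples, perturbing both with Gaussian noise, and blending them with a Beta-distributed Mixup weight. The function name `nn_gaussian_mixup` and the parameters `k`, `alpha`, and `sigma` are hypothetical.

```python
import numpy as np


def nn_gaussian_mixup(anomalies, unlabeled, n_new, k=5, alpha=0.2, sigma=0.01, seed=0):
    """Generate pseudo-anomalies (illustrative sketch under stated assumptions).

    anomalies: (n_a, d) labeled anomaly samples
    unlabeled: (n_u, d) unlabeled samples
    n_new:     number of pseudo-anomalies to generate
    k, alpha, sigma: hypothetical hyperparameters (neighbors, Beta shape, noise std)
    """
    rng = np.random.default_rng(seed)
    anomalies = np.asarray(anomalies, dtype=float)
    # Assumption: neighbors are searched in the union of labeled and unlabeled data.
    pool = np.vstack([anomalies, np.asarray(unlabeled, dtype=float)])
    k = min(k, len(pool) - 1)

    out = np.empty((n_new, anomalies.shape[1]))
    for i in range(n_new):
        # Pick a labeled anomaly as the anchor.
        anchor = anomalies[rng.integers(len(anomalies))]
        # Euclidean distances from the anchor to every pooled sample.
        dist = np.linalg.norm(pool - anchor, axis=1)
        # Skip the closest point (the anchor itself, at distance 0).
        neighbors = np.argsort(dist)[1:k + 1]
        partner = pool[rng.choice(neighbors)]
        # Perturb both samples with isotropic Gaussian noise.
        noisy_anchor = anchor + rng.normal(0.0, sigma, size=anchor.shape)
        noisy_partner = partner + rng.normal(0.0, sigma, size=partner.shape)
        # Mixup: convex combination with a Beta-distributed weight.
        lam = rng.beta(alpha, alpha)
        out[i] = lam * noisy_anchor + (1.0 - lam) * noisy_partner
    return out
```

Under these assumptions, the generated rows would be appended to the labeled anomalies (all with the anomaly label) before training an existing semi-supervised or supervised detector, which mirrors the evaluation setup the abstract describes.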

DOI: http://dx.doi.org/10.1109/TNNLS.2024.3497801
