Rethinking softmax in incremental learning.

Neural Netw

Department of Statistical Sciences, University of Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Ontario, Canada; Department of Statistics and Data Science, MBZUAI, Abu Dhabi, UAE.

Published: August 2025


Article Abstract

Mitigating catastrophic forgetting remains a fundamental challenge in incremental learning. This paper identifies a key limitation of the widely used softmax cross-entropy loss: the non-identifiability inherent in the standard softmax cross-entropy distillation loss. To address this issue, we propose two complementary strategies: (1) adopting an imbalance-invariant distillation loss to mitigate the adverse effect of imbalanced weights during distillation, and (2) regularizing the original prediction/distillation loss with shift-sensitive alternatives, which render the optimization problem identifiable and proactively prevent imbalance from arising. These strategies form the foundation of five novel approaches that can be seamlessly integrated into existing distillation-based incremental learning frameworks such as LWF, LWM, and LUCIR. We validate our approaches through extensive numerical experiments, which show consistent improvements in predictive accuracy and substantial reductions in forgetting. For example, in a 10-task incremental learning setting on CIFAR-100, our methods improve the average accuracy of three widely used approaches (LWF, LWM, and LUCIR) by 11.8%, 11.5%, and 12.8%, respectively, while reducing their average forgetting rates by 16.5%, 16.8%, and 13.8%. Our code is publicly available at https://github.com/nexais/RethinkSoftmax.
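The non-identifiability mentioned in the abstract can be illustrated with a small sketch. Softmax is unchanged when the same constant is added to every logit, so a softmax cross-entropy distillation loss cannot tell a student's logits apart from a shifted copy of them. The code below is only an illustration under stated assumptions: the function names (`distillation_ce`, `logit_mse`), the toy logits, and the choice of a mean-squared-error term on raw logits as a stand-in for a "shift-sensitive" loss are all hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_ce(teacher_logits, student_logits):
    """Cross-entropy between the teacher's and the student's softmax outputs."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def logit_mse(teacher_logits, student_logits):
    """Hypothetical shift-sensitive term: mean squared error on raw logits.

    This is an assumed example of a loss that reacts to a constant shift;
    it is not claimed to be the regularizer used in the paper.
    """
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n


# Toy logits (made up for illustration).
teacher = [2.0, 0.5, -1.0]
student = [1.8, 0.7, -0.9]
shifted = [s + 5.0 for s in student]  # same student logits plus a constant

ce_plain = distillation_ce(teacher, student)
ce_shifted = distillation_ce(teacher, shifted)
mse_plain = logit_mse(teacher, student)
mse_shifted = logit_mse(teacher, shifted)

# The softmax-based loss is (numerically) identical for both students,
# while the logit-level term grows when the logits drift by a constant.
print("softmax CE:", ce_plain, ce_shifted)
print("logit MSE: ", mse_plain, mse_shifted)
```

In this sketch the two cross-entropy values agree up to floating-point error, while the logit-level term differs sharply, which is one way to see why adding a shift-sensitive term can make the optimization problem identifiable.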

Source
http://dx.doi.org/10.1016/j.neunet.2025.108017
