Rethinking softmax in incremental learning.

Zheng Zhai, Jiali Zhang, Haiyu Wang, Mingxin Wu, Keshun Yang, Xiaoyan Qiao, Qiang Sun

Neural Netw

Department of Statistical Sciences, University of Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Ontario, Canada; Department of Statistics and Data Science, MBZUAI, Abu Dhabi, UAE.

Published: August 2025

Mitigating catastrophic forgetting remains a fundamental challenge in incremental learning. This paper identifies a key limitation of the widely used softmax cross-entropy loss: the standard softmax cross-entropy distillation loss is inherently non-identifiable. To address this issue, we propose two complementary strategies: (1) adopting an imbalance-invariant distillation loss to mitigate the adverse effect of imbalanced weights during distillation, and (2) regularizing the original prediction/distillation loss with shift-sensitive alternatives, which render the optimization problem identifiable and proactively prevent imbalance from arising. These strategies form the foundation of five novel approaches that can be seamlessly integrated into existing distillation-based incremental learning frameworks such as LWF, LWM, and LUCIR. We validate the effectiveness of our approaches through extensive numerical experiments, demonstrating consistent improvements in predictive accuracy and substantial reductions in forgetting. For example, in a 10-task incremental learning setting on CIFAR-100, our methods improve the average accuracy of three widely used approaches (LWF, LWM, and LUCIR) by 11.8%, 11.5%, and 12.8%, respectively, while reducing their average forgetting rates by 16.5%, 16.8%, and 13.8%, respectively. Our code is publicly available at https://github.com/nexais/RethinkSoftmax.
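The non-identifiability mentioned in the abstract can plausibly be read as the shift invariance of softmax: adding the same constant to every logit leaves the softmax output, and hence a softmax-based distillation loss, unchanged. The sketch below illustrates that reading in plain Python. The function names `distillation_ce` and `squared_logit_gap` are hypothetical, and the squared logit gap is only an illustrative stand-in for a shift-sensitive term, not one of the paper's five approaches.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_ce(teacher_logits, student_logits):
    """Cross-entropy between teacher and student softmax outputs."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def squared_logit_gap(teacher_logits, student_logits):
    """Hypothetical shift-sensitive term: mean squared logit difference."""
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n


teacher = [2.0, 0.5, -1.0]
student = [2.0, 0.5, -1.0]            # matches the teacher exactly
shifted = [z + 5.0 for z in student]  # same logits plus a constant shift

# The softmax-based loss cannot tell the two students apart.
loss_same = distillation_ce(teacher, student)
loss_shifted = distillation_ce(teacher, shifted)

# The shift-sensitive term is zero only for the unshifted student.
gap_same = squared_logit_gap(teacher, student)
gap_shifted = squared_logit_gap(teacher, shifted)

print(loss_same, loss_shifted)
print(gap_same, gap_shifted)
```

Under this reading, any student whose logits differ from the teacher's by a constant minimizes the softmax distillation loss equally well, so the minimizer is not unique. Adding a term that does depend on the shift is one generic way to single out a solution.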

DOI: http://dx.doi.org/10.1016/j.neunet.2025.108017

Similar Publications

Enhancing fake news detection with transformer-based deep learning: A multidisciplinary approach.

PLoS One

September 2025

Department of Computer Science, COMSATS University Islamabad, Sahiwal, Pakistan.

Nabeel Raza, Said Jadid Abdulkadir, Yawar Abbas Abid, Sami S Albouq, Ayed Alwadain

The widespread dissemination of fake news presents a critical challenge to the integrity of digital information and erodes public trust. This urgent problem necessitates the development of sophisticated and reliable automated detection mechanisms. This study addresses this gap by proposing a robust fake news detection framework centred on a transformer-based architecture.

Incremental Learning for Defect Segmentation With Efficient Transformer Semantic Complement.

IEEE Trans Neural Netw Learn Syst

September 2025

Xiqi Li, Zhifu Huang, Ge Ma, Yu Liu

In industrial scenarios, semantic segmentation of surface defects is vital for identifying, localizing, and delineating defects. However, new defect types constantly emerge with product iterations or process updates. Existing defect segmentation models lack incremental learning capabilities, and direct fine-tuning (FT) often leads to catastrophic forgetting.

Machine Learning and Lexical Rule-Based Cost-Efficient Emotion Annotation of Hinglish Utterances.

J Vis Exp

August 2025

Chitkara University Institute of Engineering & Technology, Chitkara University.

Pratibha Verma, Amandeep Kaur, Meenu Khurana, Deepali Gupta

Emotion annotation in code-mixed languages like Hinglish (Hindi-English) presents unique challenges due to linguistic complexity and resource constraints. This study introduces a hybrid active learning framework that combines lexical rules, machine learning, and iterative expert feedback to achieve cost-efficient, high-accuracy emotion annotation. Grounded in psychological theories of emotion, including Discrete Emotions Theory and Cognitive Appraisal Theory, the framework employs bilingual emotion dictionaries (e.

Long-Tail Class Incremental Learning via Bias Calibration With Application to Continuous Fault Diagnosis.

IEEE Trans Neural Netw Learn Syst

September 2025

Dongyue Chen, Zongxia Xie, Wenlong Yu, Qinghua Hu

Class incremental learning (CIL) offers a promising framework for continuous fault diagnosis (CFD), allowing networks to accumulate knowledge from streaming industrial data and recognize new fault classes. However, current CIL methods assume a balanced data stream, which does not align with the long-tail distribution of fault classes in real industrial scenarios. To fill this gap, this article investigates the impact of long-tail bias in the data stream on the CIL training process through experimental analysis.

Investigating spatiotemporal traffic dynamics toward conflict risk levels using trajectory data in heterogeneous traffic conditions.

Traffic Inj Prev

September 2025

Department of Civil Engineering, Sardar Vallabhbhai National Institute of Technology, Surat, India.

Vineet Jain, Ashish Dhamaniya

Objective: This study aimed to identify dynamic spatiotemporal traffic factors influencing conflict risk levels on National Highways under heterogeneous traffic conditions in India. The research addresses gaps by capturing vehicle interactions using high-resolution UAV-based trajectory data and proposes a novel two-stage methodology for real-time conflict risk evaluation, moving beyond traditional binary risk classifications to a four-level framework (High, Moderate, Low, No-Risk).

Methods: Over 40,000 conflict risk sequences were classified into four severity levels using the Modified Time-to-Collision (MTTC) surrogate safety measure.
