Eliminating Primacy Bias in Online Reinforcement Learning by Self-Distillation.

Jingchen Li , Haobin Shi , Huarui Wu , Chunjiang Zhao , Kao-Shing Hwang

IEEE Trans Neural Netw Learn Syst

Published: April 2025

Excessive invalid exploration early in training puts deep reinforcement learning at risk of overfitting, which leads to spurious decisions that hinder the agent in later states and explorations. This phenomenon is termed primacy bias in online reinforcement learning. This work systematically investigates primacy bias in online reinforcement learning, discussing its causes and analyzing its characteristics. To learn a policy that generalizes to later states and explorations, we develop an online reinforcement learning framework based on knowledge distillation, termed self-distillation reinforcement learning (SDRL). At regular intervals, the agent transfers its learned knowledge into a randomly initialized policy network, which then replaces the original network for the rest of training. The core idea is that distilling knowledge from the trained policy into another policy can filter out biases, yielding a more generalized policy during learning. Moreover, to keep the new policy from overfitting under repeated distillation, we add a loss term to the knowledge distillation process that uses L2 regularization to improve generalization, and we introduce a self-imitation mechanism to accelerate learning from current experiences. Experiments on DMC and Atari 100k suggest that the proposed method can eliminate primacy bias in reinforcement learning methods, and that the distilled policy helps agents reach higher scores more quickly.
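The distillation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear-softmax policy in place of a deep network, a KL/cross-entropy distillation loss, and plain gradient descent, and it omits the self-imitation term entirely. All names (`distill`, `random_policy`, `mean_kl`) and hyperparameters (`steps`, `lr`, `l2`) are hypothetical choices made for illustration.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(weights, state):
    """Action probabilities of a linear-softmax policy (stand-in for a policy network)."""
    return softmax([sum(w * x for w, x in zip(row, state)) for row in weights])

def random_policy(n_actions, n_features, rng, scale=0.1):
    """Randomly initialized weights, one row per action."""
    return [[rng.gauss(0.0, scale) for _ in range(n_features)] for _ in range(n_actions)]

def distill(teacher, states, rng, steps=500, lr=0.5, l2=1e-3):
    """Fit a freshly initialized student to the teacher's action distributions.

    Loss (assumed form): mean cross-entropy to the teacher + l2 * ||W||^2.
    """
    n_actions, n_features = len(teacher), len(teacher[0])
    student = random_policy(n_actions, n_features, rng)
    targets = [policy(teacher, s) for s in states]
    for _ in range(steps):
        # Start from the gradient of the L2 penalty, then add the distillation gradient.
        grad = [[2.0 * l2 * w for w in row] for row in student]
        for s, p_t in zip(states, targets):
            p_s = policy(student, s)
            for a in range(n_actions):
                diff = (p_s[a] - p_t[a]) / len(states)
                for j in range(n_features):
                    grad[a][j] += diff * s[j]
        student = [[w - lr * g for w, g in zip(row, g_row)]
                   for row, g_row in zip(student, grad)]
    return student

def mean_kl(teacher, student, states):
    """Average KL(teacher || student) over the given states."""
    total = 0.0
    for s in states:
        p, q = policy(teacher, s), policy(student, s)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(states)

rng = random.Random(0)
teacher = random_policy(3, 4, rng, scale=1.0)   # plays the role of the trained policy
states = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(32)]
fresh = random_policy(3, 4, rng)                # undistilled random policy, for comparison
student = distill(teacher, states, rng)         # would replace the teacher in later training
```

In an online training loop, a call like `distill` would run at a fixed interval on recently visited states, and the returned student would take over as the acting policy. How the interval, the state batch, and the regularization weight are chosen is left open here, since the abstract does not specify them.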

DOI: http://dx.doi.org/10.1109/TNNLS.2024.3397704

Publication Analysis

Top Keywords

reinforcement learning

primacy bias

online reinforcement

bias online

knowledge distillation

learning

learning process

states explorations

policy

reinforcement

Similar Publications

Cerebellar Stimulation Modulates Reward Processing: A High-definition Transcranial Direct Current Stimulation Study.

Cerebellum

September 2025

Neuropsychology and Applied Cognitive Neuroscience Laboratory, State Key Laboratory of Cognitive Science and Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing, China.

Xuan Wang , Jin-Ting Yu , Ling-Ling Wang , Jia Huang , Yi Wang

Reward processing involves several components, including reward anticipation, cost-effort computation, reward consumption, reward sensitivity, and reward learning. Recent research has highlighted the cerebellum's role in reward processing. This study aimed to investigate the effects of cerebellar stimulation on reward processing using high-definition transcranial direct current stimulation (HD-tDCS).

A Silent Invader: Asymptomatic Rhodococcus Infection Unmasked in A Patient with Ectopic ACTH-Dependent Cushing's Syndrome.

Eur J Case Rep Intern Med

August 2025

Charleston Area Medical Center, Charleston, USA.

Shahzeb Saeed , George Fawzy , Ayesha Shah , Molly John

Introduction: Rhodococcus species are rare opportunistic pathogens that typically affect immunocompromised individuals. These infections usually present with respiratory or systemic symptoms and are often linked to environmental exposure. Asymptomatic infections are exceedingly rare and pose unique diagnostic and therapeutic challenges.

Box of Lessons: An Open Educational Resource for Exploring Biomolecular Structure and Function.

J Coll Sci Teach

March 2025

RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers University, Piscataway, New Jersey, United States.

Alexandra S Pettit , Keith A Johnson , Brian Gadd , Shuchismita Dutta

Structure-function relationships are a core concept in many STEM disciplines. Most biology curricula introduce students to macromolecules, their building blocks, and other small molecules that play key roles in biological processes. However, the shapes, interactions, and functions of these molecules are often discussed using schematic diagrams, ignoring the vast amounts of three-dimensional structural and bioinformatics data freely available from public data resources.

Enhancing Knowledge Retention by Simulation-Based Learning Among First-Year Medical Students.

Cureus

August 2025

Physiology, SGT University, Gurugram, IND.

Nimarpreet Kaur , Bhupendra Yadav , Deepti Dwivedi , Harminder Kaur , Pragyashaa Chaudhary

Introduction: Simulation-based training has been a vital part of medical education since Competency-Based Medical Education (CBME) was introduced, and new guidelines since 2023 have made simulation a mandatory teaching methodology. This method enables learners to build and develop both technical and non-technical abilities in a safe and controlled setting, enhancing their preparedness for real-life medical scenarios. In contrast with traditional teaching, simulation-based training improves skill acquisition and retention, enhances learners' confidence, reduces anxiety, reinforces learning, corrects errors, and promotes reflective practice.

High spatiotemporal-resolution abdominal 4D-MRI through respiratory-synchronized frame collaborative reconstruction.

Med Phys

September 2025

Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China.

Yinghui Wang , Lu Wang , Yidan Feng , Zhi Chen , Jing Qin

Background: Four-dimensional magnetic resonance imaging (4D-MRI) holds great promise for precise abdominal radiotherapy guidance. However, current 4D-MRI methods are limited by an inherent trade-off between spatial and temporal resolutions, resulting in compromised image quality characterized by low spatial resolution and significant motion artifacts, hindering clinical implementation. Despite recent advancements, existing methods inadequately exploit redundant frame information and struggle to restore structural details from highly undersampled acquisitions.
