Human Strategy Adaptation in Reinforcement Learning Resembles Policy Gradient Ascent.

Hua-Dong Xiong, Li Ji-An, Robert C Wilson, Marcelo G Mattar

bioRxiv

Department of Psychology, New York University.

Published: July 2025

A hallmark of intelligence is the ability to adapt behavior to changing environments, which requires adapting one's own learning strategies. This phenomenon is known as learning to learn in cognitive science and as meta-learning in artificial intelligence. Although it is well established in humans and animals, no quantitative framework exists for characterizing the trajectories through which biological agents adapt their learning strategies. Previous computational studies either assume fixed strategies or use task-optimized neural networks, and neither approach explains how humans refine strategies through experience. Here we show that humans adjust their reinforcement learning strategies in a way that resembles gradient-based online optimization. We introduce DynamicRL, a framework that uses neural networks to track how participants' learning parameters (e.g., learning rates and decision temperatures) evolve throughout an experiment. Across four diverse bandit tasks, DynamicRL consistently outperforms traditional reinforcement learning models with fixed parameters, demonstrating that humans continuously adapt their strategies over time. The dynamically estimated parameters follow trajectories that systematically increase expected rewards, with updates significantly aligned with policy gradient ascent directions. Furthermore, this learning process operates across multiple timescales: strategy parameters update more slowly than behavioral choices, and update effectiveness correlates with local gradient strength in the reward landscape. Our work offers a generalizable approach for characterizing meta-learning trajectories, bridging theories of biological and artificial intelligence by providing a quantitative method for studying how adaptive behavior is optimized through experience.
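To make the idea of "updates aligned with policy gradient ascent directions" concrete, the following is a minimal Python sketch, not the DynamicRL framework itself (which, per the abstract, uses neural networks to estimate the parameters). It assumes a standard Q-learning agent with a softmax choice rule on a short Bernoulli bandit, computes the exact expected reward as a function of the learning rate and inverse temperature, and measures the cosine between a parameter update and the finite-difference gradient of that reward. All function names, the task settings, and the parameter values are illustrative assumptions.

```python
import numpy as np

def softmax_policy(q, beta):
    """Choice probabilities from action values q and inverse temperature beta."""
    z = beta * (q - q.max())  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def expected_reward(alpha, beta, reward_probs, n_trials):
    """Exact expected total reward of a Q-learner on a Bernoulli bandit.

    Enumerates every choice/outcome history, so it is only practical
    for short horizons (assumption: n_trials is small, e.g. <= 8).
    """
    reward_probs = np.asarray(reward_probs, dtype=float)

    def recurse(q, t):
        if t == n_trials:
            return 0.0
        pi = softmax_policy(q, beta)
        total = 0.0
        for a, p_a in enumerate(pi):
            outcomes = ((1.0, reward_probs[a]), (0.0, 1.0 - reward_probs[a]))
            for r, p_r in outcomes:
                q_next = q.copy()
                q_next[a] += alpha * (r - q_next[a])  # delta-rule value update
                total += p_a * p_r * (r + recurse(q_next, t + 1))
        return total

    return recurse(np.zeros(len(reward_probs)), 0)

def reward_gradient(alpha, beta, reward_probs, n_trials, eps=1e-4):
    """Central finite-difference gradient of expected reward w.r.t. (alpha, beta)."""
    def f(a, b):
        return expected_reward(a, b, reward_probs, n_trials)
    d_alpha = (f(alpha + eps, beta) - f(alpha - eps, beta)) / (2 * eps)
    d_beta = (f(alpha, beta + eps) - f(alpha, beta - eps)) / (2 * eps)
    return np.array([d_alpha, d_beta])

def cosine_alignment(update, gradient):
    """Cosine between a parameter update and the reward gradient (1 = pure ascent)."""
    norm = np.linalg.norm(update) * np.linalg.norm(gradient)
    return float(update @ gradient / norm)

# Hypothetical example: a two-armed bandit with reward probabilities 0.8 and 0.2.
probs, horizon = [0.8, 0.2], 6
grad = reward_gradient(0.3, 2.0, probs, horizon)
observed_update = np.array([0.02, 0.10])  # made-up change in (alpha, beta)
print("gradient:", grad)
print("alignment:", cosine_alignment(observed_update, grad))
```

An alignment near 1 would mean the change in strategy parameters points along the direction of steepest increase in expected reward; values near 0 or below would indicate updates unrelated or opposed to that direction. In a real analysis the update vectors would come from fitted parameter trajectories rather than a hand-picked vector.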

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12324363
DOI: http://dx.doi.org/10.1101/2025.07.28.667308
