Multi-Agent Reinforcement Learning in Games: Research and Applications.

Haiyang Li , Ping Yang , Weidong Liu , Shaoqiang Yan , Xinyi Zhang , Donglin Zhu

Biomimetics (Basel)

School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China.

Published: June 2025

Biological systems, ranging from ant colonies to neural ecosystems, exhibit remarkable self-organizing intelligence. Inspired by these phenomena, this study investigates how bio-inspired computing principles can bridge game-theoretic rationality and multi-agent adaptability. It systematically reviews the convergence of multi-agent reinforcement learning (MARL) and game theory, elucidating the potential of this integrated paradigm for collective intelligent decision-making in dynamic open environments. Building upon stochastic-game and extensive-form game frameworks, we establish a methodological taxonomy across three dimensions (value function optimization, policy gradient learning, and online search planning), thereby clarifying the evolutionary logic and innovation trajectories of algorithmic advancements. Focusing on complex smart city scenarios, including intelligent transportation coordination and UAV swarm scheduling, we identify technical breakthroughs in MARL applications for policy space modeling and distributed decision optimization. By incorporating bio-inspired optimization approaches, the investigation particularly highlights evolutionary computation mechanisms for dynamic strategy generation in search planning, alongside population-based learning paradigms for enhancing exploration efficiency in policy refinement. The findings reveal core principles governing how groups make optimal choices in complex environments and map the technological development pathways created by blending cross-disciplinary methods to enhance multi-agent systems.
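
To make the value function optimization branch of the taxonomy concrete, the following is a minimal sketch of independent Q-learning in a repeated two-player game. It is not taken from the review: the payoff matrix, the function names (`choose`, `train_independent_q`, `greedy_actions`), and the hyperparameters are all assumptions chosen for illustration, and the game is stateless, so it is a far simpler setting than the stochastic games the abstract refers to.

```python
import random

# Hypothetical common-payoff coordination game (an assumption for this
# sketch): both agents receive 1 when their actions match, 0 otherwise.
PAYOFF = [[1.0, 0.0],
          [0.0, 1.0]]


def choose(q_row, epsilon, rng):
    """Epsilon-greedy choice over one agent's action values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Ties are broken in favour of the lowest action index.
    return max(range(len(q_row)), key=lambda a: q_row[a])


def train_independent_q(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Each agent keeps its own value table and treats the other agent
    as part of the environment (independent learners)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    for _ in range(episodes):
        a0 = choose(q[0], epsilon, rng)
        a1 = choose(q[1], epsilon, rng)
        reward = PAYOFF[a0][a1]  # shared reward for the joint action
        # Stateless update: move each estimate toward the observed reward.
        q[0][a0] += alpha * (reward - q[0][a0])
        q[1][a1] += alpha * (reward - q[1][a1])
    return q


def greedy_actions(q):
    """Return each agent's currently preferred action."""
    return tuple(max(range(len(row)), key=lambda a: row[a]) for row in q)


if __name__ == "__main__":
    q = train_independent_q()
    print("learned values:", q)
    print("greedy joint action:", greedy_actions(q))
```

In this toy setting the two learners tend to settle on a matching action because mismatches earn no reward. Each agent's learning target shifts as the other agent's policy changes, which is one reason joint or centralized value methods are studied in addition to independent learners.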

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12190516
DOI: http://dx.doi.org/10.3390/biomimetics10060375

Similar Publications

Reinforcement Learning on Dyads to Enhance Medication Adherence.

Artif Intell Med Conf Artif Intell Med (2005-)

June 2025

Harvard University, Cambridge, MA, USA.

Ziping Xu , Hinal Jajal , Sung Won Choi , Inbal Nahum-Shani , Guy Shani

Medication adherence is critical for the recovery of adolescents and young adults (AYAs) who have undergone hematopoietic cell transplantation. However, maintaining adherence is challenging for AYAs after hospital discharge, who experience both individual (e.g.

Asymmetric Social Representations in the Prefrontal Cortex for Cooperative Behavior.

bioRxiv

August 2025

Yuan Cheng , Yusi Chen , Myungji Kwak , Ross P Kempner , Rudramani Singha

Cooperation is a hallmark of social species, enabling individuals to achieve goals that are unattainable alone. Across species, cooperative behaviors are often organized by distinct social roles such as leaders and followers, yet the neural mechanisms supporting such role-based coordination remain elusive. Here we introduce a new paradigm for studying cooperation in mice, where pairs of animals engage in a joint spatial foraging task that naturally gives rise to stable leader-follower roles predictive of learning speed.

A robot scheduling method based on rMAPPO for H-beam riveting and welding work cell.

PLoS One

September 2025

Hubei Key Laboratory of Broadband Wireless Communication and Sensor Networks, School of Information Engineering, Wuhan University of Technology, Wuhan, Hubei, China.

Jianbin Zheng , Chuyi Zhou , Yang Gao , Ziyao Chen , Yifan Gao

The H-beam riveting and welding work cell is an automated unit used for processing H-beams. By coordinating the gripping and welding robots, the work cell achieves processes such as riveting and welding stiffener plates, transforming the H-beam into a stiffened H-beam. In the context of intelligent manufacturing, there is still significant potential for improving the productivity of riveting and welding tasks in existing H-beam riveting and welding work cells.

A novel data-driven multi-agent pedestrian flow risk assessment framework for avoiding stampede incident.

Accid Anal Prev

August 2025

Department of Civil Engineering, The University of Tokyo, Tokyo, Japan.

Zi-Xuan Zhou , Kai Liu , Pei-Yang Wu , Wataru Nakanishi , Yasuo Asakura

This paper addresses the critical issue of monitoring high-density crowds in public spaces like transportation hubs to prevent accidents from overcrowding. It highlights the limitations of prevailing simulation tools in dealing with real-world challenges such as diverse pedestrian destinations, multi-directional flows, and the medley space designs in communal areas. The paper aims to introduce a data-driven, multi-agent framework that assesses crowd dynamics and early warning conditions in different spatial layouts.

Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour.

PLoS Comput Biol

August 2025

AI Centre, Department of Computer Science, University College London, London, United Kingdom.

Olivia Macmillan-Scott , Mirco Musolesi

The coevolution of signalling is a complex problem within animal behaviour, and is also central to communication between artificial agents. The Sir Philip Sidney game was designed to model this dyadic interaction from an evolutionary biology perspective, and was formulated to demonstrate the emergence of honest signalling. We use Multi-Agent Reinforcement Learning (MARL) to show that in the majority of cases, the resulting behaviour adopted by agents is not that shown in the original derivation of the model.
