Policy optimization methods are promising for tackling high-complexity reinforcement learning (RL) tasks with multiple agents. In this article, we derive a general trust region for policy optimization methods by considering the effect of subpolicy combinations among agents in multiagent environments. Based on this trust region, we propose an inductive objective for training the policy function, which ensures that agents learn monotonically improving policies. Furthermore, we observe that the policy tends to update very weakly before falling into a local optimum. To address this, we introduce a cost on policy distance in the inductive objective to strengthen the agents' motivation to explore new policies. This strikes a balance during training: the policy update step size stays within the trust region, preventing excessive updates while avoiding getting stuck in local optima. Simulations on wind farm (WF) control tasks and two multiagent benchmarks demonstrate the high performance of the proposed multiagent inductive policy optimization (MAIPO) method.
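The abstract does not give MAIPO's exact objective, but the idea of a trust-region-constrained update combined with a policy-distance term can be illustrated with a minimal sketch. The code below assumes a PPO-style clipped surrogate as the trust-region mechanism and adds a hypothetical policy-distance bonus (approximated by a sample KL term weighted by `beta`); all names and coefficients are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: MAIPO's actual objective is not specified in the abstract.
# Assumptions: PPO-style clipping stands in for the derived trust region, and a
# KL-based policy-distance bonus stands in for the exploration cost.
import torch


def maipo_like_loss(new_logp, old_logp, advantages, clip_eps=0.2, beta=0.01):
    """Clipped surrogate loss minus a policy-distance bonus (per agent).

    new_logp, old_logp: log-probabilities of taken actions under the current
        and behavior policies, shape (batch,).
    advantages: estimated advantages, shape (batch,).
    """
    ratio = torch.exp(new_logp - old_logp)
    # Trust-region-style clipping bounds the policy update step size.
    surrogate = torch.minimum(
        ratio * advantages,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages,
    )
    # Hypothetical policy-distance term: rewarding divergence from the old
    # policy (sample KL estimate) discourages vanishingly small updates that
    # would stall the agent in a local optimum.
    policy_distance = (old_logp - new_logp).mean()
    return -(surrogate.mean() + beta * policy_distance)
```

In this sketch, each agent would minimize this loss on its own batch; the clipping term keeps updates inside the trust region, while `beta` trades off conservatism against the exploration pressure described in the abstract.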
DOI: http://dx.doi.org/10.1109/TNNLS.2025.3601360