Deep reinforcement learning (DRL) is a powerful approach that combines reinforcement learning (RL) and deep learning to address complex decision-making problems in high-dimensional environments. Although DRL has been remarkably successful, its low sample efficiency necessitates extensive training times and large amounts of data to learn optimal policies. These limitations are more pronounced in the context of multi-agent reinforcement learning (MARL). To address them, various studies have sought to improve DRL. In this study, we propose an approach that combines a masked reconstruction task with QMIX (M-QMIX). By introducing a masked reconstruction task as an auxiliary task, we aim to improve sample efficiency, a fundamental limitation of RL in multi-agent systems. Experiments were conducted on the StarCraft II micromanagement benchmark to validate the effectiveness of the proposed method, using 11 scenarios comprising five easy, three hard, and three very hard scenarios. We deliberately limited the number of time steps for each scenario to demonstrate the improved sample efficiency. The proposed method outperforms QMIX in eight of the 11 scenarios. These results provide strong evidence that the proposed method is more sample-efficient than QMIX and that it effectively addresses this limitation of DRL in multi-agent systems.
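The listing does not include implementation details; as a rough sketch of how a masked reconstruction auxiliary loss could sit alongside a QMIX-style TD loss, consider the following. The module names, layer sizes, mask ratio, and loss weighting are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation): a masked-reconstruction
# auxiliary loss that could be added to a QMIX-style TD loss.
import torch
import torch.nn as nn


class MaskedReconstruction(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int = 64, mask_ratio: float = 0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, obs_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim); zero out a random subset of features
        mask = (torch.rand_like(obs) > self.mask_ratio).float()
        recon = self.decoder(self.encoder(obs * mask))
        # Reconstruction error measured only on the masked-out entries
        return ((recon - obs) ** 2 * (1.0 - mask)).mean()


# Usage sketch: total_loss = td_loss + aux_weight * aux(obs_batch), where td_loss
# is the usual QMIX objective and aux_weight is a tunable coefficient (assumed).
aux = MaskedReconstruction(obs_dim=32)
obs_batch = torch.randn(8, 5, 32)  # 8 samples, 5 agents, 32-dim observations
print(float(aux(obs_batch)))
```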
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501567 | PMC |
| http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291545 | PLOS |
J Exp Anal Behav
September 2025
Fralin Biomedical Research Institute at VTC, Roanoke, VA, United States of America.
Reward delays are often associated with reduced probability of reward, although standard assessments of delay discounting do not specify the degree of reward certainty. Thus, the extent to which estimates of delay discounting are influenced by uncontrolled variance in perceived reward certainty remains unclear. Here we examine 370 participants who were randomly assigned to complete a delay discounting task in which reward certainty was either unspecified (n = 184) or specified as 100% (n = 186) in the task trials and task instructions.
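The abstract does not state how discounting was estimated; in this literature, delay-discounting estimates are commonly obtained by fitting Mazur's hyperbolic model, shown below, where V is the subjective value of a reward of amount A delayed by D and k is the fitted discounting rate. Treat the model choice as an assumption given for context, not a detail taken from the article.

```latex
% Hyperbolic delay-discounting model (Mazur, 1987); shown for context only,
% as an assumption about the analysis rather than a detail from the abstract.
V = \frac{A}{1 + kD}
```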
PLoS One
September 2025
College of Business Administration, Northern Border University (NBU), Arar, Kingdom of Saudi Arabia.
The increasing dependence on cloud computing as a cornerstone of modern technological infrastructure has introduced significant challenges in resource management. Traditional load-balancing techniques often prove inadequate for the dynamic and complex nature of cloud environments, resulting in suboptimal resource utilization and heightened operational costs. This paper presents a novel smart load-balancing strategy that incorporates advanced techniques to mitigate these limitations.
J Exp Anal Behav
September 2025
Laboratorio de Análisis de la Conducta, Universidad Nacional Autónoma de México, Facultad de Estudios Superiores Iztacala.
Rules can control the listener's behavior, yet few studies have examined variables that quantitatively determine the extent of this control relative to other rules and contingencies. To explore these variables, we employed a novel procedure that required a choice between rules. Participants clicked two buttons on a computer screen to earn points exchangeable for money.
Policy optimization methods are promising for tackling high-complexity reinforcement learning (RL) tasks with multiple agents. In this article, we derive a general trust region for policy optimization methods by considering the effect of subpolicy combinations among agents in multiagent environments. Based on this trust region, we propose an inductive objective for training the policy function, which ensures that agents learn monotonically improving policies.
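The article's exact objective is not reproduced in this listing; as a generic illustration of the trust-region family of policy updates it refers to, the sketch below computes a clipped surrogate loss (PPO-style) per agent. The clipping constant, tensor shapes, and toy data are assumptions, not the article's method.

```python
# Generic trust-region-style (clipped surrogate) policy loss; an illustration
# of the family of methods the abstract refers to, not the article's objective.
import torch


def clipped_surrogate_loss(logp_new: torch.Tensor,
                           logp_old: torch.Tensor,
                           advantage: torch.Tensor,
                           clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio between the updated policy and the behavior policy
    ratio = torch.exp(logp_new - logp_old)
    # Clipping keeps each update inside an approximate trust region
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantage, clipped * advantage).mean()


# Toy usage: random log-probabilities and advantages for 4 agents x 16 steps
logp_new = torch.randn(4, 16)
logp_old = logp_new.detach() + 0.01 * torch.randn(4, 16)
adv = torch.randn(4, 16)
print(float(clipped_surrogate_loss(logp_new, logp_old, adv)))
```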
IEEE Trans Neural Netw Learn Syst
September 2025
In essence, reinforcement learning (RL) solves the optimal control problem (OCP) by employing a neural network (NN) to fit the optimal policy mapping states to actions. The accuracy of this policy approximation is often very low in complex control tasks, leading to unsatisfactory control performance compared with online optimal controllers. A primary reason is that the landscape of the value function is not only rugged in most areas but also flat at the bottom, which hampers convergence to the minimum point.
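As context for this framing, a generic discounted optimal control problem and the policy-approximation view it describes can be written as below; the notation is illustrative and not taken from the article.

```latex
% Generic discounted OCP (Bellman form) and NN policy approximation;
% notation is illustrative, not taken from the article.
V^{*}(s) = \min_{a}\,\big[\ell(s,a) + \gamma\, V^{*}\!\big(f(s,a)\big)\big],
\qquad
\pi^{*}(s) = \arg\min_{a}\,\big[\ell(s,a) + \gamma\, V^{*}\!\big(f(s,a)\big)\big],
\qquad
\theta^{*} = \arg\min_{\theta}\; \mathbb{E}_{s}\,\big\|\pi_{\theta}(s) - \pi^{*}(s)\big\|^{2}
```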