Related papers: Regularized Softmax Deep Multi-Agent $Q$-Learning

Regularize! Don't Mix: Multi-Agent Reinforcement Learning without Explicit Centralized Structures

We propose using regularization for Multi-Agent Reinforcement Learning rather than learning explicit cooperative structures, an approach called Multi-Agent Regularized Q-learning (MARQ). Many MARL approaches leverage centralized structures in…

Machine Learning · Computer Science 2021-09-21 Chapman Siu, Jason Traish, Richard Yi Da Xu

Implicitly Regularized RL with Implicit Q-Values

The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms for which RL agents behave following a (soft)-greedy policy w.r.t. $Q$. It is a powerful tool that allows action selection without a model of the…

Machine Learning · Computer Science 2022-06-01 Nino Vieillard, Marcin Andrychowicz, Anton Raichuk, Olivier Pietquin, Matthieu Geist

Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

QMIX is a popular $Q$-learning algorithm for cooperative MARL in the centralised training and decentralised execution paradigm. In order to enable easy decentralisation, QMIX restricts the joint action $Q$-values it can represent to be a…

Machine Learning · Computer Science 2020-10-23 Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson

QVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning

This paper introduces four new algorithms that can be used for tackling multi-agent reinforcement learning (MARL) problems occurring in cooperative settings. All algorithms are based on the Deep Quality-Value (DQV) family of algorithms, a…

Machine Learning · Computer Science 2020-12-23 Pascal Leroy, Damien Ernst, Pierre Geurts, Gilles Louppe, Jonathan Pisane, Matthia Sabatelli

Enhancing the Robustness of QMIX against State-adversarial Attacks

Deep reinforcement learning (DRL) performance is generally impacted by state-adversarial attacks, a perturbation applied to an agent's observation. Most recent research has concentrated on robust single-agent reinforcement learning (SARL)…

Machine Learning · Computer Science 2024-03-07 Weiran Guo, Guanjun Liu, Ziyuan Zhou, Ling Wang, Jiacun Wang

Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer

Overestimation in single-agent reinforcement learning has been extensively studied. In contrast, overestimation in the multiagent setting has received comparatively little attention although it increases with the number of agents and leads…

Multiagent Systems · Computer Science 2025-02-05 Yaodong Yang, Guangyong Chen, Hongyao Tang, Furui Liu, Danruo Deng, Pheng Ann Heng

ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation

In a multirobot system, a number of cyber-physical attacks (e.g., communication hijack, observation perturbations) can challenge the robustness of agents. This robustness issue worsens in multiagent reinforcement learning because there…

Machine Learning · Computer Science 2021-09-15 Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How

Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning

In decentralized multi-agent reinforcement learning, agents learning in isolation can lead to relative over-generalization (RO), where optimal joint actions are undervalued in favor of suboptimal ones. This hinders effective coordination in…

Machine Learning · Computer Science 2024-11-19 Ting Zhu, Yue Jin, Jeremie Houssineau, Giovanni Montana

Stabilizing Q Learning Via Soft Mellowmax Operator

Learning complicated value functions in high-dimensional state spaces by function approximation is a challenging task, partly because the max-operator used in temporal difference updates can theoretically cause instability for most…

Machine Learning · Computer Science 2020-12-21 Yaozhong Gan, Zhe Zhang, Xiaoyang Tan

Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning

Multi-Agent Reinforcement Learning (MARL) is useful in many problems that require the cooperation and coordination of multiple agents. Learning optimal policies using reinforcement learning in a multi-agent setting can be very difficult as…

Machine Learning · Computer Science 2022-05-31 Rafael Pina, Varuna De Silva, Joosep Hook, Ahmet Kondoz

MARS: Co-evolving Dual-System Deep Research via Multi-Agent Reinforcement Learning

Large Reasoning Models (LRMs) face two fundamental limitations: excessive token consumption when overanalyzing simple information processing tasks, and inability to access up-to-date knowledge beyond their training data. We introduce MARS…

Artificial Intelligence · Computer Science 2026-02-03 Guoxin Chen, Zile Qiao, Wenqing Wang, Donglei Yu, Xuanzhong Chen, Hao Sun, Minpeng Liao, Kai Fan, Yong Jiang, Penguin Xie, Wayne Xin Zhao, Ruihua Song, Fei Huang

Energy-based Surprise Minimization for Multi-Agent Value Factorization

Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success in training decentralised policies in a centralised manner by making use of value factorization methods. However, addressing surprise across spurious states and…

Machine Learning · Computer Science 2021-01-19 Karush Suri, Xiao Qi Shi, Konstantinos Plataniotis, Yuri Lawryshyn

Towards a Common Implementation of Reinforcement Learning for Multiple Robotic Tasks

Mobile robots are increasingly being employed for performing complex tasks in dynamic environments. Reinforcement learning (RL) methods are recognized to be promising for specifying such tasks in a relatively simple manner. However, the…

Artificial Intelligence · Computer Science 2017-11-08 Angel Martínez-Tenor, Juan Antonio Fernández-Madrigal, Ana Cruz-Martín, Javier González-Jiménez

Greedy UnMixing for Q-Learning in Multi-Agent Reinforcement Learning

This paper introduces Greedy UnMix (GUM) for cooperative multi-agent reinforcement learning (MARL). Greedy UnMix aims to avoid scenarios where MARL methods fail due to overestimation of values as part of the large joint state-action space.…

Machine Learning · Computer Science 2021-09-21 Chapman Siu, Jason Traish, Richard Yi Da Xu

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during…

Artificial Intelligence · Computer Science 2026-05-21 Yonghyeon Jo, Sunwoo Lee, Seungyul Han

Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning

Many complex multi-agent systems such as robot swarm control and autonomous vehicle coordination can be modeled as Multi-Agent Reinforcement Learning (MARL) tasks. QMIX, a widely popular MARL algorithm, has been used as a baseline for the…

Machine Learning · Computer Science 2023-06-09 Jian Hu, Siyang Jiang, Seth Austin Harding, Haibin Wu, Shih-wei Liao

Agent Q-Mix: Selecting the Right Action for LLM Multi-Agent Systems through Reinforcement Learning

Large Language Models (LLMs) have shown remarkable performance in completing various tasks. However, solving complex problems often requires the coordination of multiple agents, raising a fundamental question: how to effectively select and…

Computation and Language · Computer Science 2026-04-02 Eric Hanchen Jiang, Levina Li, Rui Sun, Xiao Liang, Yubei Li, Yuchen Wu, Haozheng Luo, Hengli Li, Zhi Zhang, Zhaolu Kang, Kai-Wei Chang, Ying Nian Wu

QSIM: Mitigating Overestimation in Multi-Agent Reinforcement Learning via Action Similarity Weighted Q-Learning

Value decomposition (VD) methods have achieved remarkable success in cooperative multi-agent reinforcement learning (MARL). However, their reliance on the max operator for temporal-difference (TD) target calculation leads to systematic…

Multiagent Systems · Computer Science 2026-02-27 Yuanjun Li, Bin Zhang, Hao Chen, Zhouyang Jiang, Dapeng Li, Zhiwei Xu

Ensemble Bootstrapping for Q-Learning

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by…

Machine Learning · Computer Science 2021-04-21 Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning

Training in multi-agent reinforcement learning (MARL) is a time-consuming process because of the distribution shift of each agent. One drawback is that the strategy of each agent in MARL is independent, even though the agents are actually cooperating. Thus, a vertical…

Artificial Intelligence · Computer Science 2024-03-06 Ke Zhang, DanDan Zhu, Qiuhan Xu, Hao Zhou, Ce Zheng