Related papers: Value Function Approximation in Zero-Sum Markov Ga…

Reinforcement Learning for Multi-Objective and Constrained Markov Decision Processes

In this paper, we consider the problem of optimization and learning for constrained and multi-objective Markov decision processes, for both discounted rewards and expected average rewards. We formulate the problems as zero-sum games where…

Optimization and Control · Mathematics 2021-03-05 Ather Gattami, Qinbo Bai, Vaneet Aggarwal

Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such…

Machine Learning · Computer Science 2022-03-01 Yulai Zhao, Yuandong Tian, Jason D. Lee, Simon S. Du

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

Optimal policies in standard MDPs can be obtained using either value iteration or policy iteration. However, in the case of zero-sum Markov games, there is no efficient policy iteration algorithm; e.g., it has been shown that one has to…

Machine Learning · Computer Science 2023-10-31 Anna Winnicki, R. Srikant

Towards General Function Approximation in Zero-Sum Markov Games

This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves. The study focuses on the challenging settings where the value function or the model is parameterized by general function classes. Provably…

Computer Science and Game Theory · Computer Science 2021-11-02 Baihe Huang, Jason D. Lee, Zhaoran Wang, Zhuoran Yang

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward…

Machine Learning · Computer Science 2020-06-25 Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $\gamma\in(0,1)$ we provide an algorithm that…

Machine Learning · Computer Science 2019-08-30 Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye

Reinforcement Learning of Markov Decision Processes with Peak Constraints

In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…

Optimization and Control · Mathematics 2019-12-09 Ather Gattami

A forward algorithm for a class of Markov zero-sum stopping games

In this paper, we propose a new efficient algorithm to compute the value function for zero-sum stopping games featuring two players with opposing interests. This can be seen as a game version of the "forward algorithm" for (one-player)…

Probability · Mathematics 2026-02-03 Nhat-Thang Le

A tutorial on Zero-sum Stochastic Games

Zero-sum stochastic games generalize the notion of Markov Decision Processes (i.e. controlled Markov chains, or stochastic dynamic programming) to the 2-player competitive case: two players jointly control the evolution of a state…

Optimization and Control · Mathematics 2019-05-17 Jérôme Renault

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…

Machine Learning · Computer Science 2022-10-17 Anna Winnicki, R. Srikant

Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games

Markov games provide a powerful framework for modeling strategic multi-agent interactions in dynamic environments. Traditionally, convergence properties of decentralized learning algorithms in these settings have been established only for…

Multiagent Systems · Computer Science 2025-06-13 Chinmay Maheshwari, Manxi Wu, Shankar Sastry

Zero-Sum Games for piecewise deterministic Markov decision processes with risk-sensitive finite-horizon cost criterion

This paper investigates the two-person zero-sum stochastic games for piece-wise deterministic Markov decision processes with risk-sensitive finite-horizon cost criterion on a general state space. Here, the transition and cost/reward rates…

Optimization and Control · Mathematics 2024-05-15 Subrata Golui

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we derive…

Computer Science and Game Theory · Computer Science 2023-01-12 Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang

Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game

Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-collected dataset without further interactions with the environment. While various algorithms have been proposed for offline RL in the previous literature,…

Machine Learning · Computer Science 2023-03-02 Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Liwei Wang, Tong Zhang

Zero-Sum Semi-Markov Games with State-Action-Dependent Discount Factors

Semi-Markov model is one of the most general models for stochastic dynamic systems. This paper deals with a two-person zero-sum game for semi-Markov processes. We focus on the expected discounted payoff criterion with state-action-dependent…

Computer Science and Game Theory · Computer Science 2021-03-09 Zhihui Yu, Xianping Guo, Li Xia

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Model-based algorithms -- algorithms that explore the environment through building and utilizing an estimated model -- are widely used in reinforcement learning practice and theoretically shown to achieve optimal sample efficiency for…

Machine Learning · Computer Science 2021-02-09 Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Near-Optimal Reinforcement Learning with Self-Play

This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct…

Machine Learning · Computer Science 2020-07-15 Yu Bai, Chi Jin, Tiancheng Yu

Smoothing Policy Iteration for Zero-sum Markov Games

Zero-sum Markov Games (MGs) have been an efficient framework for multi-agent systems and robust control, wherein a minimax problem is constructed to solve the equilibrium policies. At present, this formulation is well studied under tabular…

Machine Learning · Computer Science 2022-12-06 Yangang Ren, Yao Lyu, Wenxuan Wang, Shengbo Eben Li, Zeyang Li, Jingliang Duan

Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives

A classic solution technique for Markov decision processes (MDP) and stochastic games (SG) is value iteration (VI). Due to its good practical performance, this approximative approach is typically preferred over exact techniques, even though…

Artificial Intelligence · Computer Science 2023-04-21 Jan Křetínský, Tobias Meggendorfer, Maximilian Weininger

A Generalized Minimax Q-learning Algorithm for Two-Player Zero-Sum Stochastic Games

We consider the problem of two-player zero-sum games. This problem is formulated as a min-max Markov game in the literature. The solution of this game, which is the min-max payoff, starting from a given state is called the min-max value of…

Machine Learning · Computer Science 2022-03-21 Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar