Related papers: Thompson Sampling Algorithm for Stochastic Games

200 papers

We study Bayesian learning in episodic, finite-horizon zero-sum Markov games with unknown transition and reward models. We investigate a posterior algorithm in which each player maintains a Bayesian posterior over the game model,…

Machine Learning · Computer Science 2026-03-24 Chang-Wei Yueh, Andy Zhao, Ashutosh Nayyar, Rahul Jain

When two players are engaged in a repeated game with unknown payoff matrices, they may use single-agent multi-armed bandit algorithms to choose their actions independently of each other. We show that when the players use Thompson sampling, the…

Computer Science and Game Theory · Computer Science 2025-09-30 Yi Xiong, Ningyuan Chen, Xuefeng Gao

No-regret learning has been widely used to compute a Nash equilibrium in two-person zero-sum games. However, there is still a lack of regret analysis for network stochastic zero-sum games, where players competing in two subnetworks only…

Optimization and Control · Mathematics 2022-05-31 Shijie Huang, Jinlong Lei, Yiguang Hong

We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using…

Systems and Control · Electrical Eng. & Systems 2020-11-11 Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Motivated by the scarcity of accurate payoff feedback in practical applications of game theory, we examine a class of learning dynamics where players adjust their choices based on past payoff observations that are subject to noise and…

Optimization and Control · Mathematics 2016-06-03 Mario Bravo, Panayotis Mertikopoulos

We consider the problem of simultaneous learning in stochastic games with many players in the finite-horizon setting. While the typical target solution for a stochastic game is a Nash equilibrium, this is intractable with many players. We…

Computer Science and Game Theory · Computer Science 2022-10-27 William Brown

In game-theoretic learning, several agents are simultaneously following their individual interests, so the environment is non-stationary from each player's perspective. In this context, the performance of a learning algorithm is often…

Computer Science and Game Theory · Computer Science 2021-10-19 Yu-Guan Hsieh, Kimon Antonakopoulos, Panayotis Mertikopoulos

This work tackles the complexities of multi-player scenarios in unknown games, where the primary challenge lies in navigating the uncertainty of the environment through bandit feedback alongside strategic decision-making. We…

Machine Learning · Computer Science 2024-02-27 Yingru Li, Liangqi Liu, Wenqiang Pu, Hao Liang, Zhi-Quan Luo

Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space. We propose a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling. The algorithm…

Machine Learning · Computer Science 2019-06-04 Zeyu Jia, Lin F. Yang, Mengdi Wang

In this work, we study potential games and Markov potential games under stochastic cost and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably…

Computer Science and Game Theory · Computer Science 2024-04-11 Jing Dong, Baoxiang Wang, Yaoliang Yu

In this paper, we study the distributed generalized Nash equilibrium seeking problem of non-cooperative games in dynamic environments. Each player in the game aims to minimize its own time-varying cost function subject to a local action…

Optimization and Control · Mathematics 2020-04-02 Kaihong Lu, Guangqi Li, Long Wang

Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. We consider a variant of this problem where the game payoff matrix changes over time, possibly in an adversarial…

Machine Learning · Computer Science 2022-02-01 Mengxiao Zhang, Peng Zhao, Haipeng Luo, Zhi-Hua Zhou

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$,…

Machine Learning · Computer Science 2024-05-03 Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms on a variety of online decision-making problems,…

Machine Learning · Statistics 2021-06-11 Taira Tsuchiya, Junya Honda, Masashi Sugiyama

We consider two classes of constrained finite state-action stochastic games. First, we consider a two-player nonzero-sum single-controller constrained stochastic game with both average and discounted cost criteria. We consider the same…

Optimization and Control · Mathematics 2012-06-11 Vikas Vikram Singh, N. Hemachandra

This paper investigates online stochastic aggregative games subject to local set constraints and time-varying coupled inequality constraints, where each player possesses a time-varying expectation-valued cost function relying on not only…

Optimization and Control · Mathematics 2025-11-18 Kaixin Du, Min Meng

In this paper, we consider two-player zero-sum matrix and stochastic games and develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players. Specifically, the learning dynamics for matrix…

Machine Learning · Computer Science 2024-09-06 Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

We consider the exploration-exploitation trade-off in finite discounted Markov decision processes, where the state transition matrix of the underlying environment is unknown. We propose a double Thompson sampling…

Machine Learning · Computer Science 2022-03-01 Shuqing Shi, Xiaobin Wang, Zhiyou Yang, Fan Zhang, Hong Qu

We propose a Thompson sampling-based learning algorithm for the Linear Quadratic (LQ) control problem with unknown system parameters. The algorithm is called Thompson sampling with dynamic episodes (TSDE) where two stopping criteria…

Systems and Control · Computer Science 2017-09-14 Yi Ouyang, Mukul Gagrani, Rahul Jain

Thompson sampling (TS) is a powerful and widely used strategy for sequential decision-making, with applications ranging from Bayesian optimization to reinforcement learning (RL). Despite its success, the theoretical foundations of TS remain…

Machine Learning · Computer Science 2025-10-24 Jasmine Bayrooti, Sattar Vakili, Amanda Prorok, Carl Henrik Ek