Related papers: A Strategic Learning Algorithm for State-based Gam…
Stochastic dynamic teams and games are rich models for decentralized systems and challenging testing grounds for multi-agent learning. Previous work that guaranteed team optimality assumed stateless dynamics, or an explicit coordination…
It is known that there are uncoupled learning heuristics leading to Nash equilibrium in all finite games. Why should players use such learning heuristics and where could they come from? We show that there is no uncoupled learning heuristic…
This paper presents a non-manual design engineering method based on heuristic search algorithm to search for candidate agents in the solution space which formed by artificial intelligence agents modeled on the base of bionics.Compared with…
When a game involves many agents or when communication between agents is not possible, it is useful to resort to distributed learning where each agent acts in complete autonomy without any information on the other agents' situations.…
Learning in games has been widely used to solve many cooperative multi-agent problems such as coverage control, consensus, self-reconfiguration or vehicle-target assignment. One standard approach in this domain is to formulate the problem…
This paper presents new families of algorithms for the repeated play of two-agent (near) zero-sum games and two-agent zero-sum stochastic games. For example, the family includes fictitious play and its variants as members. Commonly, the…
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each…
This paper examines the convergence of no-regret learning in games with continuous action sets. For concreteness, we focus on learning via "dual averaging", a widely used class of no-regret learning schemes where players take small steps…
We introduce a two-player model of reinforcement learning with memory. Past actions of an iterated game are stored in a memory and used to determine player's next action. To examine the behaviour of the model some approximate methods are…
In this paper, we extend the Descent framework, which enables learning and planning in the context of two-player games with perfect information, to the framework of stochastic games. We propose two ways of doing this, the first way…
We consider an infinite collection of agents who make decisions, sequentially, about an unknown underlying binary state of the world. Each agent, prior to making a decision, receives an independent private signal whose distribution depends…
Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and…
We introduce robust learning equilibrium. The idea of learning equilibrium is that learning algorithms in multi-agent systems should themselves be in equilibrium rather than only lead to equilibrium. That is, learning equilibrium is immune…
Reinforcement learning (RL) has recently achieved tremendous successes in many artificial intelligence applications. Many of the forefront applications of RL involve multiple agents, e.g., playing chess and Go games, autonomous driving, and…
Stochastic games have become a prevalent framework for studying long-term multi-agent interactions, especially in the context of multi-agent reinforcement learning. In this work, we comprehensively investigate the concept of constant-memory…
Reinforcement learning from self-play has recently reported many successes. Self-play, where the agents compete with themselves, is often used to generate training data for iterative policy improvement. In previous work, heuristic rules are…
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response. We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an…
In this paper, we examine the long-run behavior of regularized, no-regret learning in finite games. A well-known result in the field states that the empirical frequencies of no-regret play converge to the game's set of coarse correlated…
Motivated by the scarcity of accurate payoff feedback in practical applications of game theory, we examine a class of learning dynamics where players adjust their choices based on past payoff observations that are subject to noise and…
We develop a method based on computer algebra systems to represent the mutual pure strategy best-response dynamics of symmetric two-player, two-action repeated games played by players with a one-period memory. We apply this method to the…