Dustin Morrill — Scifaro

Learning to Be Cautious

A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate all situations that an autonomous system may face or what behavior would best…

Artificial Intelligence · Computer Science 2025-10-14 Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash Satsangi, Michael Bowling

Composing Efficient, Robust Tests for Policy Selection

Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental…

Machine Learning · Computer Science 2023-06-14 Dustin Morrill, Thomas J. Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents…

Computer Science and Game Theory · Computer Science 2022-06-24 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald

Hindsight and Sequential Rationality of Correlated Play

Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less…

Computer Science and Game Theory · Computer Science 2022-06-24 Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory. The NeuRD expected update is designed to be nearly identical to that…

Machine Learning · Computer Science 2022-06-07 Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents…

Computer Science and Game Theory · Computer Science 2022-06-03 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald

The Partially Observable History Process

We introduce the partially observable history process (POHP) formalism for reinforcement learning. POHP centers around the actions and observations of a single agent and abstracts away the presence of other players without reducing them to…

Artificial Intelligence · Computer Science 2022-02-25 Dustin Morrill, Amy R. Greenwald, Michael Bowling

OpenSpiel: A Framework for Reinforcement Learning in Games

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative and general-sum, one-shot and…

Machine Learning · Computer Science 2020-09-29 Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis

The Advantage Regret-Matching Actor-Critic

Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated…

Artificial Intelligence · Computer Science 2020-08-28 Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove…

Artificial Intelligence · Computer Science 2020-06-15 Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls

Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization

Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a…

Artificial Intelligence · Computer Science 2020-05-04 Ryan D'Orazio, Dustin Morrill, James R. Wright, Michael Bowling

Neural Replicator Dynamics

Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability.…

Machine Learning · Computer Science 2020-03-02 Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls

Bounds for Approximate Regret-Matching Algorithms

A dominant approach to solving large imperfect-information games is Counterfactual Regret Minimization (CFR). In CFR, many regret minimization problems are combined to solve the game. For very large games, abstraction is typically needed…

Machine Learning · Computer Science 2019-12-02 Ryan D'Orazio, Dustin Morrill, James R. Wright

DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect…

Artificial Intelligence · Computer Science 2017-03-07 Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling

Solving Games with Functional Regret Estimation

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these…

Artificial Intelligence · Computer Science 2015-01-05 Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, Michael Bowling