Author

Tom Stepleton

Results may include different authors with the same name.

5 papers

Counterfactual Credit Assignment in Model-Free Reinforcement Learning

Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of…

Machine Learning · Computer Science 2021-12-15 Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos

Ethical and social risks of harm from Language Models

This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed.…

Computation and Language · Computer Science 2021-12-09 Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

Wasserstein Fair Classification

We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances. The approach has desirable theoretical properties and is robust to…

Machine Learning · Statistics 2019-07-30 Ray Jiang, Aldo Pacchiano, Tom Stepleton, Heinrich Jiang, Silvia Chiappa

Safe and Efficient Off-Policy Reinforcement Learning

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace($\lambda$), with three desired properties: (1) it…

Machine Learning · Computer Science 2016-11-09 Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare

Q($\lambda$) with Off-Policy Corrections

We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of…

Artificial Intelligence · Computer Science 2016-08-12 Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Remi Munos