Related papers: Exploiting Language Instructions for Interpretable…

Compositional Learning of Visually-Grounded Concepts Using Reinforcement

Children can rapidly generalize compositionally-constructed rules to unseen test sets. On the other hand, deep reinforcement learning (RL) agents need to be trained over millions of episodes, and their ability to generalize to unseen…

Machine Learning · Computer Science 2024-05-06 Zijun Lin, Haidi Azaman, M Ganesh Kumar, Cheston Tan

Interpretable Reinforcement Learning with Multilevel Subgoal Discovery

We propose a novel Reinforcement Learning model for discrete environments, which is inherently interpretable and supports the discovery of deep subgoal hierarchies. In the model, an agent learns information about the environment in the form of…

Artificial Intelligence · Computer Science 2022-02-16 Alexander Demin, Denis Ponomaryov

Concept Learning for Interpretable Multi-Agent Reinforcement Learning

Multi-agent robotic systems are increasingly operating in real-world environments in close proximity to humans, yet are largely controlled by policy models with inscrutable deep neural network representations. We introduce a method for…

Machine Learning · Computer Science 2023-02-24 Renos Zabounidis, Joseph Campbell, Simon Stepputtis, Dana Hughes, Katia Sycara

Teachable Reinforcement Learning via Advice Distillation

Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access…

Machine Learning · Computer Science 2023-02-21 Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta

Learning Interpretable Classifiers for PDDL Planning

We consider the problem of synthesizing interpretable models that recognize the behaviour of an agent compared to other agents, on a whole set of similar planning tasks expressed in PDDL. Our approach consists in learning logical formulas,…

Artificial Intelligence · Computer Science 2024-10-15 Arnaud Lequen

Tell me why! Explanations support learning relational and causal structure

Inferring the abstract relational and causal structure of the world is a major challenge for reinforcement-learning (RL) agents. For humans, language--particularly in the form of explanations--plays a considerable role in overcoming this…

Machine Learning · Computer Science 2022-05-26 Andrew K. Lampinen, Nicholas A. Roy, Ishita Dasgupta, Stephanie C. Y. Chan, Allison C. Tam, James L. McClelland, Chen Yan, Adam Santoro, Neil C. Rabinowitz, Jane X. Wang, Felix Hill

Complementary reinforcement learning towards explainable agents

Reinforcement learning (RL) algorithms allow agents to learn skills and strategies to perform complex tasks without detailed instructions or expensive labelled training examples. That is, RL agents can learn, as we learn. Given the…

Machine Learning · Computer Science 2019-01-25 Jung Hoon Lee

REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability

Understanding the agent's learning process, particularly the factors that contribute to its success or failure post-training, is crucial for comprehending the rationale behind the agent's decision-making process. Prior methods clarify the…

Artificial Intelligence · Computer Science 2024-10-15 Shuang Ao, Simon Khan, Haris Aziz, Flora D. Salim

Programmable Agents

We build deep RL agents that execute declarative programs expressed in formal language. The agents learn to ground the terms in this language in their environment, and can generalize their behavior at test time to execute new programs that…

Artificial Intelligence · Computer Science 2017-06-21 Misha Denil, Sergio Gómez Colmenarejo, Serkan Cabi, David Saxton, Nando de Freitas

Reinforcement Learning Your Way: Agent Characterization through Policy Regularization

The increased complexity of state-of-the-art reinforcement learning (RL) algorithms has resulted in an opacity that inhibits explainability and understanding. This has led to the development of several post-hoc explainability methods that…

Machine Learning · Computer Science 2022-03-25 Charl Maree, Christian Omlin

Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences

Machine Learning models become increasingly proficient in complex tasks. However, even for experts in the field, it can be difficult to understand what the model learned. This hampers trust and acceptance, and it obstructs the possibility…

Machine Learning · Computer Science 2018-07-24 Jasper van der Waa, Jurriaan van Diggelen, Karel van den Bosch, Mark Neerincx

Linear Classifiers that Encourage Constructive Adaptation

Machine learning systems are often used in settings where individuals adapt their features to obtain a desired outcome. In such settings, strategic behavior leads to a sharp loss in model performance in deployment. In this work, we aim to…

Machine Learning · Computer Science 2021-06-11 Yatong Chen, Jialu Wang, Yang Liu

Mechanistic Interpretability of Reinforcement Learning Agents

This paper explores the mechanistic interpretability of reinforcement learning (RL) agents through an analysis of a neural network trained on procedural maze environments. By dissecting the network's inner workings, we identified…

Machine Learning · Computer Science 2024-11-05 Tristan Trim, Triston Grayston

Learning Compositional Negation in Populations of Roth-Erev and Neural Agents

Agent-based models and signalling games are useful tools with which to study the emergence of linguistic communication in a tractable setting. These techniques have been used to study the compositional property of natural languages, but…

Multiagent Systems · Computer Science 2020-12-09 Graham Todd, Shane Steinert-Threlkeld, Christopher Potts

Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning

Machine learning is now ubiquitous in societal decision-making, for example in evaluating job candidates or loan applications, and it is increasingly important to take into account how classified agents will react to the learning…

Machine Learning · Computer Science 2025-08-08 Dravyansh Sharma, Alec Sun

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time. Some policy interpretability methods facilitate this by capturing the…

Machine Learning · Computer Science 2022-03-22 Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

Explaining Agent's Decision-making in a Hierarchical Reinforcement Learning Scenario

Reinforcement learning is a machine learning approach based on behavioral psychology. It is focused on learning agents that can acquire knowledge and learn to carry out new tasks by interacting with the environment. However, a problem…

Artificial Intelligence · Computer Science 2022-12-15 Hugo Muñoz, Ernesto Portugal, Angel Ayala, Bruno Fernandes, Francisco Cruz

Interpretable Imitation Learning with Dynamic Causal Relations

Imitation learning, which learns agent policy by mimicking expert demonstration, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, it remains a difficult task to interpret…

Machine Learning · Computer Science 2024-01-31 Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang, Yuncong Chen, Yanchi Liu, Wei Cheng, Haifeng Chen

Policy Regularization for Legible Behavior

In Reinforcement Learning interpretability generally means to provide insight into the agent's mechanisms such that its decisions are understandable by an expert upon inspection. This definition, with the resulting methods from the…

Artificial Intelligence · Computer Science 2022-03-10 Michele Persiani, Thomas Hellström

Learning Classifier Systems for Self-Explaining Socio-Technical-Systems

In socio-technical settings, operators are increasingly assisted by decision support systems. By employing these, important properties of socio-technical systems such as self-adaptation and self-optimization are expected to improve further.…

Human-Computer Interaction · Computer Science 2022-07-07 Michael Heider, Helena Stegherr, Richard Nordsieck, Jörg Hähner