Related papers: A conversion between utility and information
Recent literature in the last Maximum Entropy workshop introduced an analogy between cumulative probability distributions and normalized utility functions. Based on this analogy, a utility density function can be defined as the derivative…
The maximum entropy principle can be used to assign utility values when only partial information is available about the decision maker's preferences. In order to obtain such utility values it is necessary to establish an analogy between…
We consider an agent interacting with an unknown environment. The environment is a function which maps natural numbers to natural numbers; the agent's set of hypotheses about the environment contains all such functions which are computable…
We initiate a novel direction in randomized social choice by proposing a new definition of agent utility for randomized outcomes. Each agent has a preference over all outcomes and a quantile parameter. Given a lottery over the…
Snapshots of "best" (or "worst") experience are known to dominate human memory and may thus also have a significant effect on future behaviour. We consider here a model of repeated decision-making where, at every time step, an agent takes…
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's…
Maximum entropy reinforcement learning motivates agents to explore states and actions to maximize the entropy of some distribution, typically by providing additional intrinsic rewards proportional to that entropy function. In this paper, we…
The utility company has many motivations for modifying energy consumption patterns of consumers such as revenue decoupling and demand response programs. We model the utility company--consumer interaction as a principal--agent problem. We…
Reinforcement Learning (RL) models have continually evolved to navigate the exploration-exploitation trade-off in uncertain Markov Decision Processes (MDPs). In this study, I leverage the principles of stochastic thermodynamics and system…
A decision maker's utility depends on her action $a\in A \subset \mathbb{R}^d$ and the payoff-relevant state of the world $\theta\in \Theta$. One can define the value of acquiring new information as the difference between the maximum…
We propose and design recommendation systems that incentivize efficient exploration. Agents arrive sequentially, choose actions and receive rewards, drawn from fixed but unknown action-specific distributions. The recommendation system…
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new…
Recent work has formalized the reward hypothesis through the lens of expected utility theory, by interpreting reward as utility. Hausner's foundational work showed that dropping the continuity axiom leads to a generalization of expected…
This paper reports experimental data describing the dynamics of three key information-sharing outcomes: quantity of information shared, falsification and accuracy. The experimental design follows a formal model predicting that cooperative…
In this work we investigate the inefficiency of the electricity system with strategic agents. Specifically, we prove that without a proper control the total demand of an inefficient system is at most twice the total demand of the optimal…
Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of…
Human interactions are influenced by emotions, temperament, and affection, often conflicting with individuals' underlying preferences. Without explicit knowledge of those preferences, judging whether behaviour is appropriate becomes…
Active inference, a corollary of the free energy principle, is a formal way of describing the behavior of certain kinds of random dynamical systems that have the appearance of sentience. In this chapter, we describe how active inference…
We consider a sequence of repeated interactions between an agent and an environment. Uncertainty about the environment is captured by a probability distribution over a space of hypotheses, which includes all computable functions. Given a…
Reward functions are central in specifying the task we want a reinforcement learning agent to perform. Given a task and desired optimal behavior, we study the problem of designing informative reward functions so that the designed rewards…