Related papers: Recursive Reward Aggregation

Reinforcement Learning for Joint Optimization of Multiple Rewards

Finding optimal policies which maximize long term rewards of Markov Decision Processes requires the use of dynamic programming and backward induction to solve the Bellman optimality equation. However, many real-world problems require…

Machine Learning · Computer Science 2023-01-10 Mridul Agarwal, Vaneet Aggarwal

Bellman Gradient Iteration for Inverse Reinforcement Learning

This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of…

Machine Learning · Computer Science 2017-07-26 Kun Li, Yanan Sui, Joel W. Burdick

Reinforcement Learning from Bagged Reward

In Reinforcement Learning (RL), it is commonly assumed that an immediate reward signal is generated for each action taken by the agent, helping the agent maximize cumulative rewards to obtain the optimal policy. However, in many real-world…

Machine Learning · Computer Science 2024-10-29 Yuting Tang, Xin-Qiang Cai, Yao-Xiang Ding, Qiyu Wu, Guoqing Liu, Masashi Sugiyama

Reward-Reinforced Reinforcement Learning for Multi-agent Systems

Reinforcement learning algorithms in multi-agent systems deliver highly resilient and adaptable solutions for common problems in telecommunications, aerospace, and industrial robotics. However, achieving an optimal global goal remains a…

Multiagent Systems · Computer Science 2021-05-18 Changgang Zheng, Shufan Yang, Juan Parra-Ullauri, Antonio Garcia-Dominguez, Nelly Bencomo

Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning

Fairness plays a crucial role in various multi-agent systems (e.g., communication networks, financial markets, etc.). Many multi-agent dynamical interactions can be cast as Markov Decision Processes (MDPs). While existing research has…

Machine Learning · Computer Science 2023-06-02 Peizhong Ju, Arnob Ghosh, Ness B. Shroff

A Unified Bellman Equation for Causal Information and Value in Markov Decision Processes

The interaction between an artificial agent and its environment is bi-directional. The agent extracts relevant information from the environment, and affects the environment by its actions in return to accumulate high expected reward.…

Systems and Control · Computer Science 2018-06-06 Stas Tiomkin, Naftali Tishby

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

Reinforcement learning (RL) algorithms assume that users specify tasks by manually writing down a reward function. However, this process can be laborious and demands considerable technical expertise. Can we devise RL algorithms that instead…

Machine Learning · Computer Science 2022-01-03 Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov

Reinforcement Learning Measurement Model

Interactive assessments generate sequential process data that are not well handled by conventional item response models. Existing MDP-based measurement approaches, such as the Markov decision process measurement model (MDP-MM, LaMar, 2018),…

Methodology · Statistics 2026-05-12 Wenqian Xu, Feng Ji

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to…

Machine Learning · Computer Science 2024-12-24 Zhengqi Wu, Renyuan Xu

Maximum Reward Formulation In Reinforcement Learning

Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon). However, several crucial applications in the real world, such as drug discovery,…

Machine Learning · Computer Science 2023-12-20 Sai Krishna Gottipati, Yashaswi Pathak, Rohan Nuttall, Sahir, Raviteja Chunduru, Ahmed Touati, Sriram Ganapathi Subramanian, Matthew E. Taylor, Sarath Chandar

Direct and indirect reinforcement learning

Reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. In this paper, we classify RL into direct and indirect RL according to how they seek the optimal…

Machine Learning · Computer Science 2021-05-12 Yang Guan, Shengbo Eben Li, Jingliang Duan, Jie Li, Yangang Ren, Qi Sun, Bo Cheng

Behavior Alignment via Reward Function Optimization

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that…

Machine Learning · Computer Science 2023-11-01 Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

Reinforcement Learning with Non-Cumulative Objective

In reinforcement learning, the objective is almost always defined as a cumulative function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields,…

Machine Learning · Computer Science 2024-04-15 Wei Cui, Wei Yu

Online Inverse Reinforcement Learning via Bellman Gradient Iteration

This paper develops an online inverse reinforcement learning algorithm aimed at efficiently recovering a reward function from ongoing observations of an agent's actions. To reduce the computation time and storage space in reward estimation,…

Robotics · Computer Science 2017-08-01 Kun Li, Joel W. Burdick

Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain

Learning and optimal control under robust Markov decision processes (MDPs) have received increasing attention, yet most existing theory, algorithms, and applications focus on finite-horizon or discounted models. Long-run average-reward…

Optimization and Control · Mathematics 2025-12-12 Shengbo Wang, Nian Si

Portfolio Optimization under Recursive Utility via Reinforcement Learning

We study whether a risk-sensitive objective from asset-pricing theory -- recursive utility -- improves reinforcement learning for portfolio allocation. The Bellman equation under recursive utility involves a certainty equivalent (CE) of…

General Finance · Quantitative Finance 2026-03-25 Minkey Chang

Reward Maximisation through Discrete Active Inference

Active inference is a probabilistic framework for modelling the behaviour of biological and artificial agents, which derives from the principle of minimising free energy. In recent years, this framework has successfully been applied to a…

Artificial Intelligence · Computer Science 2022-07-13 Lancelot Da Costa, Noor Sajid, Thomas Parr, Karl Friston, Ryan Smith

Tackling Decision Processes with Non-Cumulative Objectives using Reinforcement Learning

Markov decision processes (MDPs) are used to model a wide variety of applications ranging from game playing over robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision…

Machine Learning · Computer Science 2025-05-26 Maximilian Nägele, Jan Olle, Thomas Fösel, Remmy Zen, Florian Marquardt

Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return

We study multi-objective reinforcement learning with nonlinear preferences over trajectories. That is, we maximize the expected value of a nonlinear function over accumulated rewards (expected scalarized return or ESR) in a multi-objective…

Machine Learning · Computer Science 2025-02-19 Nianli Peng, Muhang Tian, Brandon Fain

A unified view of entropy-regularized Markov decision processes

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to…

Machine Learning · Computer Science 2017-05-23 Gergely Neu, Anders Jonsson, Vicenç Gómez