Related papers: Relating Reinforcement Learning to Dynamic Program…

Examining average and discounted reward optimality criteria in reinforcement learning

In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards. While the latter is more popular, it…

Machine Learning · Computer Science 2022-09-05 Vektor Dewanto, Marcus Gallagher

Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning

The optimal objective is a fundamental aspect of reinforcement learning (RL), as it determines how policies are evaluated and optimized. While total return maximization is the ideal objective in RL, discounted return maximization is the…

Machine Learning · Computer Science 2025-03-19 Shuyu Yin, Fei Wen, Peilin Liu, Tao Luo

A Comparative Study of Dynamic Programming and Reinforcement Learning in Finite Horizon Dynamic Pricing

This paper provides a systematic comparison between Fitted Dynamic Programming (DP), where demand is estimated from data, and Reinforcement Learning (RL) methods in finite-horizon dynamic pricing problems. We analyze their performance…

General Economics · Economics 2026-04-16 Lev Razumovskiy, Nikolay Karenin

Delayed Geometric Discounts: An Alternative Criterion for Reinforcement Learning

The endeavor of artificial intelligence (AI) is to design autonomous agents capable of achieving complex tasks. Namely, reinforcement learning (RL) proposes a theoretical background to learn optimal behaviors. In practice, RL algorithms…

Machine Learning · Computer Science 2022-09-27 Firas Jarboui, Ahmed Akakzia

Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control

Goal-conditioned reinforcement learning (RL) concerns the problem of training an agent to maximize the probability of reaching target goal states. This paper presents an analysis of the goal-conditioned setting based on optimal control. In…

Machine Learning · Computer Science 2026-05-15 Nathan P. Lawrence, Ali Mesbah

Discount Factor as a Regularizer in Reinforcement Learning

Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor. It is known that applying RL algorithms with a lower discount factor can act as a regularizer,…

Machine Learning · Computer Science 2020-07-07 Ron Amit, Ron Meir, Kamil Ciosek

Reward-Machine-Guided, Self-Paced Reinforcement Learning

Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniques for self-paced RL fail in…

Machine Learning · Computer Science 2023-05-29 Cevahir Koprulu, Ufuk Topcu

To the Max: Reinventing Reward in Reinforcement Learning

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the…

Machine Learning · Computer Science 2025-02-25 Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt

CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning

Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment. The learning typically happens through trial and error using explorative methods, such as epsilon-greedy. There are…

Machine Learning · Computer Science 2022-10-06 Per-Arne Andersen, Morten Goodwin, Ole-Christoffer Granmo

Resilient Constrained Reinforcement Learning

We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training. It is challenging to identify appropriate constraint specifications due to the undefined…

Optimization and Control · Mathematics 2024-01-02 Dongsheng Ding, Zhengyan Huan, Alejandro Ribeiro

Reinforcement Learning with $\omega$-Regular Objectives and Constraints

Reinforcement learning (RL) commonly relies on scalar rewards with limited ability to express temporal, conditional, or safety-critical goals, and can lead to reward hacking. Temporal logic expressible via the more general class of…

Artificial Intelligence · Computer Science 2025-11-26 Dominik Wagner, Leon Witzman, Luke Ong

A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning

Reinforcement Learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from…

Methodology · Statistics 2022-10-06 Mauricio Tec, Yunshan Duan, Peter Müller

Risk-Sensitive and Robust Model-Based Reinforcement Learning and Planning

Many sequential decision-making problems that are currently automated, such as those in manufacturing or recommender systems, operate in an environment where there is either little uncertainty, or zero risk of catastrophe. As companies and…

Machine Learning · Computer Science 2023-04-04 Marc Rigter

Searching for Plannable Domains can Speed up Reinforcement Learning

Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an extended period of time. Finding an optimal…

Artificial Intelligence · Computer Science 2007-05-23 Istvan Szita, Balint Takacs, Andras Lorincz

Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards

As the operations of autonomous systems generally affect simultaneously several users, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the…

Artificial Intelligence · Computer Science 2020-08-19 Umer Siddique, Paul Weng, Matthieu Zimmer

Model-based Reinforcement Learning: A Survey

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper…

Machine Learning · Computer Science 2022-04-01 Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

On Reward-Balancing Methods for Reinforcement Learning

This paper investigates the so-called reward-balancing methods, a novel class of algorithms for solving discounted-return reinforcement learning (RL) problems. These methods consist of iteratively adjusting the reward function to transform…

Optimization and Control · Mathematics 2026-04-23 Simone Baroncini, Bahman Gharesifard, Giuseppe Notarstefano

Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives

The performance of reinforcement learning (RL) algorithms is sensitive to the choice of hyperparameters, with the learning rate being particularly influential. RL algorithms fail to reach convergence or demand an extensive number of samples…

Machine Learning · Computer Science 2024-08-09 Aida Afshar, Aldo Pacchiano

Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies

The objective comparison of Reinforcement Learning (RL) algorithms is notoriously complex as outcomes and benchmarking of performances of different RL approaches are critically sensitive to environmental design, reward structures, and…

Machine Learning · Computer Science 2026-03-19 Sinan Ibrahim, Grégoire Ouerdane, Hadi Salloum, Henni Ouerdane, Stefan Streif, Pavel Osinenko

Combining Automated Optimisation of Hyperparameters and Reward Shape

There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies…

Machine Learning · Computer Science 2024-10-10 Julian Dierkes, Emma Cramer, Holger H. Hoos, Sebastian Trimpe