Related papers: EVAL: EigenVector-based Average-reward Learning

Average-Reward Soft Actor-Critic

The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years for its ability to solve temporally-extended problems without relying on discounting. Meanwhile, in the discounted setting,…

Machine Learning · Computer Science 2025-08-06 Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

We develop theory and algorithms for average-reward on-policy Reinforcement Learning (RL). We first consider bounding the difference of the long-term average reward for two policies. We show that previous work based on the discounted return…

Machine Learning · Computer Science 2021-06-15 Yiming Zhang, Keith W. Ross

Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards

As the operations of autonomous systems generally affect simultaneously several users, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the…

Artificial Intelligence · Computer Science 2020-08-19 Umer Siddique, Paul Weng, Matthieu Zimmer

Examining average and discounted reward optimality criteria in reinforcement learning

In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards. While the latter is more popular, it…

Machine Learning · Computer Science 2022-09-05 Vektor Dewanto, Marcus Gallagher

Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology,…

Machine Learning · Computer Science 2024-08-26 Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai

Regularized Policies are Reward Robust

Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a local optimal policy. The primary motivation for…

Machine Learning · Computer Science 2021-01-19 Hisham Husain, Kamil Ciosek, Ryota Tomioka

Two Kinds of Learning Algorithms for Continuous-Time VWAP Targeting Execution

The optimal execution problem has always been a continuously focused research issue, and many reinforcement learning (RL) algorithms have been studied. In this article, we consider the execution problem of targeting the volume weighted…

Optimization and Control · Mathematics 2024-11-12 Xingyu Zhou, Wenbin Chen, Mingyu Xu

Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications

Although in recent years reinforcement learning has become very popular the number of successful applications to different kinds of operations research problems is rather scarce. Reinforcement learning is based on the well-studied dynamic…

Machine Learning · Computer Science 2020-04-03 Manuel Schneckenreither

Average-Reward Reinforcement Learning with Trust Region Methods

Most of reinforcement learning algorithms optimize the discounted criterion which is beneficial to accelerate the convergence and reduce the variance of estimates. Although the discounted criterion is appropriate for certain tasks such as…

Machine Learning · Computer Science 2021-11-02 Xiaoteng Ma, Xiaohang Tang, Li Xia, Jun Yang, Qianchuan Zhao

An Online Multiobjective Policy Gradient for Long-run Average-reward Markov Decision Process

We propose a reinforcement learning (RL) framework for multi-objective decision-making, where the agent seeks to optimize a vector of rewards rather than a single scalar value. The objective is to ensure that the time-averaged reward vector…

Systems and Control · Electrical Eng. & Systems 2025-11-18 Rahul Misra, Manuela L. Bujorianu, Rafał Wisniewski

ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models

Reinforcement Learning (RL) heavily relies on the careful design of the reward function. However, accurately assigning rewards to each state-action pair in Long-Term Reinforcement Learning (LTRL) tasks remains a significant challenge. As a…

Machine Learning · Computer Science 2025-06-03 Qi Ju, Falin Hei, Zhemei Fang, Yunfeng Luo

On Reward-Balancing Methods for Reinforcement Learning

This paper investigates the so-called reward-balancing methods, a novel class of algorithms for solving discounted-return reinforcement learning (RL) problems. These methods consist of iteratively adjusting the reward function to transform…

Optimization and Control · Mathematics 2026-04-23 Simone Baroncini, Bahman Gharesifard, Giuseppe Notarstefano

Adversarial Imitation via Variational Inverse Reinforcement Learning

We consider a problem of learning the reward and policy from expert examples under unknown dynamics. Our proposed method builds on the framework of generative adversarial networks and introduces the empowerment-regularized maximum-entropy…

Machine Learning · Computer Science 2019-02-26 Ahmed H. Qureshi, Byron Boots, Michael C. Yip

A Differential Perspective on Distributional Reinforcement Learning

To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a discounted sum of rewards over time. In this work, we extend distributional RL…

Machine Learning · Computer Science 2026-01-14 Juan Sebastian Rojas, Chi-Guhn Lee

Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

In tabular multi-agent reinforcement learning with average-cost criterion, a team of agents sequentially interacts with the environment and observes local incentives. We focus on the case that the global reward is a sum of local rewards,…

Optimization and Control · Mathematics 2021-10-26 Alec Koppel, Amrit Singh Bedi, Bhargav Ganguly, Vaneet Aggarwal

Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives

Recent advances in reinforcement learning (RL) have renewed interest in reward design for shaping agent behavior, but manually crafting reward functions is tedious and error-prone. A principled alternative is to specify behavioral…

Artificial Intelligence · Computer Science 2026-03-23 Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez

A unified view of entropy-regularized Markov decision processes

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to…

Machine Learning · Computer Science 2017-05-23 Gergely Neu, Anders Jonsson, Vicenç Gómez

Evolutionary Reinforcement Learning: A Survey

Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements…

Neural and Evolutionary Computing · Computer Science 2023-08-31 Hui Bai, Ran Cheng, Yaochu Jin

Statistical analysis of Inverse Entropy-regularized Reinforcement Learning

Inverse reinforcement learning aims to infer the reward function that explains expert behavior observed through trajectories of state--action pairs. A long-standing difficulty in classical IRL is the non-uniqueness of the recovered reward:…

Machine Learning · Statistics 2025-12-09 Denis Belomestny, Alexey Naumov, Sergey Samsonov

Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks

This report presents a solution for the swing-up and stabilisation tasks of the acrobot and the pendubot, developed for the AI Olympics competition at IROS 2024. Our approach employs the Average-Reward Entropy Advantage Policy Optimization…

Robotics · Computer Science 2024-09-16 Jean Seong Bjorn Choe, Bumkyu Choi, Jong-kook Kim