Related papers: An Online Prediction Algorithm for Reinforcement L…

A Cross Entropy based Stochastic Approximation Algorithm for Reinforcement Learning with Linear Function Approximation

In this paper, we provide a new algorithm for the problem of prediction in Reinforcement Learning, i.e., estimating the Value Function of a Markov Reward Process (MRP) using the linear function approximation architecture, with memory…

Systems and Control · Computer Science 2016-09-30 Ajin George Joseph, Shalabh Bhatnagar

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang, Yiming Zong, Yinyu Ye

Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations

We present an iterative inverse reinforcement learning algorithm to infer optimal cost functions in continuous spaces. Based on a popular maximum entropy criterion, our approach iteratively finds a weight improvement step and proposes a…

Machine Learning · Computer Science 2025-05-14 Sarmad Mehrdad, Avadesh Meduri, Ludovic Righetti

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?

In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative…

Machine Learning · Computer Science 2024-11-19 Denis Tarasov, Kirill Brilliantov, Dmitrii Kharlapenko

Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time

In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Despite the necessary attention of the scientific community, considering stochastic stopping time, the…

Machine Learning · Computer Science 2024-03-26 Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu

Renewal Monte Carlo: Renewal theory based reinforcement learning

In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm and retains the advantages…

Machine Learning · Computer Science 2018-04-05 Jayakumar Subramanian, Aditya Mahajan

Conformal Off-Policy Evaluation in Markov Decision Processes

Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when…

Machine Learning · Computer Science 2024-07-02 Daniele Foffano, Alessio Russo, Alexandre Proutiere

Provably Safe Reinforcement Learning for Stochastic Reach-Avoid Problems with Entropy Regularization

We consider the problem of learning the optimal policy for Markov decision processes with safety constraints. We formulate the problem in a reach-avoid setup. Our goal is to design online reinforcement learning algorithms that ensure safety…

Machine Learning · Computer Science 2026-01-21 Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of…

Machine Learning · Computer Science 2025-07-25 Fan Chen, Zeyu Jia, Alexander Rakhlin, Tengyang Xie

Adapting the Function Approximation Architecture in Online Reinforcement Learning

The performance of a reinforcement learning (RL) system depends on the computational architecture used to approximate a value function. Deep learning methods provide both optimization techniques and architectures for approximating nonlinear…

Machine Learning · Computer Science 2021-06-21 John D. Martin, Joseph Modayil

Sample-efficient Cross-Entropy Method for Real-time Planning

Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency…

Machine Learning · Computer Science 2020-08-17 Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, Georg Martius

A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees

The cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure. The Monte-Carlo version of the CE method employs the naive sample averaging technique which is…

Artificial Intelligence · Computer Science 2018-02-01 Ajin George Joseph, Shalabh Bhatnagar

A Structure-aware Online Learning Algorithm for Markov Decision Processes

To overcome the curse of dimensionality and curse of modeling in Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems, Reinforcement Learning (RL) algorithms are popular. In this paper, we consider…

Machine Learning · Computer Science 2018-11-29 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

Online Reinforcement Learning in Markov Decision Process Using Linear Programming

We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon, S. Rasoul Etesami

Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks

A recently popular approach to solving reinforcement learning is with data from human preferences. In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve…

Machine Learning · Computer Science 2024-02-28 Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang

Weighted Maximum Entropy Inverse Reinforcement Learning

We study inverse reinforcement learning (IRL) and imitation learning (IM), the problems of recovering a reward or policy function from expert's demonstrated trajectories. We propose a new way to improve the learning process by adding a…

Machine Learning · Computer Science 2022-08-23 The Viet Bui, Tien Mai, Patrick Jaillet

Reinforcement Learning with Quasi-Hyperbolic Discounting

Reinforcement learning has traditionally been studied with exponential discounting or the average reward setup, mainly due to their mathematical tractability. However, such frameworks fall short of accurately capturing human behavior, which…

Machine Learning · Computer Science 2024-09-18 S. R. Eshwar, Mayank Motwani, Nibedita Roy, Gugan Thoppe

Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design

We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative…

Machine Learning · Computer Science 2025-12-05 Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Particularly, our work…

Machine Learning · Computer Science 2024-07-11 Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang