Related papers: Logistic Q-Learning

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type…

Machine Learning · Computer Science 2026-05-11 Gugan Thoppe, L. A. Prashanth, Ankur Naskar, Sanjay Bhat

Reinforcement Learning under Model Mismatch

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the…

Machine Learning · Computer Science 2017-11-10 Aurko Roy, Huan Xu, Sebastian Pokutta

Efficient Algorithms for Mitigating Uncertainty and Risk in Reinforcement Learning

This dissertation makes three main contributions. First, we identify a new connection between policy gradient and dynamic programming in MMDPs and propose the Coordinate Ascent Dynamic Programming (CADP) algorithm to compute a Markov policy…

Machine Learning · Computer Science 2025-10-21 Xihong Su

Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization

Reinforcement learning (RL) is a classical tool to solve network control or policy optimization problems in unknown environments. The original Q-learning suffers from performance and complexity challenges across very large networks. Herein,…

Machine Learning · Computer Science 2024-09-02 Talha Bozkus, Urbashi Mitra

Convex Q-Learning, Part 1: Deterministic Optimal Control

It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging: does the projected Bellman equation have a solution? If so, is the solution useful in the sense of generating a good…

Optimization and Control · Mathematics 2020-08-11 Prashant G. Mehta, Sean P. Meyn

A Kernel Loss for Solving the Bellman Equation

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variant of…

Machine Learning · Computer Science 2020-01-10 Yihao Feng, Lihong Li, Qiang Liu

A General Markov Decision Process Framework for Directly Learning Optimal Control Policies

We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy and a general Markov decision process (MDP) framework devised to support these opportunities. Derivations of…

Machine Learning · Computer Science 2021-04-02 Yingdong Lu, Mark S. Squillante, Chai Wah Wu

Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs

Q-learning is a popular Reinforcement Learning (RL) algorithm which is widely used in practice with function approximation (Mnih et al., 2015). In contrast, existing theoretical results are pessimistic about Q-learning. For example, (Baird,…

Machine Learning · Computer Science 2021-10-20 Naman Agarwal, Syomantak Chaudhuri, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli

Robust Losses for Learning Value Functions

Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and…

Machine Learning · Computer Science 2023-04-19 Andrew Patterson, Victor Liao, Martha White

Data-Efficient Quadratic Q-Learning Using LMIs

Reinforcement learning (RL) has seen significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods that use parametrized Q-learning. In these…

Systems and Control · Electrical Eng. & Systems 2025-04-09 J. S. van Hulst, W. P. M. H. Heemels, D. J. Antunes

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

Reinforcement Learning for Optimal Stopping in POMDPs with Application to Quickest Change Detection

The field of quickest change detection (QCD) focuses on the design and analysis of online algorithms that estimate the time at which a significant event occurs. In this paper, design and analysis are cast in a Bayesian framework, where QCD…

Optimization and Control · Mathematics 2025-12-30 Austin Cooper, Sean Meyn

Linear Dynamics meets Linear MDPs: Closed-Form Optimal Policies via Reinforcement Learning

Many applications -- including power systems, robotics, and economics -- involve a dynamical system interacting with a stochastic and hard-to-model environment. We adopt a reinforcement learning approach to control such systems.…

Optimization and Control · Mathematics 2025-08-26 Abed AlRahman Al Makdah, Oliver Kosut, Lalitha Sankar, Shaofeng Zou

Regularized Q-Learning with Linear Function Approximation

Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…

Artificial Intelligence · Computer Science 2025-02-11 Jiachen Xi, Alfredo Garcia, Petar Momcilovic

Relative Entropy Regularized Policy Iteration

We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with learned action-value function. The result is a simple procedure consisting of…

Machine Learning · Computer Science 2018-12-07 Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa, Dan Belov, Nicolas Heess, Martin Riedmiller

A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms

We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman…

Machine Learning · Computer Science 2026-04-01 Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain

Empirical Evaluation of Policy-Based Reinforcement Learning for Dynamic Service Control in an M/M/1 Queue

While reinforcement learning has been increasingly applied to stochastic control, few studies have systematically examined policy-based methods in queuing environments modeled as a semi-Markov decision process (SMDP). To address this gap,…

Optimization and Control · Mathematics 2026-04-28 Joseph Walton, Gabriel Nicolosi

Online Robust Reinforcement Learning with Model Uncertainty

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust RL, where the uncertainty set is defined to be centering at a…

Machine Learning · Computer Science 2021-10-29 Yue Wang, Shaofeng Zou

Recursive Reinforcement Learning

Recursion is the fundamental paradigm to finitely describe potentially infinite objects. As state-of-the-art reinforcement learning (RL) algorithms cannot directly reason about recursion, they must rely on the practitioner's ingenuity in…

Machine Learning · Computer Science 2022-06-24 Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

Successive Over Relaxation Q-Learning

In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the…

Machine Learning · Computer Science 2019-06-17 Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar