Related papers: Logistic Q-Learning

200 papers

Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type…

Machine Learning · Computer Science 2026-05-11 Gugan Thoppe, L. A. Prashanth, Ankur Naskar, Sanjay Bhat

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the…

Machine Learning · Computer Science 2017-11-10 Aurko Roy, Huan Xu, Sebastian Pokutta

This dissertation makes three main contributions. First, we identify a new connection between policy gradient and dynamic programming in MMDPs and propose the Coordinate Ascent Dynamic Programming (CADP) algorithm to compute a Markov policy…

Machine Learning · Computer Science 2025-10-21 Xihong Su

Reinforcement learning (RL) is a classical tool to solve network control or policy optimization problems in unknown environments. The original Q-learning suffers from performance and complexity challenges across very large networks. Herein,…

Machine Learning · Computer Science 2024-09-02 Talha Bozkus, Urbashi Mitra

It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging: does the projected Bellman equation have a solution? If so, is the solution useful in the sense of generating a good…

Optimization and Control · Mathematics 2020-08-11 Prashant G. Mehta, Sean P. Meyn

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variant of…

Machine Learning · Computer Science 2020-01-10 Yihao Feng, Lihong Li, Qiang Liu

We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy and a general Markov decision process (MDP) framework devised to support these opportunities. Derivations of…

Machine Learning · Computer Science 2021-04-02 Yingdong Lu, Mark S. Squillante, Chai Wah Wu

Q-learning is a popular Reinforcement Learning (RL) algorithm which is widely used in practice with function approximation (Mnih et al., 2015). In contrast, existing theoretical results are pessimistic about Q-learning. For example, (Baird,…

Machine Learning · Computer Science 2021-10-20 Naman Agarwal, Syomantak Chaudhuri, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli

Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and…

Machine Learning · Computer Science 2023-04-19 Andrew Patterson, Victor Liao, Martha White

Reinforcement learning (RL) has seen significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods that use parametrized Q-learning. In these…

Systems and Control · Electrical Eng. & Systems 2025-04-09 J. S. van Hulst, W. P. M. H. Heemels, D. J. Antunes

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

The field of quickest change detection (QCD) focuses on the design and analysis of online algorithms that estimate the time at which a significant event occurs. In this paper, design and analysis are cast in a Bayesian framework, where QCD…

Optimization and Control · Mathematics 2025-12-30 Austin Cooper, Sean Meyn

Many applications -- including power systems, robotics, and economics -- involve a dynamical system interacting with a stochastic and hard-to-model environment. We adopt a reinforcement learning approach to control such systems.…

Optimization and Control · Mathematics 2025-08-26 Abed AlRahman Al Makdah, Oliver Kosut, Lalitha Sankar, Shaofeng Zou

Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…

Artificial Intelligence · Computer Science 2025-02-11 Jiachen Xi, Alfredo Garcia, Petar Momcilovic

We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with learned action-value function. The result is a simple procedure consisting of…

We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman…

Machine Learning · Computer Science 2026-04-01 Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain

While reinforcement learning has been increasingly applied to stochastic control, few studies have systematically examined policy-based methods in queuing environments modeled as a semi-Markov decision process (SMDP). To address this gap,…

Optimization and Control · Mathematics 2026-04-28 Joseph Walton, Gabriel Nicolosi

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust RL, where the uncertainty set is defined to be centering at a…

Machine Learning · Computer Science 2021-10-29 Yue Wang, Shaofeng Zou

Recursion is the fundamental paradigm to finitely describe potentially infinite objects. As state-of-the-art reinforcement learning (RL) algorithms cannot directly reason about recursion, they must rely on the practitioner's ingenuity in…

Machine Learning · Computer Science 2022-06-24 Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the…

Machine Learning · Computer Science 2019-06-17 Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar