Related papers: Unbounded Dynamic Programming via the Q-Transform

Dynamic Optimal Choice When Rewards are Unbounded Below

We propose a new approach to solving dynamic decision problems with rewards that are unbounded below. The approach involves transforming the Bellman equation in order to convert an unbounded problem into a bounded one. The major advantage…

Theoretical Economics · Economics 2019-12-02 Qingyin Ma, John Stachurski

Dynamic Programming Deconstructed: Transformations of the Bellman Equation and Computational Efficiency

Some approaches to solving challenging dynamic programming problems, such as Q-learning, begin by transforming the Bellman equation into an alternative functional equation, in order to open up a new line of attack. Our paper studies this…

Optimization and Control · Mathematics 2019-12-05 Qingyin Ma, John Stachurski

Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL

Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results. The Decision Transformer (DT) combines the conditional policy approach and a transformer architecture, showing…

Machine Learning · Computer Science 2023-05-26 Taku Yamagata, Ahmed Khalil, Raul Santos-Rodriguez

New Approach to Bounded Quantum-Mechanical Models

We develop an approach for the treatment of one-dimensional bounded quantum-mechanical models by straightforward modification of a successful method for unbounded ones. We apply the new approach to a simple example and show that it…

Mathematical Physics · Physics 2009-11-13 Francisco M. Fernández

Dynamic Q-planning for Online UAV Path Planning in Unknown and Complex Environments

Unmanned Aerial Vehicles need an online path planning capability to move in high-risk missions in unknown and complex environments to complete them safely. However, many algorithms reported in the literature may not return reliable…

Robotics · Computer Science 2024-02-12 Lidia Gianne Souza da Rocha, Kenny Anderson Queiroz Caldas, Marco Henrique Terra, Fabio Ramos, Kelen Cristiane Teixeira Vivaldini

Differentiable Quantum Programming with Unbounded Loops

The emergence of variational quantum applications has led to the development of automatic differentiation techniques in quantum computing. Recently, Zhu et al. (PLDI 2020) have formulated differentiable quantum programming with bounded…

Quantum Physics · Physics 2022-11-10 Wang Fang, Mingsheng Ying, Xiaodi Wu

Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

Sample complexity bounds are a common performance metric in the Reinforcement Learning literature. In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\gamma)$, where $\gamma…

Machine Learning · Computer Science 2020-07-09 Adithya M. Devraj, Sean P. Meyn

Convergence of Dynamic Programming on the Semidefinite Cone

The goal of this paper is to investigate new and simple convergence analysis of dynamic programming for linear quadratic regulator problem of discrete-time linear time-invariant systems. In particular, bounds on errors are given in terms of…

Optimization and Control · Mathematics 2021-06-18 Donghwan Lee

Self-Imitation Learning via Generalized Lower Bound Q-learning

Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning. In this work, we propose an n-step lower bound which generalizes the original return-based lower-bound Q-learning, and…

Machine Learning · Computer Science 2021-02-16 Yunhao Tang

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

We present an approach called Q-probing to adapt a pre-trained language model to maximize a task-specific reward function. At a high level, Q-probing sits between heavier approaches such as finetuning and lighter approaches such as few shot…

Machine Learning · Computer Science 2024-06-04 Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

Evolutionary learning of interpretable decision trees

Reinforcement learning techniques achieved human-level performance in several tasks in the last decade. However, in recent years, the need for interpretability emerged: we want to be able to understand how a system works and the reasons…

Machine Learning · Computer Science 2023-01-13 Leonardo Lucio Custode, Giovanni Iacca

Penalized Q-Learning for Dynamic Treatment Regimes

A dynamic treatment regime effectively incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these become more and more popular in conjunction with longitudinal data from…

Methodology · Statistics 2011-08-29 Rui Song, Weiwei Wang, Donglin Zeng, Michael R. Kosorok

Transfer Reinforcement Learning under Unobserved Contextual Information

In this paper, we study a transfer reinforcement learning problem where the state transitions and rewards are affected by the environmental context. Specifically, we consider a demonstrator agent that has access to a context-aware policy…

Machine Learning · Computer Science 2020-03-11 Yan Zhang, Michael M. Zavlanos

Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach

Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov decision problems where an agent aims to maximize the entropy regularized value function. Despite its empirical success, there have been limited…

Machine Learning · Computer Science 2024-09-06 Narim Jeong, Donghwan Lee

Boosting Soft Q-Learning by Bounding

An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we…

Machine Learning · Computer Science 2024-06-27 Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni

Safe Q-learning for continuous-time linear systems

Q-learning is a promising method for solving optimal control problems for uncertain systems without the explicit need for system identification. However, approaches for continuous-time Q-learning have limited provable safety guarantees,…

Systems and Control · Electrical Eng. & Systems 2024-01-30 Soutrik Bandyopadhyay, Shubhendu Bhasin

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to…

Robotics · Computer Science 2023-10-18 Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singht, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine

Deep Constrained Q-learning

In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a list of constraints. Classical methods based on reward shaping, i.e. a weighted combination of…

Machine Learning · Computer Science 2020-09-15 Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

Provably Efficient Reward Transfer in Reinforcement Learning with Discrete Markov Decision Processes

In this paper, we propose a new solution to reward adaptation (RA) in reinforcement learning, where the agent adapts to a target reward function based on one or more existing source behaviors learned a priori under the same domain dynamics…

Machine Learning · Computer Science 2025-10-23 Kevin Vora, Yu Zhang

Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

We consider the problem of learning useful robotic skills from previously collected offline data without access to manually specified rewards or additional online exploration, a setting that is becoming increasingly important for scaling…

Robotics · Computer Science 2021-06-14 Yevgen Chebotar, Karol Hausman, Yao Lu, Ted Xiao, Dmitry Kalashnikov, Jake Varley, Alex Irpan, Benjamin Eysenbach, Ryan Julian, Chelsea Finn, Sergey Levine