Related papers: Efficient Reinforcement Learning in Deterministic …

Optimism in Reinforcement Learning with Generalized Linear Function Approximation

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is…

Machine Learning · Statistics 2019-12-10 Yining Wang, Ruosong Wang, Simon S. Du, Akshay Krishnamurthy

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Equivalence of Optimality Criteria for Markov Decision Process and Model Predictive Control

This paper shows that the optimal policy and value functions of a Markov Decision Process (MDP), either discounted or not, can be captured by a finite-horizon undiscounted Optimal Control Problem (OCP), even if based on an inexact model.…

Systems and Control · Electrical Eng. & Systems 2023-02-08 Arash Bahari Kordabad, Mario Zanon, Sebastien Gros

Optimistic Proximal Policy Optimization

Reinforcement Learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, it is known that learning a good policy is difficult in a domain where…

Machine Learning · Computer Science 2019-06-27 Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of…

Machine Learning · Computer Science 2025-07-25 Fan Chen, Zeyu Jia, Alexander Rakhlin, Tengyang Xie

Reinforcement Learning Based Optimal Camera Placement for Depth Observation of Indoor Scenes

Exploring the most task-friendly camera setting -- optimal camera placement (OCP) problem -- in tasks that use multiple cameras is of great importance. However, few existing OCP solutions specialize in depth observation of indoor scenes,…

Computer Vision and Pattern Recognition · Computer Science 2021-10-22 Yichuan Chen, Manabu Tsukada, Hiroshi Esaki

Provably Efficient Exploration in Policy Optimization

While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably…

Machine Learning · Computer Science 2024-04-02 Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang

Deep Exploration via Randomized Value Functions

We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to…

Machine Learning · Statistics 2019-09-25 Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

Online Reinforcement Learning in Markov Decision Process Using Linear Programming

We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon, S. Rasoul Etesami

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang, Yiming Zong, Yinyu Ye

Offline RL via Feature-Occupancy Gradient Ascent

We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs) when the reward and transition models are linearly realizable under a known feature map. Starting from the classic linear-program…

Machine Learning · Computer Science 2024-05-24 Gergely Neu, Nneka Okolo

Optimistic Reinforcement Learning with Quantile Objectives

Reinforcement Learning (RL) has achieved tremendous success in recent years. However, the classical foundations of RL do not account for the risk sensitivity of the objective function, which is critical in various fields, including…

Machine Learning · Computer Science 2025-11-14 Mohammad Alipour-Vaezi, Huaiyang Zhong, Kwok-Leung Tsui, Sajad Khodadadian

Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models

We develop a probabilistic framework for analysing model-based reinforcement learning in the episodic setting. We then apply it to study finite-time horizon stochastic control problems with linear dynamics but unknown coefficients and…

Machine Learning · Computer Science 2021-12-22 Lukasz Szpruch, Tanut Treetanthiploet, Yufei Zhang

Online Reinforcement Learning with Uncertain Episode Lengths

Existing episodic reinforcement algorithms assume that the length of an episode is fixed across time and known a priori. In this paper, we consider a general framework of episodic reinforcement learning when the length of each episode is…

Machine Learning · Computer Science 2023-02-08 Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar

Offline Reinforcement Learning with Additional Covering Distributions

We study learning optimal policies from a logged dataset, i.e., offline RL, with function approximation. Despite the efforts devoted, existing algorithms with theoretic finite-sample guarantees typically assume exploratory data coverage or…

Machine Learning · Computer Science 2023-05-25 Chenjie Mao

Orchestrated Value Mapping for Reinforcement Learning

We present a general convergent class of reinforcement learning algorithms that is founded on two distinct principles: (1) mapping value estimates to a different space using arbitrary functions from a broad class, and (2) linearly…

Machine Learning · Computer Science 2022-03-18 Mehdi Fatemi, Arash Tavakoli

The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning

In this theoretical paper we are concerned with the problem of learning a value function by a smooth general function approximator, to solve a deterministic episodic control problem in a large continuous state space. It is shown that…

Machine Learning · Computer Science 2011-01-04 Michael Fairbank, Eduardo Alonso

Provably Safe Reinforcement Learning for Stochastic Reach-Avoid Problems with Entropy Regularization

We consider the problem of learning the optimal policy for Markov decision processes with safety constraints. We formulate the problem in a reach-avoid setup. Our goal is to design online reinforcement learning algorithms that ensure safety…

Machine Learning · Computer Science 2026-01-21 Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu

Learning to Control in Metric Space with Optimal Regret

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function is unknown, but the state and action space is…

Machine Learning · Computer Science 2019-05-07 Lin F. Yang, Chengzhuo Ni, Mengdi Wang

Model-Free Adaptive Optimal Control of Episodic Fixed-Horizon Manufacturing Processes using Reinforcement Learning

A self-learning optimal control algorithm for episodic fixed-horizon manufacturing processes with time-discrete control actions is proposed and evaluated on a simulated deep drawing process. The control model is built during consecutive…

Systems and Control · Computer Science 2020-01-07 Johannes Dornheim, Norbert Link, Peter Gumbsch