Related papers: Practical Open-Loop Optimistic Planning

Optimal Limited Contingency Planning

For a given problem, the optimal Markov policy can be considered as a conditional or contingent plan containing a (potentially large) number of branches. Unfortunately, there are applications where it is desirable to strictly limit the…

Artificial Intelligence · Computer Science 2012-12-12 Nicolas Meuleau, David Smith

An optimistic planning algorithm for switched discrete-time LQR

We introduce TROOP, a tree-based Riccati optimistic online planner, that is designed to generate near-optimal control laws for discrete-time switched linear systems with switched quadratic costs. The key challenge that we address is…

Optimization and Control · Mathematics 2025-08-27 Mathieu Granzotto, Romain Postoyan, Dragan Nešić, Jamal Daafouz, Lucian Buşoniu

Model-Based Offline Planning with Trajectory Pruning

The recent offline reinforcement learning (RL) studies have achieved much progress to make RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction. Unfortunately, existing offline RL…

Artificial Intelligence · Computer Science 2022-04-22 Xianyuan Zhan, Xiangyu Zhu, Haoran Xu

Adaptive Online Planning for Continual Lifelong Learning

We study learning control in an online reset-free lifelong learning scenario, where mistakes can compound catastrophically into the future and the underlying dynamics of the environment may change. Traditional model-free policy learning…

Machine Learning · Computer Science 2020-06-30 Kevin Lu, Igor Mordatch, Pieter Abbeel

Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes

In offline reinforcement learning (RL), the absence of active exploration calls for attention on the model robustness to tackle the sim-to-real gap, where the discrepancy between the simulated and deployed environments can significantly…

Machine Learning · Computer Science 2024-06-28 He Wang, Laixi Shi, Yuejie Chi

Online Semi-infinite Linear Programming: Efficient Algorithms via Function Approximation

We consider the dynamic resource allocation problem where the decision space is finite-dimensional, yet the solution must satisfy a large or even infinite number of constraints revealed via streaming data or oracle feedback. We model this…

Machine Learning · Computer Science 2026-03-18 Yiming Zong, Jiashuo Jiang

Optimistic planning for the near-optimal control of nonlinear switched discrete-time systems with stability guarantees

Originating in the artificial intelligence literature, optimistic planning (OP) is an algorithm that generates near-optimal control inputs for generic nonlinear discrete-time systems whose input set is finite. This technique is therefore…

Optimization and Control · Mathematics 2019-08-06 Mathieu Granzotto, Romain Postoyan, Lucian Buşoniu, Dragan Nešić, Jamal Daafouz

Optimistic Model Rollouts for Pessimistic Offline Policy Optimization

Model-based offline reinforcement learning (RL) has made remarkable progress, offering a promising avenue for improving generalization with synthetic model rollouts. Existing works primarily focus on incorporating pessimism for policy…

Machine Learning · Computer Science 2024-01-12 Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Ding Bo, Huaimin Wang

Infrequent Resolving Algorithm for Online Linear Programming

Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auction, network revenue management, order fulfillment and advertising. Existing OLP…

Data Structures and Algorithms · Computer Science 2025-11-18 Guokai Li, Zizhuo Wang, Jingwei Zhang

CROP: Conservative Reward for Model-based Offline Policy Optimization

Offline reinforcement learning (RL) aims to optimize a policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges because of their capability to mitigate…

Machine Learning · Computer Science 2026-04-14 Hao Li, Xiao-Hu Zhou, Shu-Hai Li, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Zeng-Guang Hou

Wait-Less Offline Tuning and Re-solving for Online Decision Making

Online linear programming (OLP) has found broad applications in revenue management and resource allocation. State-of-the-art OLP algorithms achieve low regret by repeatedly solving linear programming (LP) subproblems that incorporate…

Machine Learning · Statistics 2025-11-04 Jingruo Sun, Wenzhi Gao, Ellen Vitercik, Yinyu Ye

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

The proximal policy optimization (PPO) algorithm stands as one of the most prosperous methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains deficient. Specifically, it is…

Machine Learning · Computer Science 2023-06-09 Han Zhong, Tong Zhang

Optimal Online Algorithms for One-Way Trading and Online Knapsack Problems: A Unified Competitive Analysis

We study two canonical online optimization problems under capacity/budget constraints: the fractional one-way trading problem (OTP) and the integral online knapsack problem (OKP) under an infinitesimal assumption. Under the competitive…

Data Structures and Algorithms · Computer Science 2020-09-23 Ying Cao, Bo Sun, Danny H. K. Tsang

Adaptive Robust Online Portfolio Selection

The online portfolio selection (OLPS) problem differs from classical portfolio model problems, as it involves making sequential investment decisions. Many OLPS strategies described in the literature capture market movement based on various…

Portfolio Management · Quantitative Finance 2022-06-03 Man Yiu Tsang, Tony Sit, Hoi Ying Wong

Online Policy Optimization for Robust MDP

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight…

Machine Learning · Computer Science 2022-09-29 Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

Online Learning for Obstacle Avoidance

We approach the fundamental problem of obstacle avoidance for robotic systems via the lens of online learning. In contrast to prior work that either assumes worst-case realizations of uncertainty in the environment or a stationary…

Robotics · Computer Science 2023-11-07 David Snyder, Meghan Booker, Nathaniel Simon, Wenhan Xia, Daniel Suo, Elad Hazan, Anirudha Majumdar

Online Planning Algorithms for POMDPs

Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their…

Artificial Intelligence · Computer Science 2014-01-16 Stéphane Ross, Joelle Pineau, Sébastien Paquet, Brahim Chaib-draa

Online Linear Programming with Batching

We study Online Linear Programming (OLP) with batching. The planning horizon is cut into $K$ batches, and the decisions on customers arriving within a batch can be delayed to the end of their associated batch. Compared with OLP without…

Machine Learning · Computer Science 2024-08-02 Haoran Xu, Peter W. Glynn, Yinyu Ye

Conservative Optimistic Policy Optimization via Multiple Importance Sampling

Reinforcement Learning (RL) has been able to solve hard problems such as playing Atari games or solving the game of Go, with a unified approach. Yet modern deep RL approaches are still not widely used in real-world applications. One reason…

Machine Learning · Computer Science 2021-03-08 Achraf Azize, Othman Gaizi

Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation

Off-policy evaluation (OPE) and off-policy learning (OPL) are foundational for decision-making in offline contextual bandits. Recent advances in OPL primarily optimize OPE estimators with improved statistical properties, assuming that…

Machine Learning · Statistics 2025-09-04 Imad Aouali, Otmane Sakhi