Related papers: Offline Policy Optimization with Eligible Actions

200 papers

We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making…

Machine Learning · Computer Science 2023-09-28 Germano Gabbianelli, Gergely Neu, Matteo Papini

Offline policy learning aims to use historical data to learn an optimal personalized decision rule. In the standard estimate-then-optimize framework, reweighting-based methods (e.g., inverse propensity weighting or doubly robust estimators)…

Optimization and Control · Mathematics 2026-01-21 Jingren Liu, Hanzhang Qin, Junyi Liu, Mabel C. Chou, Jong-Shi Pang

Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to obtain unbiased estimates under another distribution. However,…

Machine Learning · Computer Science 2023-06-28 Kristopher De Asis, Eric Graves, Richard S. Sutton

In many domains, the exploration process of reinforcement learning will be too costly as it requires trying out suboptimal policies, resulting in a need for off-policy evaluation, in which a target policy is evaluated based on data…

Machine Learning · Computer Science 2024-05-07 David M. Bossens, Philip S. Thomas

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning…

We consider the problem of off-policy evaluation in Markov decision processes. Off-policy evaluation is the task of evaluating the expected return of one policy with data generated by a different, behavior policy. Importance sampling is a…

Machine Learning · Computer Science 2019-05-13 Josiah P. Hanna, Scott Niekum, Peter Stone

Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a…

Machine Learning · Computer Science 2024-11-04 Allen Nie, Yash Chandak, Christina J. Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, Emma Brunskill

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science 2025-03-18 Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning. Using conventional methods for offline RL in the presence of confounding can not only lead to…

Machine Learning · Statistics 2023-11-08 Chinmaya Kausik, Yangyi Lu, Kevin Tan, Maggie Makar, Yixin Wang, Ambuj Tewari

In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset. Typically, policies are thus regularized during training to behave similarly to the data generating policy, by adding a penalty based on a…

Machine Learning · Computer Science 2021-07-13 Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler

Offline optimization is an emerging problem in many experimental engineering domains including protein, drug or aircraft design, where online experimentation to collect evaluation data is too expensive or dangerous. To avoid that, one has…

Machine Learning · Computer Science 2024-05-10 Yassine Chemingui, Aryan Deshwal, Trong Nghia Hoang, Janardhan Rao Doppa

Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse.…

Machine Learning · Computer Science 2018-11-01 Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli

Off-policy evaluation methods are important in recommendation systems and search engines, where data collected under an existing logging policy is used to estimate the performance of a new proposed policy. A common approach to this problem…

Machine Learning · Computer Science 2023-01-04 Jaron J. R. Lee, David Arbour, Georgios Theocharous

A central challenge to applying many off-policy reinforcement learning algorithms to real world problems is the variance introduced by importance sampling. In off-policy learning, the agent learns about a different policy than the one being…

Machine Learning · Computer Science 2022-06-20 Eric Graves, Sina Ghiassian

Large scale reinforcement learning has become a central tool for improving reasoning in large language models. At this scale, generation is often lagged or asynchronous, so updates are performed on data collected by older policies. This…

Machine Learning · Computer Science 2026-05-28 Otmane Sakhi, Aleksei Arzhantsev, Imad Aouali, Flavian Vasile

In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative evaluation using observational data can help practitioners understand the generalization performance of new policies. However, this type of…

Machine Learning · Computer Science 2023-10-27 Shengpu Tang, Jenna Wiens

The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones. In the meta reinforcement learning literature much recent work has…

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation…

Machine Learning · Computer Science 2020-10-23 Ruosong Wang, Dean P. Foster, Sham M. Kakade

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science 2026-01-06 Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

Before A/B testing online a new version of a recommender system, it is usual to perform some offline evaluations on historical data. We focus on evaluation methods that compute an estimator of the potential uplift in revenue that could…

Machine Learning · Statistics 2018-01-23 Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, Simon Dollé