Related papers: Off-Policy Evaluation with Policy-Dependent Optimi…

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are…

Machine Learning · Computer Science · 2024-10-23 · Matej Cief, Branislav Kveton, Michal Kompan

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes…

Machine Learning · Computer Science · 2023-01-26 · Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas

Optimal Baseline Corrections for Off-Policy Contextual Bandits

The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online…

Machine Learning · Computer Science · 2024-08-15 · Shashank Gupta, Olivier Jeunen, Harrie Oosterhuis, Maarten de Rijke

Positivity-free Policy Learning with Observational Data

Policy learning utilizing observational data is pivotal across various domains, with the objective of learning the optimal treatment assignment policy while adhering to specific constraints such as fairness, budget, and simplicity. This…

Methodology · Statistics · 2023-10-12 · Pan Zhao, Antoine Chambaz, Julie Josse, Shu Yang

Online Off-policy Prediction

This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving,…

Machine Learning · Computer Science · 2018-11-08 · Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a…

Artificial Intelligence · Computer Science · 2020-12-23 · Nymisha Bandi, Theja Tulabandhula

Adaptive Estimator Selection for Off-Policy Evaluation

We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant…

Machine Learning · Computer Science · 2020-08-25 · Yi Su, Pavithra Srinath, Akshay Krishnamurthy

Reliable Off-policy Evaluation for Reinforcement Learning

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy.…

Machine Learning · Computer Science · 2022-11-04 · Jie Wang, Rui Gao, Hongyuan Zha

Generalizing Off-Policy Evaluation From a Causal Perspective For Sequential Decision-Making

Assessing the effects of a policy based on observational data from a different policy is a common problem across several high-stake decision-making domains, and several off-policy evaluation (OPE) techniques have been proposed. However,…

Machine Learning · Computer Science · 2022-01-21 · Sonali Parbhoo, Shalmali Joshi, Finale Doshi-Velez

Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning

We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation, where most work considers the…

Machine Learning · Computer Science · 2020-11-05 · Nathan Kallus, Masatoshi Uehara

Semi-Parametric Efficient Policy Learning with Continuous Actions

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value…

Econometrics · Economics · 2019-07-23 · Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

Δ-OPE: Off-Policy Estimation with Pairs of Policies

The off-policy paradigm casts recommendation as a counterfactual decision-making task, allowing practitioners to unbiasedly estimate online metrics using offline data. This leads to effective evaluation metrics, as well as learning…

Machine Learning · Computer Science · 2024-09-17 · Olivier Jeunen, Aleksei Ustimenko

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history…

Machine Learning · Computer Science · 2020-02-25 · Yaqi Duan, Mengdi Wang

Understanding the Effect of Stochasticity in Policy Optimization

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions. First, we show that the preferability of optimization methods depends critically on whether stochastic versus exact gradients…

Machine Learning · Computer Science · 2021-11-01 · Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Estimation of Bounds on Potential Outcomes For Decision Making

Estimation of individual treatment effects is commonly used as the basis for contextual decision making in fields such as healthcare, education, and economics. However, it is often sufficient for the decision maker to have estimates of…

Machine Learning · Computer Science · 2020-08-13 · Maggie Makar, Fredrik D. Johansson, John Guttag, David Sontag

Predictive Performance Comparison of Decision Policies Under Confounding

Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing…

Machine Learning · Computer Science · 2024-06-13 · Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu

A Survey of Contextual Optimization Methods for Decision Making under Uncertainty

Recently there has been a surge of interest in the operations research (OR) and machine learning (ML) communities in combining prediction algorithms and optimization techniques to solve decision-making problems in the face of uncertainty.…

Optimization and Control · Mathematics · 2025-11-11 · Utsav Sadana, Abhilash Chenreddy, Erick Delage, Alexandre Forel, Emma Frejinger, Thibaut Vidal

Matching-Based Policy Learning

The beneficial effects of treatments vary across individuals in most studies. Treatment heterogeneity motivates practitioners to search for the optimal policy based on personal characteristics. A long-standing common practice in policy…

Statistics Theory · Mathematics · 2025-01-06 · Xuqiao Li, Ying Yan

Efficient and Sharp Off-Policy Learning under Unobserved Confounding

We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. Thereby, we address a key limitation of standard policy learning: standard policy learning assumes unconfoundedness, meaning that no…

Machine Learning · Computer Science · 2026-02-18 · Konstantin Hess, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel

Model-Free and Model-Based Policy Evaluation when Causality is Uncertain

When decision-makers can directly intervene, policy evaluation algorithms give valid causal estimates. In off-policy evaluation (OPE), there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior…

Machine Learning · Computer Science · 2022-04-05 · David Bruns-Smith