Related papers: Off-Policy Evaluation with Policy-Dependent Optimi…

200 papers

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are…

Machine Learning · Computer Science 2024-10-23 Matej Cief, Branislav Kveton, Michal Kompan

Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes…

Machine Learning · Computer Science 2023-01-26 Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas

The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online…

Machine Learning · Computer Science 2024-08-15 Shashank Gupta, Olivier Jeunen, Harrie Oosterhuis, Maarten de Rijke

Policy learning utilizing observational data is pivotal across various domains, with the objective of learning the optimal treatment assignment policy while adhering to specific constraints such as fairness, budget, and simplicity. This…

Methodology · Statistics 2023-10-12 Pan Zhao, Antoine Chambaz, Julie Josse, Shu Yang

This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving,…

Machine Learning · Computer Science 2018-11-08 Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a…

Artificial Intelligence · Computer Science 2020-12-23 Nymisha Bandi, Theja Tulabandhula

We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant…

Machine Learning · Computer Science 2020-08-25 Yi Su, Pavithra Srinath, Akshay Krishnamurthy

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy.…

Machine Learning · Computer Science 2022-11-04 Jie Wang, Rui Gao, Hongyuan Zha

Assessing the effects of a policy based on observational data from a different policy is a common problem across several high-stake decision-making domains, and several off-policy evaluation (OPE) techniques have been proposed. However,…

Machine Learning · Computer Science 2022-01-21 Sonali Parbhoo, Shalmali Joshi, Finale Doshi-Velez

We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation where most work considers the…

Machine Learning · Computer Science 2020-11-05 Nathan Kallus, Masatoshi Uehara

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value…

Econometrics · Economics 2019-07-23 Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

The off-policy paradigm casts recommendation as a counterfactual decision-making task, allowing practitioners to unbiasedly estimate online metrics using offline data. This leads to effective evaluation metrics, as well as learning…

Machine Learning · Computer Science 2024-09-17 Olivier Jeunen, Aleksei Ustimenko

This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history…

Machine Learning · Computer Science 2020-02-25 Yaqi Duan, Mengdi Wang

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions. First, we show that the preferability of optimization methods depends critically on whether stochastic versus exact gradients…

Machine Learning · Computer Science 2021-11-01 Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Estimation of individual treatment effects is commonly used as the basis for contextual decision making in fields such as healthcare, education, and economics. However, it is often sufficient for the decision maker to have estimates of…

Machine Learning · Computer Science 2020-08-13 Maggie Makar, Fredrik D. Johansson, John Guttag, David Sontag

Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing…

Machine Learning · Computer Science 2024-06-13 Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu

Recently there has been a surge of interest in operations research (OR) and the machine learning (ML) community in combining prediction algorithms and optimization techniques to solve decision-making problems in the face of uncertainty.…

Optimization and Control · Mathematics 2025-11-11 Utsav Sadana, Abhilash Chenreddy, Erick Delage, Alexandre Forel, Emma Frejinger, Thibaut Vidal

The beneficial effects of treatments vary across individuals in most studies. Treatment heterogeneity motivates practitioners to search for the optimal policy based on personal characteristics. A long-standing common practice in policy…

Statistics Theory · Mathematics 2025-01-06 Xuqiao Li, Ying Yan

We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. Thereby, we address a key limitation of standard policy learning: standard policy learning assumes unconfoundedness, meaning that no…

Machine Learning · Computer Science 2026-02-18 Konstantin Hess, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel

When decision-makers can directly intervene, policy evaluation algorithms give valid causal estimates. In off-policy evaluation (OPE), there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior…

Machine Learning · Computer Science 2022-04-05 David Bruns-Smith