Related papers: Offline Multi-Action Policy Learning: Generalizati…

Policy Learning with Observational Data

In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application-specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example,…

Statistics Theory · Mathematics 2020-09-08 Susan Athey, Stefan Wager

Semi-Parametric Efficient Policy Learning with Continuous Actions

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value…

Econometrics · Economics 2019-07-23 Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a…

Artificial Intelligence · Computer Science 2020-12-23 Nymisha Bandi, Theja Tulabandhula

Robust Offline Policy Learning with Observational Data from Multiple Sources

We consider the problem of using observational bandit feedback data from multiple heterogeneous data sources to learn a personalized decision policy that robustly generalizes across diverse target settings. To achieve this, we propose a…

Machine Learning · Computer Science 2024-10-14 Aldo Gael Carranza, Susan Athey

Optimal Policy Learning with Observational Data in Multi-Action Scenarios: Estimation, Risk Preference, and Potential Failures

This paper deals with optimal policy learning (OPL) with observational data, i.e. data-driven optimal decision-making, in multi-action (or multi-arm) settings, where a finite set of decision options is available. It is organized in three…

Machine Learning · Statistics 2024-04-01 Giovanni Cerulli

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are…

Machine Learning · Computer Science 2024-10-23 Matej Cief, Branislav Kveton, Michal Kompan

Policy Learning under Biased Sample Selection

Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed…

Econometrics · Economics 2023-04-25 Lihua Lei, Roshni Sahoo, Stefan Wager

Online Learning with Off-Policy Feedback

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…

Machine Learning · Computer Science 2022-07-20 Germano Gabbianelli, Matteo Papini, Gergely Neu

Off-policy Learning for Multiple Loggers

It is well known that the historical logs are used for evaluating and learning policies in interactive systems, e.g. recommendation, search, and online advertising. Since direct online policy learning usually harms user experiences, it is…

Machine Learning · Statistics 2019-08-06 Li He, Long Xia, Wei Zeng, Zhi-Ming Ma, Yihong Zhao, Dawei Yin

On Minimax Optimal Offline Policy Evaluation

This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a minimax…

Artificial Intelligence · Computer Science 2014-09-15 Lihong Li, Remi Munos, Csaba Szepesvari

More Efficient Policy Learning via Optimal Retargeting

Policy learning can be used to extract individualized treatment regimes from observational data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a commonplace lack of overlap in the data for different…

Machine Learning · Statistics 2020-12-04 Nathan Kallus

Online Off-policy Prediction

This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving,…

Machine Learning · Computer Science 2018-11-08 Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

Regularizing Fairness in Optimal Policy Learning with Distributional Targets

A decision maker typically (i) incorporates training data to learn about the relative effectiveness of treatments, and (ii) chooses an implementation mechanism that implies an "optimal" predicted outcome distribution according to some…

Econometrics · Economics 2025-05-29 Anders Bredahl Kock, David Preinerstorfer

Offline Policy Optimization with Eligible Actions

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in…

Machine Learning · Computer Science 2022-07-05 Yao Liu, Yannis Flet-Berliac, Emma Brunskill

Optimal Policy Trees

We propose an approach for learning optimal tree-based prescription policies directly from data, combining methods for counterfactual estimation from the causal inference literature with recent advances in training globally-optimal decision…

Machine Learning · Computer Science 2020-12-07 Maxime Amram, Jack Dunn, Ying Daisy Zhuo

Contextual Inverse Optimization: Offline and Online Learning

We study the problems of offline and online contextual optimization with feedback information, where instead of observing the loss, we observe, after-the-fact, the optimal action an oracle with full knowledge of the objective function would…

Machine Learning · Computer Science 2023-07-04 Omar Besbes, Yuri Fonseca, Ilan Lobel

Batch Policy Learning under Constraints

When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We…

Machine Learning · Computer Science 2019-03-22 Hoang M. Le, Cameron Voloshin, Yisong Yue

A Distributional View on Multi-Objective Policy Optimization

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over…

Machine Learning · Computer Science 2020-05-18 Abbas Abdolmaleki, Sandy H. Huang, Leonard Hasenclever, Michael Neunert, H. Francis Song, Martina Zambelli, Murilo F. Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller

Treatment recommendation with distributional targets

We study the problem of a decision maker who must provide the best possible treatment recommendation based on an experiment. The desirability of the outcome distribution resulting from the policy recommendation is measured through a…

Econometrics · Economics 2022-04-06 Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev

Policy Learning with Abstention

Policy learning algorithms are widely used in areas such as personalized medicine and advertising to develop individualized treatment regimes. However, most methods force a decision even when predictions are uncertain, which is risky in…

Machine Learning · Computer Science 2026-01-30 Ayush Sawarni, Jikai Jin, Justin Whitehouse, Vasilis Syrgkanis