Related papers: Offline Multi-Action Policy Learning: Generalizati…

200 papers

In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application-specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example,…

Statistics Theory · Mathematics 2020-09-08 Susan Athey, Stefan Wager

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value…

Econometrics · Economics 2019-07-23 Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a…

Artificial Intelligence · Computer Science 2020-12-23 Nymisha Bandi, Theja Tulabandhula

We consider the problem of using observational bandit feedback data from multiple heterogeneous data sources to learn a personalized decision policy that robustly generalizes across diverse target settings. To achieve this, we propose a…

Machine Learning · Computer Science 2024-10-14 Aldo Gael Carranza, Susan Athey

This paper deals with optimal policy learning (OPL) with observational data, i.e. data-driven optimal decision-making, in multi-action (or multi-arm) settings, where a finite set of decision options is available. It is organized in three…

Machine Learning · Statistics 2024-04-01 Giovanni Cerulli

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are…

Machine Learning · Computer Science 2024-10-23 Matej Cief, Branislav Kveton, Michal Kompan

Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed…

Econometrics · Economics 2023-04-25 Lihua Lei, Roshni Sahoo, Stefan Wager

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…

Machine Learning · Computer Science 2022-07-20 Germano Gabbianelli, Matteo Papini, Gergely Neu

It is well known that the historical logs are used for evaluating and learning policies in interactive systems, e.g. recommendation, search, and online advertising. Since direct online policy learning usually harms user experiences, it is…

Machine Learning · Statistics 2019-08-06 Li He, Long Xia, Wei Zeng, Zhi-Ming Ma, Yihong Zhao, Dawei Yin

This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a minimax…

Artificial Intelligence · Computer Science 2014-09-15 Lihong Li, Remi Munos, Csaba Szepesvari

Policy learning can be used to extract individualized treatment regimes from observational data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a commonplace lack of overlap in the data for different…

Machine Learning · Statistics 2020-12-04 Nathan Kallus

This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving,…

Machine Learning · Computer Science 2018-11-08 Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

A decision maker typically (i) incorporates training data to learn about the relative effectiveness of treatments, and (ii) chooses an implementation mechanism that implies an "optimal" predicted outcome distribution according to some…

Econometrics · Economics 2025-05-29 Anders Bredahl Kock, David Preinerstorfer

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in…

Machine Learning · Computer Science 2022-07-05 Yao Liu, Yannis Flet-Berliac, Emma Brunskill

We propose an approach for learning optimal tree-based prescription policies directly from data, combining methods for counterfactual estimation from the causal inference literature with recent advances in training globally-optimal decision…

Machine Learning · Computer Science 2020-12-07 Maxime Amram, Jack Dunn, Ying Daisy Zhuo

We study the problems of offline and online contextual optimization with feedback information, where instead of observing the loss, we observe, after-the-fact, the optimal action an oracle with full knowledge of the objective function would…

Machine Learning · Computer Science 2023-07-04 Omar Besbes, Yuri Fonseca, Ilan Lobel

When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We…

Machine Learning · Computer Science 2019-03-22 Hoang M. Le, Cameron Voloshin, Yisong Yue

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over…

We study the problem of a decision maker who must provide the best possible treatment recommendation based on an experiment. The desirability of the outcome distribution resulting from the policy recommendation is measured through a…

Econometrics · Economics 2022-04-06 Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev

Policy learning algorithms are widely used in areas such as personalized medicine and advertising to develop individualized treatment regimes. However, most methods force a decision even when predictions are uncertain, which is risky in…

Machine Learning · Computer Science 2026-01-30 Ayush Sawarni, Jikai Jin, Justin Whitehouse, Vasilis Syrgkanis