Related papers: Debiasing In-Sample Policy Performance for Small-D…

Optimization Under Unknown Constraints

Optimization of complex functions, such as the output of computer simulators, is a difficult task that has received much attention in the literature. A less studied problem is that of optimization under unknown constraints, i.e., when the…

Methodology · Statistics 2010-07-06 Robert B. Gramacy, Herbert K. H. Lee

Empirical Likelihood for Contextual Bandits

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence…

Machine Learning · Computer Science 2020-10-20 Nikos Karampatziakis, John Langford, Paul Mineiro

Optimal Baseline Corrections for Off-Policy Contextual Bandits

The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online…

Machine Learning · Computer Science 2024-08-15 Shashank Gupta, Olivier Jeunen, Harrie Oosterhuis, Maarten de Rijke

Optimize-via-Predict: Realizing out-of-sample optimality in data-driven optimization

We examine a stochastic formulation for data-driven optimization wherein the decision-maker is not privy to the true distribution, but has knowledge that it lies in some hypothesis set and possesses a historical data set, from which…

Optimization and Control · Mathematics 2023-09-21 Gar Goei Loke, Taozeng Zhu, Ruiting Zuo

Statistically Efficient Off-Policy Policy Gradients

Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value. In this paper, we consider the statistically efficient estimation of policy gradients from…

Machine Learning · Statistics 2020-02-21 Nathan Kallus, Masatoshi Uehara

Practical Improvements of A/B Testing with Off-Policy Estimation

We address the problem of A/B testing, a widely used protocol for evaluating the potential improvement achieved by a new decision system compared to a baseline. This protocol segments the population into two subgroups, each exposed to a…

Machine Learning · Statistics 2025-06-16 Otmane Sakhi, Alexandre Gilotte, David Rohde

Efficient hyperparameter optimization by way of PAC-Bayes bound minimization

Identifying optimal values for a high-dimensional set of hyperparameters is a problem that has received growing attention given its importance to large-scale machine learning applications such as neural architecture search. Recently…

Machine Learning · Statistics 2020-08-17 John J. Cherian, Andrew G. Taube, Robert T. McGibbon, Panagiotis Angelikopoulos, Guy Blanc, Michael Snarski, Daniel D. Richman, John L. Klepeis, David E. Shaw

Off-Policy Evaluation with Policy-Dependent Optimization Response

The intersection of causal inference and machine learning for decision-making is rapidly expanding, but the default decision criterion remains an average of individual causal outcomes across a population. In practice, various…

Machine Learning · Computer Science 2022-11-08 Wenshuo Guo, Michael I. Jordan, Angela Zhou

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

In order for reinforcement learning techniques to be useful in real-world decision making processes, they must be able to produce robust performance from limited data. Deep policy optimization methods have achieved impressive results on…

Machine Learning · Computer Science 2020-12-22 James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

Understanding the Risks and Rewards of Combining Unbiased and Possibly Biased Estimators, with Applications to Causal Inference

Several problems in statistics involve the combination of high-variance unbiased estimators with low-variance estimators that are only unbiased under strong assumptions. A notable example is the estimation of causal effects while combining…

Methodology · Statistics 2023-05-25 Michael Oberst, Alexander D'Amour, Minmin Chen, Yuyan Wang, David Sontag, Steve Yadlowsky

Stochastic Optimization Algorithms for Problems with Controllable Biased Oracles

Motivated by emerging applications in machine learning, we consider an optimization problem in a general form where the gradient of the objective function is available through a biased stochastic oracle. We assume a bias-control parameter…

Optimization and Control · Mathematics 2026-02-10 Yin Liu, Sam Davanloo Tajbakhsh

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science 2026-01-06 Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

Debiased Ill-Posed Regression

In various statistical settings, the goal is to estimate a function which is restricted by the statistical model only through a conditional moment restriction. Prominent examples include the nonparametric instrumental variable framework for…

Methodology · Statistics 2025-05-28 AmirEmad Ghassami, James M. Robins, Andrea Rotnitzky

Unbiased Estimation of the Value of an Optimized Policy

Randomized trials, also known as A/B tests, are used to select between two policies: a control and a treatment. Given a corresponding set of features, we can ideally learn an optimized policy P that maps the A/B test data features to action…

Machine Learning · Computer Science 2018-06-08 Elon Portugaly, Joseph J. Pfeiffer

Adaptive Estimator Selection for Off-Policy Evaluation

We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant…

Machine Learning · Computer Science 2020-08-25 Yi Su, Pavithra Srinath, Akshay Krishnamurthy

Adaptive Data Debiasing through Bounded Exploration

Biases in existing datasets used to train algorithmic decision rules can raise ethical and economic concerns due to the resulting disparate treatment of different groups. We propose an algorithm for sequentially debiasing such datasets…

Machine Learning · Computer Science 2023-01-11 Yifan Yang, Yang Liu, Parinaz Naghizadeh

Debiasing Conditional Stochastic Optimization

In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient…

Machine Learning · Computer Science 2023-12-05 Lie He, Shiva Prasad Kasiviswanathan

Adaptive Bounded Exploration and Intermediate Actions for Data Debiasing

The performance of algorithmic decision rules is largely dependent on the quality of training datasets available to them. Biases in these datasets can raise economic and ethical concerns due to the resulting algorithms' disparate treatment…

Machine Learning · Computer Science 2025-04-14 Yifan Yang, Yang Liu, Parinaz Naghizadeh

A Graduated Filter Method for Large Scale Robust Estimation

Due to the highly non-convex nature of large-scale robust parameter estimation, avoiding poor local minima is challenging in real-world applications where input data is contaminated by a large or unknown fraction of outliers. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2020-03-23 Huu Le, Christopher Zach

Assessing solution quality in risk-averse stochastic programs

In optimization problems, the quality of a candidate solution can be characterized by the optimality gap. For most stochastic optimization problems, this gap must be statistically estimated. We show that for risk-averse problems, standard…

Optimization and Control · Mathematics 2025-05-05 E. Ruben van Beesten, Nick W. Koning, David P. Morton