Related papers: Adaptive Combinatorial Allocation

Adaptive Policy Learning Under Unknown Network Interference

Adaptive experimentation under unknown network interference requires solving two coupled problems: (i) learning the underlying dynamics of interference among units and (ii) using these dynamics to inform treatment allocation in order to…

Machine Learning · Statistics 2026-05-13 Aidan Gleich, Eric Laber, Alexander Volfovsky

Distributed Thompson Sampling

We study cooperative multi-agent multi-armed bandits with M agents and K arms. The goal of the agents is to minimize the cumulative regret. We adapt a traditional Thompson Sampling algorithm to the distributed setting. However, with…

Artificial Intelligence · Computer Science 2021-09-10 Jing Dong, Tan Li, Shaolei Ren, Linqi Song

Adaptive Decision-Making with Constraints and Dependent Losses: Performance Guarantees and Applications to Online and Nonlinear Identification

We consider adaptive decision-making problems where an agent optimizes a cumulative performance objective by repeatedly choosing among a finite set of options. Compared to the classical prediction-with-expert-advice set-up, we consider…

Machine Learning · Computer Science 2023-04-10 Michael Muehlebach

Learning to Optimize Via Posterior Sampling

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling,…

Machine Learning · Computer Science 2014-02-04 Daniel Russo, Benjamin Van Roy

First-Order Bayesian Regret Analysis of Thompson Sampling

We address online combinatorial optimization when the player has a prior over the adversary's sequence of losses. In this framework, Russo and Van Roy proposed an information-theoretic analysis of Thompson Sampling based on the information…

Machine Learning · Computer Science 2022-04-05 Sébastien Bubeck, Mark Sellke

Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling

Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference (e.g., biased estimators, inflated type-I error and reduced power). Recent attempts to address…

Machine Learning · Statistics 2021-11-02 Nina Deliu, Joseph J. Williams, Sofia S. Villar

Batched Thompson Sampling

We introduce a novel anytime Batched Thompson sampling policy for multi-armed bandits where the agent observes the rewards of her actions and adjusts her policy only at the end of a small number of batches. We show that this policy…

Machine Learning · Computer Science 2021-10-04 Cem Kalkanli, Ayfer Ozgur

Combining Outcome-Based and Preference-Based Matching: A Constrained Priority Mechanism

We introduce a constrained priority mechanism that combines outcome-based matching from machine-learning with preference-based allocation schemes common in market design. Using real-world data, we illustrate how our mechanism could be…

General Economics · Economics 2020-08-13 Avidit Acharya, Kirk Bansak, Jens Hainmueller

On the Prior Sensitivity of Thompson Sampling

The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently…

Machine Learning · Computer Science 2016-07-22 Che-Yu Liu, Lihong Li

Combinatorial Allocation Bandits with Nonlinear Arm Utility

A matching platform is a system that matches different types of participants, such as companies and job-seekers. In such a platform, merely maximizing the number of matches can result in matches being concentrated on highly popular…

Machine Learning · Computer Science 2026-03-10 Yuki Shibukawa, Koichi Tanaka, Yuta Saito, Shinji Ito

Asymptotic Convergence of Thompson Sampling

Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed the finite time performance of Thompson sampling, and proved that it achieves a sub-linear regret under a broad…

Machine Learning · Computer Science 2020-11-10 Cem Kalkanli, Ayfer Ozgur

Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous…

Machine Learning · Computer Science 2023-09-28 Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

Better Optimism By Bayes: Adaptive Planning with Rich Models

The computational costs of inference and planning have confined Bayesian model-based reinforcement learning to one of two dismal fates: powerful Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian non-parametric…

Artificial Intelligence · Computer Science 2014-02-11 Arthur Guez, David Silver, Peter Dayan

Adaptive Model Selection Framework: An Application to Airline Pricing

Multiple machine learning and prediction models are often used for the same prediction or recommendation task. In our recent work, where we develop and deploy airline ancillary pricing models in an online setting, we found that among…

Machine Learning · Computer Science 2019-05-23 Naman Shukla, Arinbjörn Kolbeinsson, Lavanya Marla, Kartik Yellepeddi

Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification

Practitioners conducting adaptive experiments often encounter two competing priorities: maximizing total welfare (or 'reward') through effective treatment assignment and swiftly concluding experiments to implement population-wide…

Machine Learning · Computer Science 2024-07-31 Chao Qin, Daniel Russo

Information-Theoretic Confidence Bounds for Reinforcement Learning

We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling. By making a connection between information-theoretic quantities and confidence bounds, we obtain results that relate…

Machine Learning · Statistics 2019-11-25 Xiuyuan Lu, Benjamin Van Roy

Adaptive Multi-Round Allocation with Stochastic Arrivals

We study a sequential resource allocation problem motivated by adaptive network recruitment, in which a limited budget of identical resources must be allocated over multiple rounds to individuals with stochastic referral capacity.…

Artificial Intelligence · Computer Science 2026-05-13 Yuqi Pan, Davin Choo, Haichuan Wang, Milind Tambe, Alastair van Heerden, Cheryl Johnson

Thompson Sampling for Infinite-Horizon Discounted Decision Processes

This paper develops a viable notion of learning for sampling-based algorithms that applies in broader settings than previously considered. More specifically, we model discounted infinite-horizon MDPs with Borel state and action spaces,…

Machine Learning · Statistics 2026-04-09 Daniel Adelman, Cagla Keceli, Alba V. Olivares-Nadal

Thompson Sampling with Unrestricted Delays

We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d. delays, we establish, to our knowledge, the first regret bounds for Thompson Sampling with arbitrary…

Machine Learning · Computer Science 2022-05-24 Han Wu, Stefan Wager

Thompson Sampling for Combinatorial Semi-Bandits

In this paper, we study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We first analyze the standard TS algorithm for the general CMAB model when the outcome…

Machine Learning · Computer Science 2022-06-22 Siwei Wang, Wei Chen