Related papers: Asymptotically Optimal Information-Directed Sampli…

200 papers

Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one…

Machine Learning · Statistics 2025-03-10 Piotr M. Suder, Eric Laber

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS)…

Machine Learning · Statistics 2020-02-27 Johannes Kirschner, Tor Lattimore, Andreas Krause

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which…

Machine Learning · Statistics 2021-06-01 Botao Hao, Tor Lattimore, Wei Deng

Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence…

Machine Learning · Computer Science 2025-10-29 Ludovic Schwartz, Hamish Flynn, Gergely Neu

In the stochastic bandit problem, the goal is to maximize an unknown function via a sequence of noisy evaluations. Typically, the observation noise is assumed to be independent of the evaluation point and to satisfy a tail bound uniformly…

Machine Learning · Statistics 2018-04-20 Johannes Kirschner, Andreas Krause

We consider Bayesian optimization in settings where observations can be adversarially biased, for example by an uncontrolled hidden confounder. Our first contribution is a reduction of the confounded setting to the dueling bandit model.…

Machine Learning · Statistics 2021-06-10 Johannes Kirschner, Andreas Krause

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the optimism…

Machine Learning · Statistics 2016-10-17 Tor Lattimore, Csaba Szepesvari

In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit the structure of the problem and have been shown to be asymptotically suboptimal. In this paper, we follow recent approaches of deriving…

Machine Learning · Computer Science 2020-11-23 Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm. However, it is still unclear what is the right form of information ratio to optimize when contextual…

Machine Learning · Computer Science 2022-06-10 Botao Hao, Tor Lattimore, Chao Qin

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory…

Machine Learning · Computer Science 2024-06-28 Gergely Neu, Matteo Papini, Ludovic Schwartz

We propose information-directed sampling -- a new approach to online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback. Each action is sampled in a manner…

Machine Learning · Computer Science 2017-07-10 Daniel Russo, Benjamin Van Roy

The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics…

Machine Learning · Computer Science 2025-12-24 Annika Hirling, Giorgio Nicoletti, Antonio Celani

We consider the best-k-arm identification problem for multi-armed bandits, where the objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort. We characterize the necessary…

Machine Learning · Statistics 2023-07-18 Wei You, Chao Qin, Zihao Wang, Shuoguang Yang

For the model of constrained multi-armed bandit, we show that by construction there exists an index-based deterministic asymptotically optimal algorithm. The optimality is achieved by the convergence of the probability of choosing an…

Optimization and Control · Mathematics 2020-07-30 Hyeong Soo Chang

For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that an asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_t$-greedy strategy. We…

Optimization and Control · Mathematics 2018-05-04 Hyeong Soo Chang

Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent…

Machine Learning · Computer Science 2023-11-15 Johannes Kirschner, Tor Lattimore, Andreas Krause

Past research on interactive decision making problems (bandits, reinforcement learning, etc.) mostly focuses on the minimax regret that measures the algorithm's performance on the hardest instance. However, an ideal algorithm should adapt…

Machine Learning · Computer Science 2023-06-13 Kefan Dong, Tengyu Ma

Consider the problem of a controller sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $ i = 1,\ldots , N,$ and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population…

Machine Learning · Statistics 2015-09-25 Wesley Cowan, Michael N. Katehakis

We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm,…

Machine Learning · Statistics 2025-02-25 Raymond Zhang, Hedi Hadiji, Richard Combes

We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and…

Machine Learning · Statistics 2017-09-21 Pierre Ménard, Aurélien Garivier