Related papers: Optimistic Information Directed Sampling

Learning to Optimize via Information-Directed Sampling

We propose information-directed sampling -- a new approach to online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback. Each action is sampled in a manner…

Machine Learning · Computer Science 2017-07-10 Daniel Russo, Benjamin Van Roy

Information Directed Sampling and Bandits with Heteroscedastic Noise

In the stochastic bandit problem, the goal is to maximize an unknown function via a sequence of noisy evaluations. Typically, the observation noise is assumed to be independent of the evaluation point and to satisfy a tail bound uniformly…

Machine Learning · Statistics 2018-04-20 Johannes Kirschner, Andreas Krause

Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits

Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one…

Machine Learning · Statistics 2025-03-10 Piotr M. Suder, Eric Laber

Sparse Optimistic Information Directed Sampling

Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence…

Machine Learning · Computer Science 2025-10-29 Ludovic Schwartz, Hamish Flynn, Gergely Neu

Information Directed Sampling for Sparse Linear Bandits

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which…

Machine Learning · Statistics 2021-06-01 Botao Hao, Tor Lattimore, Wei Deng

First-Order Bayesian Regret Analysis of Thompson Sampling

We address online combinatorial optimization when the player has a prior over the adversary's sequence of losses. In this framework, Russo and Van Roy proposed an information-theoretic analysis of Thompson Sampling based on the information…

Machine Learning · Computer Science 2022-04-05 Sébastien Bubeck, Mark Sellke

Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts. We adapt the information-theoretic perspective of Russo and Van Roy (2016) to the contextual setting…

Machine Learning · Computer Science 2023-03-07 Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz

Regret Bounds for Information-Directed Reinforcement Learning

Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for Markov Decision Processes (MDPs) is still limited. We develop novel…

Machine Learning · Computer Science 2022-11-28 Botao Hao, Tor Lattimore

An Information-Theoretic Analysis for Thompson Sampling with Many Actions

Information-theoretic Bayesian regret bounds of Russo and Van Roy capture the dependence of regret on prior uncertainty. However, this dependence is through entropy, which can become arbitrarily large as the number of actions increases. We…

Machine Learning · Statistics 2020-07-09 Shi Dong, Benjamin Van Roy

Bayesian Online Model Selection

Online model selection in Bayesian bandits raises a fundamental exploration challenge: When an environment instance is sampled from a prior distribution, how can we design an adaptive strategy that explores multiple bandit learners and…

Machine Learning · Computer Science 2026-02-23 Aida Afshar, Yuke Zhang, Aldo Pacchiano

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary. While existing approaches all require carefully constructing optimistic and biased loss…

Machine Learning · Computer Science 2020-11-02 Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions

Motivated by applications in online bidding and sleeping bandits, we examine the problem of contextual bandits with cross learning, where the learner observes the loss associated with the action across all possible contexts, not just the…

Machine Learning · Computer Science 2025-01-27 Ruiyuan Huang, Zengfeng Huang

Contextual Information-Directed Sampling

Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm. However, it is still unclear what is the right form of information ratio to optimize when contextual…

Machine Learning · Computer Science 2022-06-10 Botao Hao, Tor Lattimore, Chao Qin

Information-directed sampling for bandits: a primer

The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics…

Machine Learning · Computer Science 2025-12-24 Annika Hirling, Giorgio Nicoletti, Antonio Celani

Bias-Robust Bayesian Optimization via Dueling Bandits

We consider Bayesian optimization in settings where observations can be adversarially biased, for example by an uncontrolled hidden confounder. Our first contribution is a reduction of the confounded setting to the dueling bandit model.…

Machine Learning · Statistics 2021-06-10 Johannes Kirschner, Andreas Krause

Early Stopping in Contextual Bandits and Inferences

Bandit algorithms sequentially accumulate data using adaptive sampling policies, offering flexibility for real-world applications. However, excessive sampling can be costly, motivating the development of early stopping methods and reliable…

Statistics Theory · Mathematics 2025-02-06 Zihan Cui

Bandits with Partially Observable Confounded Data

We study linear contextual bandits with access to a large, confounded, offline dataset that was sampled from some fixed policy. We show that this problem is closely related to a variant of the bandit problem with side information. We…

Machine Learning · Computer Science 2021-08-11 Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

Asymptotically Optimal Information-Directed Sampling

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist…

Machine Learning · Statistics 2021-07-05 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces

This paper studies the Bayesian regret of the Thompson Sampling algorithm for bandit problems, building on the information-theoretic framework introduced by Russo and Van Roy (2015). Specifically, it extends the rate-distortion analysis of…

Machine Learning · Statistics 2025-02-05 Amaury Gouverneur, Borja Rodriguez Gálvez, Tobias Oechtering, Mikael Skoglund

Bayesian Design Principles for Frequentist Sequential Learning

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization…

Machine Learning · Computer Science 2024-02-12 Yunbei Xu, Assaf Zeevi