Related papers: Learning to Optimize via Information-Directed Samp…

An Information-Theoretic Analysis of Thompson Sampling

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and…

Machine Learning · Computer Science 2015-06-09 Daniel Russo, Benjamin Van Roy

Optimistic Information Directed Sampling

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory…

Machine Learning · Computer Science 2024-06-28 Gergely Neu, Matteo Papini, Ludovic Schwartz

Information Directed Sampling for Sparse Linear Bandits

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which…

Machine Learning · Statistics 2021-06-01 Botao Hao, Tor Lattimore, Wei Deng

A Bit Better? Quantifying Information for Bandit Learning

The information ratio offers an approach to assessing the efficacy with which an agent balances between exploration and exploitation. Originally, this was defined to be the ratio between squared expected regret and the mutual information…

Machine Learning · Computer Science 2021-02-19 Adithya M. Devraj, Benjamin Van Roy, Kuang Xu

Sparse Optimistic Information Directed Sampling

Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence…

Machine Learning · Computer Science 2025-10-29 Ludovic Schwartz, Hamish Flynn, Gergely Neu

Information-directed sampling for bandits: a primer

The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics…

Machine Learning · Computer Science 2025-12-24 Annika Hirling, Giorgio Nicoletti, Antonio Celani

Information Directed Sampling for Linear Partial Monitoring

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS)…

Machine Learning · Statistics 2020-02-27 Johannes Kirschner, Tor Lattimore, Andreas Krause

An Information-Theoretic Analysis of Nonstationary Bandit Learning

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes…

Machine Learning · Computer Science 2023-12-27 Seungki Min, Daniel Russo

Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits

Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one…

Machine Learning · Statistics 2025-03-10 Piotr M. Suder, Eric Laber

Regret Bounds for Information-Directed Reinforcement Learning

Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for Markov Decision Processes (MDPs) is still limited. We develop novel…

Machine Learning · Computer Science 2022-11-28 Botao Hao, Tor Lattimore

Asymptotically Optimal Information-Directed Sampling

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist…

Machine Learning · Statistics 2021-07-05 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

Regret in Online Combinatorial Optimization

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the best loss she would have…

Machine Learning · Computer Science 2013-04-02 Jean-Yves Audibert, Sébastien Bubeck, Gábor Lugosi

Sampling-based Incremental Information Gathering with Applications to Robotic Exploration and Environmental Monitoring

In this article, we propose a sampling-based motion planning algorithm equipped with an information-theoretic convergence criterion for incremental informative motion planning. The proposed approach allows dense map representations and…

Robotics · Computer Science 2019-05-24 Maani Ghaffari Jadidi, Jaime Valls Miro, Gamini Dissanayake

Efficient Online Learning for Optimizing Value of Information: Theory and Application to Interactive Troubleshooting

We consider the optimal value of information (VoI) problem, where the goal is to sequentially select a set of tests with a minimal cost, so that one can efficiently make the best decision based on the observed outcomes. Existing algorithms…

Artificial Intelligence · Computer Science 2017-07-18 Yuxin Chen, Jean-Michel Renders, Morteza Haghir Chehreghani, Andreas Krause

A Tutorial on Thompson Sampling

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information…

Machine Learning · Computer Science 2020-07-16 Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(\beta \langle a, \theta…

Machine Learning · Statistics 2025-02-21 Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

Efficient learning by implicit exploration in bandit problems with side observations

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science 2026-04-28 Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

On Adaptivity in Information-constrained Online Learning

We study how to adapt to smoothly-varying ('easy') environments in well-known online learning problems where acquiring information is expensive. For the problem of label efficient prediction, which is a budgeted version of prediction with…

Machine Learning · Computer Science 2019-12-09 Siddharth Mitra, Aditya Gopalan

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

The literature on bandit learning and regret analysis has focused on contexts where the goal is to converge on an optimal action in a manner that limits exploration costs. One shortcoming imposed by this orientation is that it does not…

Machine Learning · Computer Science 2017-05-01 Daniel Russo, David Tse, Benjamin Van Roy

Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits

We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we…

Machine Learning · Statistics 2025-10-23 Yuzhou Gu, Yanjun Han, Jian Qian