Related papers: Learning to Optimize via Information-Directed Samp…

200 papers

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and…

Machine Learning · Computer Science 2015-06-09 Daniel Russo, Benjamin Van Roy

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory…

Machine Learning · Computer Science 2024-06-28 Gergely Neu, Matteo Papini, Ludovic Schwartz

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which…

Machine Learning · Statistics 2021-06-01 Botao Hao, Tor Lattimore, Wei Deng

The information ratio offers an approach to assessing the efficacy with which an agent balances between exploration and exploitation. Originally, this was defined to be the ratio between squared expected regret and the mutual information…

Machine Learning · Computer Science 2021-02-19 Adithya M. Devraj, Benjamin Van Roy, Kuang Xu

Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence…

Machine Learning · Computer Science 2025-10-29 Ludovic Schwartz, Hamish Flynn, Gergely Neu

The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics…

Machine Learning · Computer Science 2025-12-24 Annika Hirling, Giorgio Nicoletti, Antonio Celani

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS)…

Machine Learning · Statistics 2020-02-27 Johannes Kirschner, Tor Lattimore, Andreas Krause

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes…

Machine Learning · Computer Science 2023-12-27 Seungki Min, Daniel Russo

Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one…

Machine Learning · Statistics 2025-03-10 Piotr M. Suder, Eric Laber

Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for Markov Decision Processes (MDPs) is still limited. We develop novel…

Machine Learning · Computer Science 2022-11-28 Botao Hao, Tor Lattimore

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist…

Machine Learning · Statistics 2021-07-05 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the best loss she would have…

Machine Learning · Computer Science 2013-04-02 Jean-Yves Audibert, Sébastien Bubeck, Gábor Lugosi

In this article, we propose a sampling-based motion planning algorithm equipped with an information-theoretic convergence criterion for incremental informative motion planning. The proposed approach allows dense map representations and…

Robotics · Computer Science 2019-05-24 Maani Ghaffari Jadidi, Jaime Valls Miro, Gamini Dissanayake

We consider the optimal value of information (VoI) problem, where the goal is to sequentially select a set of tests with a minimal cost, so that one can efficiently make the best decision based on the observed outcomes. Existing algorithms…

Artificial Intelligence · Computer Science 2017-07-18 Yuxin Chen, Jean-Michel Renders, Morteza Haghir Chehreghani, Andreas Krause

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information…

Machine Learning · Computer Science 2020-07-16 Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(\beta \langle a, \theta…

Machine Learning · Statistics 2025-02-21 Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science 2026-04-28 Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

We study how to adapt to smoothly-varying ('easy') environments in well-known online learning problems where acquiring information is expensive. For the problem of label efficient prediction, which is a budgeted version of prediction with…

Machine Learning · Computer Science 2019-12-09 Siddharth Mitra, Aditya Gopalan

The literature on bandit learning and regret analysis has focused on contexts where the goal is to converge on an optimal action in a manner that limits exploration costs. One shortcoming imposed by this orientation is that it does not…

Machine Learning · Computer Science 2017-05-01 Daniel Russo, David Tse, Benjamin Van Roy

We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we…

Machine Learning · Statistics 2025-10-23 Yuzhou Gu, Yanjun Han, Jian Qian