Related papers: Contextual Information-Directed Sampling

Information-directed sampling for bandits: a primer

The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics…

Machine Learning · Computer Science 2025-12-24 Annika Hirling, Giorgio Nicoletti, Antonio Celani

Selectively Contextual Bandits

Contextual bandits are widely used in industrial personalization systems. These online learning frameworks learn a treatment assignment policy in the presence of treatment effects that vary with the observed contextual features of the…

Machine Learning · Computer Science 2022-05-11 Claudia Roberts, Maria Dimakopoulou, Qifeng Qiao, Ashok Chandrashekhar, Tony Jebara

Contextual Bandits for adapting to changing User preferences over time

Contextual bandits provide an effective way to model the dynamic data problem in ML by leveraging online (incremental) learning to continuously adjust the predictions based on changing environment. We explore details on contextual bandits,…

Machine Learning · Computer Science 2020-09-24 Dattaraj Rao

Contextual Bandits with Budgeted Information Reveal

Contextual bandit algorithms are commonly used in digital health to recommend personalized treatments. However, to ensure the effectiveness of the treatments, patients are often requested to take actions that have no immediate benefit to…

Machine Learning · Computer Science 2024-03-14 Kyra Gan, Esmaeil Keyvanshokooh, Xueqing Liu, Susan Murphy

Efficient Contextual Bandits with Uninformed Feedback Graphs

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual…

Machine Learning · Computer Science 2024-02-14 Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Optimistic Information Directed Sampling

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory…

Machine Learning · Computer Science 2024-06-28 Gergely Neu, Matteo Papini, Ludovic Schwartz

Multi-Task Learning for Contextual Bandits

Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad…

Machine Learning · Statistics 2017-05-25 Aniket Anand Deshmukh, Urun Dogan, Clayton Scott

Information Directed Sampling for Linear Partial Monitoring

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS)…

Machine Learning · Statistics 2020-02-27 Johannes Kirschner, Tor Lattimore, Andreas Krause

A Contextual Bandit Bake-off

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these…

Machine Learning · Statistics 2021-06-08 Alberto Bietti, Alekh Agarwal, John Langford

Active Learning for Stochastic Contextual Linear Bandits

A key goal in stochastic contextual linear bandits is to efficiently learn a near-optimal policy. Prior algorithms for this problem learn a policy by strategically sampling actions but naively (passively) sampling contexts from the…

Machine Learning · Computer Science 2026-05-26 Emma Brunskill, Ishani Karmarkar, Zhaoqi Li

Contextual Bandit with Adaptive Feature Extraction

We consider an online decision making setting known as contextual bandit problem, and propose an approach for improving contextual bandit performance by using an adaptive feature extraction (representation learning) based on online…

Artificial Intelligence · Computer Science 2020-09-15 Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, Irina Rish

Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits

Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one…

Machine Learning · Statistics 2025-03-10 Piotr M. Suder, Eric Laber

Asymptotically Optimal Information-Directed Sampling

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist…

Machine Learning · Statistics 2021-07-05 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

Information Directed Sampling for Sparse Linear Bandits

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which…

Machine Learning · Statistics 2021-06-01 Botao Hao, Tor Lattimore, Wei Deng

Practical Contextual Bandits with Feedback Graphs

While contextual bandit has a mature theory, effectively leveraging different feedback patterns to enhance the pace of learning remains unclear. Bandits with feedback graphs, which interpolates between the full information and bandit…

Machine Learning · Computer Science 2023-10-30 Mengxiao Zhang, Yuheng Zhang, Olga Vrousgou, Haipeng Luo, Paul Mineiro

Online learning with Corrupted context: Corrupted Contextual Bandits

We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the context used at each decision may be corrupted ("useless context"). This…

Machine Learning · Computer Science 2020-06-30 Djallel Bouneffouf

Information Directed Sampling for Stochastic Bandits with Graph Feedback

We consider stochastic multi-armed bandit problems with graph feedback, where the decision maker is allowed to observe the neighboring actions of the chosen action. We allow the graph structure to vary with time and consider both…

Machine Learning · Computer Science 2017-11-10 Fang Liu, Swapna Buccapatnam, Ness Shroff

Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts

Contextual bandits are widely-used in the study of learning-based control policies for finite action spaces. While the problem is well-studied for bandits with perfectly observed context vectors, little is known about the case of…

Machine Learning · Statistics 2022-02-03 Hongju Park, Mohamad Kazem Shirani Faradonbeh

BanditRank: Learning to Rank Using Contextual Bandits

We propose an extensible deep learning method that uses reinforcement learning to train neural networks for offline ranking in information retrieval (IR). We call our method BanditRank as it treats ranking as a contextual bandit problem. In…

Information Retrieval · Computer Science 2019-10-24 Phanideep Gampa, Sumio Fujita

Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits

An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an…

Machine Learning · Computer Science 2023-11-06 Aditya Ramesh, Paulo Rauber, Michelangelo Conserva, Jürgen Schmidhuber