Related papers: Bayesian Online Model Selection

Model Selection in Contextual Stochastic Bandit Problems

We study bandit model selection in stochastic environments. Our approach relies on a meta-algorithm that selects between candidate base algorithms. We develop a meta-algorithm-base algorithm abstraction that can work with general classes of…

Machine Learning · Computer Science 2022-12-06 Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Model selection for contextual bandits

We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation. Our main result is a new model selection guarantee for…

Machine Learning · Computer Science 2019-11-15 Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

Data-Driven Online Model Selection With Regret Guarantees

We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies…

Machine Learning · Computer Science 2024-01-24 Aldo Pacchiano, Christoph Dann, Claudio Gentile

Regret Balancing for Bandit and RL Model Selection

We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…

Machine Learning · Computer Science 2020-06-11 Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

We study the problem of $K$-armed dueling bandit for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pairs of decision points queried in an online…

Machine Learning · Computer Science 2022-02-15 Aadirupa Saha, Pierre Gaillard

Best of Both Worlds Model Selection

We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our…

Machine Learning · Computer Science 2022-07-01 Aldo Pacchiano, Christoph Dann, Claudio Gentile

Anytime Model Selection in Linear Bandits

Model selection in the context of bandit optimization is a challenging problem, as it requires balancing exploration and exploitation not only for action selection, but also for model selection. One natural approach is to rely on online…

Machine Learning · Statistics 2023-11-14 Parnian Kassraie, Nicolas Emmenegger, Andreas Krause, Aldo Pacchiano

Offline Local Search for Online Stochastic Bandits

Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The…

Machine Learning · Computer Science 2026-04-13 Gerdus Benadè, Rathish Das, Thomas Lavastida

Optimistic Information Directed Sampling

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory…

Machine Learning · Computer Science 2024-06-28 Gergely Neu, Matteo Papini, Ludovic Schwartz

Bayesian Design Principles for Frequentist Sequential Learning

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization…

Machine Learning · Computer Science 2024-02-12 Yunbei Xu, Assaf Zeevi

The Impact of Batch Learning in Stochastic Bandits

We consider a special case of bandit problems, namely batched bandits. Motivated by natural restrictions of recommender systems and e-commerce platforms, we assume that a learning agent observes responses batched in groups over a certain…

Machine Learning · Computer Science 2021-11-04 Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

Meta-Learning for Simple Regret Minimization

We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its…

Machine Learning · Computer Science 2023-07-06 Mohammadjavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

Bayesian Regret Minimization in Offline Bandits

We study how to make decisions that minimize Bayesian regret in offline linear bandits. Prior work suggests that one must take actions with maximum lower confidence bound (LCB) on their reward. We argue that the reliance on LCB is…

Machine Learning · Computer Science 2024-07-04 Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh

Online Stochastic Linear Optimization under One-bit Feedback

In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications including online advertisement…

Machine Learning · Computer Science 2015-09-28 Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou

Pareto Optimal Model Selection in Linear Bandits

We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by $d_\star$) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers…

Machine Learning · Statistics 2022-03-17 Yinglun Zhu, Robert Nowak

Online Learning and Bandits with Queried Hints

We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step, the online policy can probe and find out which of a small number ($k$) of choices has better reward (or loss) before making its…

Data Structures and Algorithms · Computer Science 2022-11-08 Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala

Non-stochastic Bandits With Evolving Observations

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the…

Machine Learning · Computer Science 2024-05-28 Yogev Bar-On, Yishay Mansour

Empirical Bayes Regret Minimization

Most bandit algorithm designs are purely theoretical. Therefore, they have strong regret guarantees, but also are often too conservative in practice. In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes…

Machine Learning · Computer Science 2020-06-12 Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari

Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards

In this paper, we consider the problem of sleeping bandits with stochastic action sets and adversarial rewards. In this setting, in contrast to most work in bandits, the actions may not be available at all times. For instance, some products…

Machine Learning · Computer Science 2020-08-11 Aadirupa Saha, Pierre Gaillard, Michal Valko

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every…

Machine Learning · Computer Science 2023-02-21 Christoph Dann, Chen-Yu Wei, Julian Zimmert