Related papers: Algorithm Selection as a Bandit Problem with Unbou…

Online Model Selection: a Rested Bandit Formulation

Motivated by a natural problem in online model selection with bandit information, we introduce and analyze a best arm identification problem in the rested bandit setting, wherein arm expected losses decrease with the number of times the arm…

Machine Learning · Statistics 2020-12-08 Leonardo Cella, Claudio Gentile, Massimiliano Pontil

More Adaptive Algorithms for Adversarial Bandits

We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret…

Machine Learning · Computer Science 2018-06-08 Chen-Yu Wei, Haipeng Luo

Regret Balancing for Bandit and RL Model Selection

We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…

Machine Learning · Computer Science 2020-06-11 Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

Adversarial bandit optimization for approximately linear functions

We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We…

Machine Learning · Computer Science 2026-01-07 Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

Model Selection in Contextual Stochastic Bandit Problems

We study bandit model selection in stochastic environments. Our approach relies on a meta-algorithm that selects between candidate base algorithms. We develop a meta-algorithm-base algorithm abstraction that can work with general classes of…

Machine Learning · Computer Science 2022-12-06 Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Adversarial Bandit Optimization with Globally Bounded Perturbations to Linear Losses

We study a class of adversarial bandit optimization problems in which the loss functions may be non-convex and non-smooth. In each round, the learner observes a loss that consists of an underlying linear component together with an…

Machine Learning · Computer Science 2026-03-30 Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

Fractional Moments on Bandit Problems

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments…

Machine Learning · Computer Science 2012-02-20 Ananda Narayanan B, Balaraman Ravindran

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of…

Machine Learning · Computer Science 2022-10-18 Viktor Bengs, Eyke Hüllermeier

Quick Best Action Identification in Linear Bandit Problems

In this paper, we consider a best action identification problem in the stochastic linear bandit setup with a fixed confidence constraint. In the considered best action identification problem, instead of minimizing the accumulative regret as…

Machine Learning · Computer Science 2018-12-04 Jun Geng, Lifeng Lai

Regret in Online Combinatorial Optimization

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the best loss she would have…

Machine Learning · Computer Science 2013-04-02 Jean-Yves Audibert, Sébastien Bubeck, Gábor Lugosi

Online Stochastic Linear Optimization under One-bit Feedback

In this paper, we study a special bandit setting of online stochastic linear optimization, where only one-bit of information is revealed to the learner at each round. This problem has found many applications including online advertisement…

Machine Learning · Computer Science 2015-09-28 Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou

Regret Bound Balancing and Elimination for Model Selection in Bandits and RL

We propose a simple model selection approach for algorithms in stochastic bandit and reinforcement learning problems. As opposed to prior work that (implicitly) assumes knowledge of the optimal regret, we only require that each base…

Machine Learning · Computer Science 2020-12-25 Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett

Regret Bounds for Batched Bandits

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number…

Data Structures and Algorithms · Computer Science 2020-02-19 Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

Algorithms for Linear Bandits on Polyhedral Sets

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

Online learning in bandits with predicted context

We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications…

Machine Learning · Statistics 2024-03-19 Yongyi Guo, Ziping Xu, Susan Murphy

Online Learning with Off-Policy Feedback

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…

Machine Learning · Computer Science 2022-07-20 Germano Gabbianelli, Matteo Papini, Gergely Neu

Adaptive Bandit Algorithms for Contextual Matching Markets

We study bandit learning in matching markets, where players and arms constitute the two market sides, and the players' utilities are linear in the arm contexts. In each round, new arms arrive with observable contexts. Then, the algorithm…

Machine Learning · Computer Science 2026-05-28 Shiyun Lin, Simon Mauras, Vianney Perchet, Nadav Merlis

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary. While existing approaches all require carefully constructing optimistic and biased loss…

Machine Learning · Computer Science 2020-11-02 Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

Preselection Bandits

In this paper, we introduce the Preselection Bandit problem, in which the learner preselects a subset of arms (choice alternatives) for a user, who then chooses the final arm from this subset. The learner is not aware of the user's…

Machine Learning · Computer Science 2021-12-23 Viktor Bengs, Eyke Hüllermeier

Non-stochastic Bandits With Evolving Observations

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the…

Machine Learning · Computer Science 2024-05-28 Yogev Bar-On, Yishay Mansour