Related papers: A Novel Confidence-Based Algorithm for Structured …

200 papers

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is…

Machine Learning · Computer Science 2014-11-12 Tor Lattimore, Remi Munos

We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy some given structural constraints, e.g. linear, unimodal, sparse, etc. Our aim is to develop methods…

Machine Learning · Statistics 2020-07-03 Rémy Degenne, Han Shao, Wouter M. Koolen

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret. Besides the theoretical results, the new algorithm is simple, efficient and…

Machine Learning · Computer Science 2016-02-25 Tor Lattimore

We consider a stochastic bandit problem with countably many arms that belong to a finite set of types, each characterized by a unique mean reward. In addition, there is a fixed distribution over types which sets the proportion of each type…

Machine Learning · Computer Science 2021-05-25 Anand Kalvit, Assaf Zeevi

We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and…

Machine Learning · Computer Science 2022-06-16 Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in…

Machine Learning · Statistics 2025-06-18 Seok-Jin Kim, Gi-Soo Kim, Min-hwan Oh

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales…

Machine Learning · Computer Science 2017-06-06 Joon Kwon, Vianney Perchet, Claire Vernade

We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous…

Machine Learning · Computer Science 2015-05-19 Alexandra Carpentier, Michal Valko

We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret…

Machine Learning · Computer Science 2018-06-08 Chen-Yu Wei, Haipeng Luo

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number…

Data Structures and Algorithms · Computer Science 2020-02-19 Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved, if the effective range of the losses in each round is small (e.g. the maximal difference between two losses in a given round). Despite a recent…

Machine Learning · Computer Science 2020-01-03 Nicolò Cesa-Bianchi, Ohad Shamir

This paper investigates stochastic multi-armed bandit algorithms that are robust to adversarial attacks, where an attacker can first observe the learner's action and then alter their reward observation. We study two cases of this model,…

Machine Learning · Computer Science 2024-08-19 Xuchuang Wang, Jinhang Zuo, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili

We study the problem of corralling stochastic bandit algorithms, that is, combining multiple bandit algorithms designed for a stochastic environment, with the goal of devising a corralling algorithm that performs almost as well as the best…

Machine Learning · Computer Science 2021-03-02 Raman Arora, Teodor V. Marinov, Mehryar Mohri

We consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected…

Machine Learning · Computer Science 2021-10-26 Ramakrishnan Krishnamurthy, Aditya Gopalan

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope 2009, Yu 2011). The set of arms is either…

Machine Learning · Computer Science 2014-05-21 Richard Combes, Alexandre Proutiere

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e. those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested…

Machine Learning · Statistics 2024-11-28 Marco Fiandri, Alberto Maria Metelli, Francesco Trovò

We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special…

Machine Learning · Computer Science 2018-10-30 Julian Zimmert, Yevgeny Seldin

We study a stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

We introduce and study a new class of stochastic bandit problems, referred to as predictive bandits. In each round, the decision maker first decides whether to gather information about the rewards of particular arms (so that their rewards…

Machine Learning · Computer Science 2020-04-03 Simon Lindståhl, Alexandre Proutiere, Andreas Johnsson

Motivated by a natural problem in online model selection with bandit information, we introduce and analyze a best arm identification problem in the rested bandit setting, wherein arm expected losses decrease with the number of times the arm…

Machine Learning · Statistics 2020-12-08 Leonardo Cella, Claudio Gentile, Massimiliano Pontil