Related papers: Linearly Parameterized Bandits

200 papers

In this paper, we consider the problem of multi-armed bandits with a large, possibly infinite number of correlated arms. We assume that the arms have Bernoulli distributed rewards, independent across time, where the probabilities of success…

Machine Learning · Computer Science 2011-11-21 Chong Jiang, R. Srikant

We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in…

Machine Learning · Statistics 2025-06-18 Seok-Jin Kim, Gi-Soo Kim, Min-hwan Oh

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward.…

Statistics Theory · Mathematics 2010-03-09 Philippe Rigollet, Assaf Zeevi

In the classic multi-armed bandits problem, the goal is to have a policy for dynamically operating arms that each yield stochastic rewards with unknown means. The key metric of interest is regret, defined as the gap between the expected…

Optimization and Control · Mathematics 2010-11-23 Yi Gai, Bhaskar Krishnamachari, Rahul Jain

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science 2018-03-22 Onur Atan, Cem Tekin, Mihaela van der Schaar

We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is…

Machine Learning · Computer Science 2020-03-03 Semih Cayci, Atilla Eryilmaz, R. Srikant

We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies, whose expected cumulative reward over the course of $T$ rounds is maximum, and each has an expected cost below a…

Machine Learning · Computer Science 2020-06-20 Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_{d}(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r :B_{d}(1+\nu) \rightarrow \mathbb{R}$ are considered to…

Machine Learning · Statistics 2017-05-31 Hemant Tyagi, Sebastian Stich, Bernd Gärtner

We consider a situation where an agent has $T$ resources to be allocated to a larger number $N$ of actions. Each action can be completed at most once and results in a stochastic reward with unknown mean. The goal of the agent is to…

Statistics Theory · Mathematics 2020-11-04 Solenne Gaucher

We introduce the safe linear stochastic bandit framework---a generalization of linear stochastic bandits---where, in each stage, the learner is required to select an arm with an expected reward that is no less than a predetermined (safe)…

Machine Learning · Statistics 2019-11-22 Kia Khezeli, Eilyan Bitar

We study the linear bandit problem that accounts for partially observable features. Without proper handling, unobserved features can lead to linear regret in the decision horizon $T$, as their influence on rewards is unknown. To tackle this…

Machine Learning · Statistics 2025-08-19 Wonyoung Kim, Sungwoo Park, Garud Iyengar, Assaf Zeevi, Min-hwan Oh

The Greedy algorithm is the simplest heuristic in sequential decision problems that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold $\theta$, with a fixed budget of $T$ trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the…

Machine Learning · Computer Science 2019-05-28 Chao Tao, Saúl Blanco, Jian Peng, Yuan Zhou

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the…

Machine Learning · Computer Science 2022-06-28 Yifan Lin, Yuhao Wang, Enlu Zhou

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…

Machine Learning · Computer Science 2025-01-28 Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales…

Machine Learning · Computer Science 2017-06-06 Joon Kwon, Vianney Perchet, Claire Vernade

We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We…

Machine Learning · Computer Science 2026-01-07 Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

We consider a linear stochastic bandit problem where the dimension $K$ of the unknown parameter $\theta$ is larger than the sampling budget $n$. In such cases, it is in general impossible to derive sub-linear regret bounds since usual…

Statistics Theory · Mathematics 2012-05-23 Alexandra Carpentier, Rémi Munos

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura