Related papers: Linearly Parameterized Bandits

Parametrized Stochastic Multi-armed Bandits with Binary Rewards

In this paper, we consider the problem of multi-armed bandits with a large, possibly infinite number of correlated arms. We assume that the arms have Bernoulli distributed rewards, independent across time, where the probabilities of success…

Machine Learning · Computer Science 2011-11-21 Chong Jiang, R. Srikant

Experimental Design for Semiparametric Bandits

We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in…

Machine Learning · Statistics 2025-06-18 Seok-Jin Kim, Gi-Soo Kim, Min-hwan Oh

Algorithms for Linear Bandits on Polyhedral Sets

We study a stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

Nonparametric Bandits with Covariates

We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward.…

Statistics Theory · Mathematics 2010-03-09 Philippe Rigollet, Assaf Zeevi

Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards

In the classic multi-armed bandits problem, the goal is to have a policy for dynamically operating arms that each yield stochastic rewards with unknown means. The key metric of interest is regret, defined as the gap between the expected…

Optimization and Control · Mathematics 2010-11-23 Yi Gai, Bhaskar Krishnamachari, Rahul Jain

Global Bandits

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science 2018-03-22 Onur Atan, Cem Tekin, Mihaela van der Schaar

Budget-Constrained Bandits over General Cost and Reward Distributions

We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is…

Machine Learning · Computer Science 2020-03-03 Semih Cayci, Atilla Eryilmaz, R. Srikant

Stochastic Bandits with Linear Constraints

We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies, whose expected cumulative reward over the course of $T$ rounds is maximum, and each has an expected cost below a…

Machine Learning · Computer Science 2020-06-20 Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

Stochastic continuum armed bandit problem of few linear parameters in high dimensions

We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_{d}(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r :B_{d}(1+\nu) \rightarrow \mathbb{R}$ are considered to…

Machine Learning · Statistics 2017-05-31 Hemant Tyagi, Sebastian Stich, Bernd Gärtner

Finite Continuum-Armed Bandits

We consider a situation where an agent has $T$ resources to be allocated to a larger number $N$ of actions. Each action can be completed at most once and results in a stochastic reward with unknown mean. The goal of the agent is to…

Statistics Theory · Mathematics 2020-11-04 Solenne Gaucher

Safe Linear Stochastic Bandits

We introduce the safe linear stochastic bandit framework, a generalization of linear stochastic bandits, where, in each stage, the learner is required to select an arm with an expected reward that is no less than a predetermined (safe)…

Machine Learning · Statistics 2019-11-22 Kia Khezeli, Eilyan Bitar

Linear Bandits with Partially Observable Features

We study the linear bandit problem that accounts for partially observable features. Without proper handling, unobserved features can lead to linear regret in the decision horizon $T$, as their influence on rewards is unknown. To tackle this…

Machine Learning · Statistics 2025-08-19 Wonyoung Kim, Sungwoo Park, Garud Iyengar, Assaf Zeevi, Min-hwan Oh

Be Greedy in Multi-Armed Bandits

The Greedy algorithm is the simplest heuristic in sequential decision problems: it carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Thresholding Bandit with Optimal Aggregate Regret

We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold $\theta$, with a fixed budget of $T$ trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the…

Machine Learning · Computer Science 2019-05-28 Chao Tao, Saùl Blanco, Jian Peng, Yuan Zhou

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the…

Machine Learning · Computer Science 2022-06-28 Yifan Lin, Yuhao Wang, Enlu Zhou

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…

Machine Learning · Computer Science 2025-01-28 Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

Sparse Stochastic Bandits

In the classical multi-armed bandit problem, $d$ arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales…

Machine Learning · Computer Science 2017-06-06 Joon Kwon, Vianney Perchet, Claire Vernade

Adversarial bandit optimization for approximately linear functions

We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We…

Machine Learning · Computer Science 2026-01-07 Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

We consider a linear stochastic bandit problem where the dimension $K$ of the unknown parameter $\theta$ is larger than the sampling budget $n$. In such cases, it is in general impossible to derive sub-linear regret bounds since usual…

Statistics Theory · Mathematics 2012-05-23 Alexandra Carpentier, Rémi Munos

Stochastic Bandit Based on Empirical Moments

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura