Related papers: Bandit Problems with Side Observations

200 papers

This paper considers stochastic bandits with side observations, a model that accounts for both the exploration/exploitation dilemma and relationships between arms. In this setting, after pulling an arm i, the decision maker also observes…

Machine Learning · Computer Science 2012-10-19 Stephane Caron, Branislav Kveton, Marc Lelarge, Smriti Bhagat

We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward.…

Statistics Theory · Mathematics 2010-03-09 Philippe Rigollet, Assaf Zeevi

In this paper, we investigate a largely extended version of the classical MAB problem, called the networked combinatorial bandit problem. In particular, we consider the setting of a decision maker over networked bandits as follows: each time a…

Machine Learning · Computer Science 2015-03-23 Shaojie Tang, Yaqin Zhou

We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker gets side observations on the reward he…

Machine Learning · Computer Science 2011-10-26 Shie Mannor, Ohad Shamir

We consider a bandit problem where at any time, the decision maker can add new arms to her consideration set. A new arm is queried at a cost from an "arm-reservoir" containing finitely many "arm-types," each characterized by a distinct mean…

Machine Learning · Computer Science 2022-10-10 Anand Kalvit, Assaf Zeevi

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments…

Machine Learning · Computer Science 2012-02-20 Ananda Narayanan B, Balaraman Ravindran

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to…

Machine Learning · Computer Science 2020-03-03 Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li

The combinatorial stochastic semi-bandit problem is an extension of the classical multi-armed bandit problem in which an algorithm pulls more than one arm at each stage and the rewards of all pulled arms are revealed. One difference with…

Machine Learning · Computer Science 2016-12-07 Rémy Degenne, Vianney Perchet

This paper studies bandit problems where an agent has access to offline data that might be utilized to potentially improve the estimation of each arm's reward distribution. A major obstacle in this setting is the existence of compound…

Machine Learning · Computer Science 2023-12-21 Wen Huang, Xintao Wu

We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection. The learner can only sample arms via certain exploration bundles, which we refer to as boxes. In…

Machine Learning · Computer Science 2023-05-11 Kota Srinivas Reddy, P. N. Karthik, Nikhil Karamchandani, Jayakrishnan Nair

Contextual bandits constitute a classical framework for decision-making under uncertainty. In this setting, the goal is to learn the arms of highest reward subject to contextual information, while the unknown reward parameters of each arm…

Machine Learning · Statistics 2024-02-19 Hongju Park, Mohamad Kazem Shirani Faradonbeh

This paper studies the Best-of-K Bandit game: At each time the player chooses a subset S among all N-choose-K possible options and observes reward max(X(i) : i in S) where X is a random vector drawn from a joint distribution. The objective…

Machine Learning · Computer Science 2016-03-22 Max Simchowitz, Kevin Jamieson, Benjamin Recht

We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799--806], which involves sequential sampling from two populations: one whose characteristics are known, and one which depends on an unknown parameter…

Probability · Mathematics 2009-09-02 Alexander Goldenshluger, Assaf Zeevi

We study the experimentation dynamics of a decision maker (DM) in a two-armed bandit setup (Bolton and Harris (1999)), where the agent holds ambiguous beliefs regarding the distribution of the return process of one arm and is certain about…

Theoretical Economics · Economics 2021-04-02 Farzad Pourbabaee

We introduce and study a new class of stochastic bandit problems, referred to as predictive bandits. In each round, the decision maker first decides whether to gather information about the rewards of particular arms (so that their rewards…

Machine Learning · Computer Science 2020-04-03 Simon Lindståhl, Alexandre Proutiere, Andreas Johnsson

We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows…

Statistics Theory · Mathematics 2013-05-27 Vianney Perchet, Philippe Rigollet

We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards. However, playing an arm blocks it (across all contexts) for a…

Machine Learning · Computer Science 2020-06-18 Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences.…

Machine Learning · Computer Science 2019-01-28 Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis, Yonathan Efroni, Shie Mannor