Related papers: Online Algorithm for Unsupervised Sequential Selec…

Thompson Sampling for Unsupervised Sequential Selection

Thompson Sampling has generated significant interest due to its better empirical performance than upper confidence bound based algorithms. In this paper, we study a Thompson Sampling based algorithm for Unsupervised Sequential Selection (USS)…

Machine Learning · Computer Science 2020-09-17 Arun Verma, Manjesh K. Hanawal, Nandyala Hemachandra

Sequential Learning without Feedback

In many security and healthcare systems a sequence of features/sensors/tests are used for detection and diagnosis. Each test outputs a prediction of the latent state, and carries with it inherent costs. Our objective is to learn…

Machine Learning · Computer Science 2016-10-19 Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama

Online Algorithm for Unsupervised Sensor Selection

In many security and healthcare systems, the detection and diagnosis systems use a sequence of sensors/tests. Each test outputs a prediction of the latent state and carries an inherent cost. However, the correctness of the predictions…

Machine Learning · Computer Science 2019-03-05 Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama

Risk-Aware Algorithms for Adversarial Contextual Bandits

In this work we consider adversarial contextual bandits with risk constraints. At each round, nature prepares a context, a cost for each arm, and additionally a risk for each arm. The learner leverages the context to pull an arm and then…

Machine Learning · Computer Science 2016-10-18 Wen Sun, Debadeepta Dey, Ashish Kapoor

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the…

Machine Learning · Computer Science 2020-03-26 P Sharoff, Nishant A. Mehta, Ravi Ganti

Cost-aware Cascading Bandits

In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an ordered list of…

Machine Learning · Computer Science 2018-05-23 Ruida Zhou, Chao Gan, Jing Yan, Cong Shen

Bandits with Dynamic Arm-acquisition Costs

We consider a bandit problem where at any time, the decision maker can add new arms to her consideration set. A new arm is queried at a cost from an "arm-reservoir" containing finitely many "arm-types," each characterized by a distinct mean…

Machine Learning · Computer Science 2022-10-10 Anand Kalvit, Assaf Zeevi

Sequential Batch Learning in Finite-Action Linear Contextual Bandits

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe…

Machine Learning · Computer Science 2020-04-15 Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W. Glynn, Yinyu Ye

Combinatorial Bandits without Total Order for Arms

We consider the combinatorial bandits problem, where at each time step, the online learner selects a size-$k$ subset $s$ from the arms set $\mathcal{A}$, where $\left|\mathcal{A}\right| = n$, and observes a stochastic reward of each arm in…

Machine Learning · Computer Science 2021-03-05 Shuo Yang, Tongzheng Ren, Inderjit S. Dhillon, Sujay Sanghavi

Stochastic Conservative Contextual Linear Bandits

Many physical systems have underlying safety considerations that require that the strategy deployed ensures the satisfaction of a set of constraints. Further, often we have only partial information on the state of the system. We study the…

Machine Learning · Computer Science 2022-03-30 Jiabin Lin, Xian Yeow Lee, Talukder Jubery, Shana Moothedath, Soumik Sarkar, Baskar Ganapathysubramanian

Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is…

Machine Learning · Computer Science 2022-10-17 Jasmin Brandt, Viktor Bengs, Björn Haddenhorst, Eyke Hüllermeier

Contextual Blocking Bandits

We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards. However, playing an arm blocks it (across all contexts) for a…

Machine Learning · Computer Science 2020-06-18 Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

Stochastic Rising Bandits

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested…

Machine Learning · Computer Science 2022-12-08 Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli

An Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures

We propose online algorithms for sequential learning in the contextual multi-armed bandit setting. Our approach is to partition the context space and then optimally combine all of the possible mappings between the partition regions and the…

Machine Learning · Computer Science 2017-12-11 Mohammadreza Mohaghegh Neyshabouri, Kaan Gokcesu, Huseyin Ozkan, Suleyman S. Kozat

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of…

Machine Learning · Computer Science 2022-10-18 Viktor Bengs, Eyke Hüllermeier

Synopsis: Sequential Decision Problems with Weak Feedback

This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…

Machine Learning · Computer Science 2023-01-30 Arun Verma

Sequential Decision Problems with Weak Feedback

This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…

Machine Learning · Computer Science 2022-12-23 Arun Verma

Learning Contextual Bandits in a Non-stationary Environment

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement. However, such algorithms usually…

Machine Learning · Computer Science 2018-05-25 Qingyun Wu, Naveen Iyer, Hongning Wang

Regret Minimization in Stochastic Contextual Dueling Bandits

We consider the problem of stochastic $K$-armed dueling bandit in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of…

Machine Learning · Computer Science 2021-05-11 Aadirupa Saha, Aditya Gopalan

Online Model Selection: a Rested Bandit Formulation

Motivated by a natural problem in online model selection with bandit information, we introduce and analyze a best arm identification problem in the rested bandit setting, wherein arm expected losses decrease with the number of times the arm…

Machine Learning · Statistics 2020-12-08 Leonardo Cella, Claudio Gentile, Massimiliano Pontil