Related papers: An adaptive stochastic optimization algorithm for …

Optimal Resource Allocation with Semi-Bandit Feedback

We study a sequential resource allocation problem involving a fixed number of recurring jobs. At each time-step the manager should distribute available resources among the jobs in order to maximise the expected number of completed jobs.…

Machine Learning · Computer Science 2014-06-17 Tor Lattimore, Koby Crammer, Csaba Szepesvári

Combinatorial Multi-armed Bandits for Resource Allocation

We study the sequential resource allocation problem where a decision maker repeatedly allocates budgets between resources. Motivating examples include allocating limited computing time or wireless spectrum bands to multiple users (i.e.,…

Machine Learning · Computer Science 2021-05-11 Jinhang Zuo, Carlee Joe-Wong

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the…

Machine Learning · Computer Science 2020-03-26 P Sharoff, Nishant A. Mehta, Ravi Ganti

On Adaptivity in Non-stationary Stochastic Optimization With Bandit Feedback

In this paper we study the non-stationary stochastic optimization question with bandit feedback and dynamic regret measures. The seminal work of Besbes et al. (2015) shows that, when the aggregate function change is known a priori, a simple…

Machine Learning · Statistics 2022-10-12 Yining Wang

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed…

Machine Learning · Statistics 2020-10-07 Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

Structure Adaptive Algorithms for Stochastic Bandits

We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy some given structural constraints, e.g. linear, unimodal, sparse, etc. Our aim is to develop methods…

Machine Learning · Statistics 2020-07-03 Rémy Degenne, Han Shao, Wouter M. Koolen

Optimal Regularized Online Allocation by Adaptive Re-Solving

This paper introduces a dual-based algorithm framework for solving the regularized online resource allocation problems, which have potentially non-concave cumulative rewards, hard resource constraints, and a non-separable regularizer. Under…

Machine Learning · Computer Science 2023-07-18 Wanteng Ma, Ying Cao, Danny H. K. Tsang, Dong Xia

Bayesian Design Principles for Frequentist Sequential Learning

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization…

Machine Learning · Computer Science 2024-02-12 Yunbei Xu, Assaf Zeevi

On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards

We consider a combinatorial generalization of the classical multi-armed bandit problem that is defined as follows. There is a given bipartite graph of $M$ users and $N \geq M$ resources. For each user-resource pair $(i,j)$, there is an…

Optimization and Control · Mathematics 2015-03-17 Yi Gai, Bhaskar Krishnamachari, Mingyan Liu

A Better Resource Allocation Algorithm with Semi-Bandit Feedback

We study a sequential resource allocation problem between a fixed number of arms. On each iteration the algorithm distributes a resource among the arms in order to maximize the expected success rate. Allocating more of the resource to a…

Machine Learning · Computer Science 2018-03-29 Yuval Dagan, Koby Crammer

Confounded Budgeted Causal Bandits

We study the problem of learning 'good' interventions in a stochastic environment modeled by its underlying causal graph. Good interventions refer to interventions that maximize rewards. Specifically, we consider the setting of a…

Machine Learning · Computer Science 2024-01-17 Fateme Jamshidi, Jalal Etesami, Negar Kiyavash

Simple regret for infinitely many armed bandits

We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous…

Machine Learning · Computer Science 2015-05-19 Alexandra Carpentier, Michal Valko

Adaptive Optimization for Stochastic Renewal Systems

This paper considers online optimization for a system that performs a sequence of back-to-back tasks. Each task can be processed in one of multiple processing modes that affect the duration of the task, the reward earned, and an additional…

Optimization and Control · Mathematics 2024-01-17 Michael J. Neely

Finite Continuum-Armed Bandits

We consider a situation where an agent has $T$ resources to be allocated to a larger number $N$ of actions. Each action can be completed at most once and results in a stochastic reward with unknown mean. The goal of the agent is to…

Statistics Theory · Mathematics 2020-11-04 Solenne Gaucher

Opportunistic Scheduling over Renewal Systems: An Empirical Method

This paper considers an opportunistic scheduling problem over a renewal system. A controller observes a random event at the beginning of each renewal frame and then chooses an action in response to the event, which affects the duration of…

Optimization and Control · Mathematics 2019-06-10 Xiaohan Wei, Michael J. Neely

Online Stochastic Allocation of Reusable Resources

We study a multi-objective model of the allocation of reusable resources under model uncertainty. Heterogeneous customers arrive sequentially according to a latent stochastic process, request certain amounts of resources, and occupy…

Optimization and Control · Mathematics 2023-08-02 Xilin Zhang, Wang Chi Cheung

Complexity Analysis of a Countable-armed Bandit Problem

We consider a stochastic multi-armed bandit (MAB) problem motivated by "large" action spaces, and endowed with a population of arms containing exactly $K$ arm-types, each characterized by a distinct mean reward. The decision maker is…

Machine Learning · Computer Science 2023-01-19 Anand Kalvit, Assaf Zeevi

Efficient online algorithms for fast-rate regret bounds under sparsity

We consider the online convex optimization problem. In the setting of arbitrary sequences and finite set of parameters, we establish a new fast-rate quantile regret bound. Then we investigate the optimization into the L1-ball by…

Statistics Theory · Mathematics 2018-05-24 Pierre Gaillard, Olivier Wintenberger

An Adaptive Algorithm for Finite Stochastic Partial Monitoring

We present a new anytime algorithm that achieves near-optimal regret for any instance of finite stochastic partial monitoring. In particular, the new algorithm achieves the minimax regret, within logarithmic factors, for both "easy" and…

Machine Learning · Computer Science 2012-07-03 Gabor Bartok, Navid Zolghadr, Csaba Szepesvári

Stochastic Direct Search Method for Blind Resource Allocation

Motivated by programmatic advertising optimization, we consider the task of sequentially allocating budget across a set of resources. At every time step, a feasible allocation is chosen and only a corresponding random return is observed.…

Artificial Intelligence · Computer Science 2024-10-02 Juliette Achddou, Olivier Cappe, Aurélien Garivier