Related papers: Online Learning for Function Placement in Serverle…

Impact of Representation Learning in Linear Bandits

We study how representation learning can improve the efficiency of bandit problems. We consider the setting where we play $T$ linear bandits with dimension $d$ concurrently, and these $T$ bandit tasks share a common $k (\ll d)$ dimensional…

Machine Learning · Computer Science · 2021-05-06 · Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory…

Machine Learning · Computer Science · 2023-01-31 · Uri Sherman, Tomer Koren, Yishay Mansour

Efficient Optimal Learning for Contextual Bandits

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal…

Machine Learning · Computer Science · 2011-06-17 · Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

Offline Local Search for Online Stochastic Bandits

Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The…

Machine Learning · Computer Science · 2026-04-13 · Gerdus Benadè, Rathish Das, Thomas Lavastida

Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer…

Machine Learning · Computer Science · 2022-10-20 · MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh

Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques

In this paper, we tackle an online network resource allocation problem with job transfers. The network is composed of many servers connected by communication links. The system operates in discrete time; at each time slot, the administrator…

Machine Learning · Statistics · 2023-11-17 · Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Amirhossein Asgharnia

Online Learning with Composite Loss Functions

We study a new class of online learning problems where each of the online algorithm's actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its…

Machine Learning · Computer Science · 2014-05-20 · Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres

Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization

Offline policy learning (OPL) leverages existing data collected a priori for policy optimization without any active exploration. Despite the prevalence and recent interest in this problem, its theoretical and algorithmic foundations in…

Machine Learning · Computer Science · 2022-03-15 · Thanh Nguyen-Tang, Sunil Gupta, A. Tuan Nguyen, Svetha Venkatesh

An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret

Recently, a multi-agent variant of the classical multi-armed bandit was proposed to tackle fairness issues in online learning. Inspired by a long line of work in social choice and economics, the goal is to optimize the Nash social welfare…

Machine Learning · Computer Science · 2022-09-27 · Matthew Jones, Huy Lê Nguyen, Thy Nguyen

Near-optimal Representation Learning for Linear Bandits and Linear RL

This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play $M$ linear bandits with dimension $d$ concurrently,…

Machine Learning · Computer Science · 2021-02-09 · Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang

An Online Algorithm for Computation Offloading in Non-Stationary Environments

We consider the latency minimization problem in a task-offloading scenario, where multiple servers are available to the user equipment for outsourcing computational tasks. To account for the temporally dynamic nature of the wireless links…

Signal Processing · Electrical Eng. & Systems · 2020-06-23 · Aniq Ur Rahman, Gourab Ghatak, Antonio De Domenico

Stochastic Bandits with Delay-Dependent Payoffs

Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.…

Machine Learning · Statistics · 2020-02-20 · Leonardo Cella, Nicolò Cesa-Bianchi

Complete Policy Regret Bounds for Tallying Bandits

Policy regret is a well-established notion for measuring the performance of an online learning algorithm against an adaptive adversary. We study restrictions on the adversary that enable efficient minimization of the complete policy…

Machine Learning · Statistics · 2022-04-26 · Dhruv Malik, Yuanzhi Li, Aarti Singh

Online Learning for Approximately-Convex Functions with Long-term Adversarial Constraints

We study an online learning problem with long-term budget constraints in the adversarial setting. In this problem, at each round $t$, the learner selects an action from a convex decision set, after which the adversary reveals a cost…

Machine Learning · Computer Science · 2025-08-26 · Dhruv Sarkar, Samrat Mukhopadhyay, Abhishek Sinha

Offline Planning and Online Learning under Recovering Rewards

Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the…

Machine Learning · Statistics · 2021-12-23 · David Simchi-Levi, Zeyu Zheng, Feng Zhu

On learning Whittle index policy for restless bandits with scalable regret

Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system model is unknown. However, the cumulative regret of most RL algorithms scales as $\tilde O(\mathsf{S}…

Machine Learning · Computer Science · 2023-04-28 · Nima Akbarzadeh, Aditya Mahajan

A Time and Space Efficient Algorithm for Contextual Linear Bandits

We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been…

Data Structures and Algorithms · Computer Science · 2014-07-08 · José Bento, Stratis Ioannidis, S. Muthukrishnan, Jinyun Yan

Online Budgeted Learning for Classifier Induction

In real-world machine learning applications, there is a cost associated with sampling different features. Budgeted learning can be used to select which feature-values to acquire from each instance in a dataset, such that the best model…

Machine Learning · Computer Science · 2019-03-14 · Eran Fainman, Bracha Shapira, Lior Rokach, Yisroel Mirsky

Learning The Best Expert Efficiently

We consider online learning problems where the aim is to achieve regret which is efficient in the sense that it is of the same order as the lowest regret amongst K experts. This is a substantially stronger requirement than achieving…

Machine Learning · Computer Science · 2019-11-12 · Daron Anderson, Douglas J. Leith

Smoothed Online Learning is as Easy as Statistical Learning

Much of modern learning theory has been split between two regimes: the classical offline setting, where data arrive independently, and the online setting, where data arrive adversarially. While the former model is often both computationally…

Machine Learning · Statistics · 2022-06-01 · Adam Block, Yuval Dagan, Noah Golowich, Alexander Rakhlin