Related papers: Online Learning for Function Placement in Serverle…
We study how representation learning can improve the efficiency of bandit problems. We study the setting where we play $T$ linear bandits with dimension $d$ concurrently, and these $T$ bandit tasks share a common $k (\ll d)$ dimensional…
We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory…
We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal…
Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The…
We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer…
We tackle in this paper an online network resource allocation problem with job transfers. The network is composed of many servers connected by communication links. The system operates in discrete time; at each time slot, the administrator…
We study a new class of online learning problems where each of the online algorithm's actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its…
Offline policy learning (OPL) leverages existing data collected a priori for policy optimization without any active exploration. Despite the prevalence and recent interest in this problem, its theoretical and algorithmic foundations in…
Recently a multi-agent variant of the classical multi-armed bandit was proposed to tackle fairness issues in online learning. Inspired by a long line of work in social choice and economics, the goal is to optimize the Nash social welfare…
This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play $M$ linear bandits with dimension $d$ concurrently,…
We consider the latency minimization problem in a task-offloading scenario, where multiple servers are available to the user equipment for outsourcing computational tasks. To account for the temporally dynamic nature of the wireless links…
Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.…
Policy regret is a well established notion of measuring the performance of an online learning algorithm against an adaptive adversary. We study restrictions on the adversary that enable efficient minimization of the \emph{complete policy…
We study an online learning problem with long-term budget constraints in the adversarial setting. In this problem, at each round $t$, the learner selects an action from a convex decision set, after which the adversary reveals a cost…
Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the…
Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system model is unknown. However, the cumulative regret of most RL algorithms scales as $\tilde O(\mathsf{S}…
We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been…
In real-world machine learning applications, there is a cost associated with sampling of different features. Budgeted learning can be used to select which feature-values to acquire from each instance in a dataset, such that the best model…
We consider online learning problems where the aim is to achieve regret which is efficient in the sense that it is the same order as the lowest regret amongst K experts. This is a substantially stronger requirement that achieving…
Much of modern learning theory has been split between two regimes: the classical offline setting, where data arrive independently, and the online setting, where data arrive adversarially. While the former model is often both computationally…