Related papers: A note on continuous-time online learning
We consider a family of learning strategies for online optimization problems that evolve in continuous time and we show that they lead to no regret. From a more traditional, discrete-time viewpoint, this continuous-time approach allows us…
We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal…
Online learning algorithms for dynamical systems provide finite time guarantees for control in the presence of sequentially revealed cost functions. We pose the classical linear quadratic tracking problem in the framework of online…
Prediction with experts' advice is one of the most fundamental problems in online learning and captures many of its technical challenges. A recent line of work has looked at online learning through the lens of differential equations and…
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…
We consider systems that require timely monitoring of sources over a communication network, where the cost of delayed information is unknown, time-varying and possibly adversarial. For the single source monitoring problem, we design…
Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm's ability to learn is the game-theoretic notion of regret. We argue that the…
We revisit the question of reducing online learning to approximate optimization of the offline problem. In this setting, we give two algorithms with near-optimal performance in the full information setting: they guarantee optimal regret and…
We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the…
We present methods for online linear optimization that take advantage of benign (as opposed to worst-case) sequences. Specifically if the sequence encountered by the learner is described well by a known "predictable process", the algorithms…
Recent literature on online learning has focused on developing adaptive algorithms that take advantage of a regularity of the sequence of observations, yet retain worst-case performance guarantees. A complementary direction is to develop…
We study linear contextual bandits with access to a large, confounded, offline dataset that was sampled from some fixed policy. We show that this problem is closely related to a variant of the bandit problem with side information. We…
In the random-order model for online learning, the sequence of losses is chosen upfront by an adversary and presented to the learner after a random permutation. Any random-order input is \emph{asymptotically} equivalent to a stochastic…
We study a class of adversarial bandit optimization problems in which the loss functions may be non-convex and non-smooth. In each round, the learner observes a loss that consists of an underlying linear component together with an…
We study an online mixed discrete and continuous optimization problem where a decision maker interacts with an unknown environment for a number of $T$ rounds. At each round, the decision maker needs to first jointly choose a discrete and a…
We study a new class of online learning problems where each of the online algorithm's actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its…
This work focuses on the setting of dynamic regret in the context of online learning with full information. In particular, we analyze regret bounds with respect to the temporal variability of the loss functions. By assuming that the…
In this paper, we study a special bandit setting of online stochastic linear optimization, where only one-bit of information is revealed to the learner at each round. This problem has found many applications including online advertisement…
We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a "hint" vector before choosing the action for that round. Rather surprisingly, it was shown that if the hint…
The goal of a learner, in standard online learning, is to have the cumulative loss not much larger compared with the best-performing function from some fixed class. Numerous algorithms were shown to have this gap arbitrarily close to zero,…