Related papers: Minimax Optimal Algorithms for Unconstrained Linea…
We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving,…
In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $\tilde{O}(\sqrt{T})$ regret compared to any fixed…
We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical…
We introduce a simple but general online learning framework in which a learner plays against an adversary in a vector-valued game that changes every round. Even though the learner's objective is not convex-concave (and so the minimax…
We consider the problem of online learning and its application to solving minimax games. For the online learning problem, Follow the Perturbed Leader (FTPL) is a widely studied algorithm which enjoys the optimal $O(T^{1/2})$ worst-case…
We study fast rates of convergence in the setting of nonparametric online regression, namely where regret is defined with respect to an arbitrary function class which has bounded complexity. Our contributions are two-fold: - In the…
We consider regret minimization in repeated games with non-convex loss functions. Minimizing the standard notion of regret is computationally intractable. Thus, we define a natural notion of regret which permits efficient optimization and…
We introduce an online convex optimization algorithm which utilizes projected subgradient descent with optimal adaptive learning rates. Our method provides second-order minimax-optimal dynamic regret guarantee (i.e. dependent on the sum of…
We address the online linear optimization problem when the actions of the forecaster are represented by binary vectors. Our goal is to understand the magnitude of the minimax regret for the worst possible set of actions. We study the…
We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed horizon case, and then moving on to two unknown-horizon settings, one that assumes the horizon is chosen randomly according…
We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Holder-smooth…
Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this…
Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret…
We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm,…
We study online convex optimization under stochastic sub-gradient observation faults, where we introduce adaptive algorithms with minimax optimal regret guarantees. We specifically study scenarios where our sub-gradient observations can be…
We study Online Convex Optimization (OCO) with adversarial constraints, where an online algorithm must make sequential decisions to minimize both convex loss functions and cumulative constraint violations. We focus on a setting where the…
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints. The goal of the decision maker is to maximize their total reward, while at the same time achieving small…
We consider online learning in multi-player smooth monotone games. Existing algorithms have limitations such as (1) being only applicable to strongly monotone games; (2) lacking the no-regret guarantee; (3) having only asymptotic or slow…
We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $\sqrt{d n \log N}$ for any finite action set with $N$…
In the convex optimization approach to online regret minimization, many methods have been developed to guarantee a $O(\sqrt{T})$ bound on regret for subdifferentiable convex loss functions with bounded subgradients, by using a reduction to…