Related papers: Optimal anytime regret with two experts
To deal with changing environments, a new performance measure -- adaptive regret, defined as the maximum static regret over any interval, was proposed in online learning. Under the setting of online convex optimization, several algorithms…
We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical…
Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say 'predict 0' or 'predict 1', and our payoff is +1 if the prediction is correct and -1 otherwise. We will say that at…
We design differentially private algorithms for the problem of prediction with expert advice under dynamic regret, also known as tracking the best expert. Our work addresses three natural types of adversaries, stochastic with shifting…
A new algorithm for regret minimization in online convex optimization is described. The regret of the algorithm after $T$ time periods is $O(\sqrt{T \log T})$ - which is the minimum possible up to a logarithmic term. In addition, the new…
We introduce an online convex optimization algorithm which utilizes projected subgradient descent with optimal adaptive learning rates. Our method provides second-order minimax-optimal dynamic regret guarantee (i.e. dependent on the sum of…
We study online learning settings in which experts act strategically to maximize their influence on the learning algorithm's predictions by potentially misreporting their beliefs about a sequence of binary events. Our goal is twofold.…
This paper addresses an online convex optimization problem where the cost function at each step depends on a history of past decisions (i.e., memory), and the decision maker has access to limited predictions of future cost values within a…
In practical applications, data is used to make decisions in two steps: estimation and optimization. First, a machine learning model estimates parameters for a structural model relating decisions to outcomes. Second, a decision is chosen to…
In this paper we study the non-stationary stochastic optimization question with bandit feedback and dynamic regret measures. The seminal work of Besbes et al. (2015) shows that, when aggregated function changes is known a priori, a simple…
We consider the fixed-budget best arm identification problem with rewards following normal distributions. In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps. The forecaster attempts to find the arm with the…
This study presents an effective global optimization technique designed for multivariate functions that are H\"older continuous. Unlike traditional methods that construct lower bounding proxy functions, this algorithm employs a…
In adversarial multi-armed bandits, two performance measures are commonly used: static regret, which compares the learner to the best fixed arm, and dynamic regret, which compares it to the best sequence of arms. While optimal algorithms…
This paper considers a variant of the online paging problem, where the online algorithm has access to multiple predictors, each producing a sequence of predictions for the page arrival times. The predictors may have occasional prediction…
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic innovation: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step,…
We consider a family of learning strategies for online optimization problems that evolve in continuous time and we show that they lead to no regret. From a more traditional, discrete-time viewpoint, this continuous-time approach allows us…
We consider online algorithms under both the competitive ratio criteria and the regret minimization one. Our main goal is to build a unified methodology that would be able to guarantee both criteria simultaneously. For a general class of…
We design and analyze minimax-optimal algorithms for online linear optimization games where the player's choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark…
To deal with changing environments, a new performance measure -- adaptive regret, defined as the maximum static regret over any interval, was proposed in online learning. Under the setting of online convex optimization, several algorithms…
In this paper, we provide a generic anytime lower bounding procedure for minmax regret optimization problems. We show that the lower bound obtained is always at least as accurate as the lower bound recently proposed by Chassein and Goerigk…