Related papers: Optimal anytime regret with two experts
Many techniques for online optimization problems involve making decisions based solely on presently available information: fewer works take advantage of potential predictions. In this paper, we discuss the problem of online convex…
We consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected…
In online learning the performance of an algorithm is typically compared to the performance of a fixed function from some class, with a quantity called regret. Forster proposed a last-step min-max algorithm which was somewhat simpler than…
This paper addresses Online Convex Optimization (OCO) problems where the constraints have additive perturbations that (i) vary over time and (ii) are not known at the time to make a decision. Perturbations may not be i.i.d. generated and…
Making judicious channel access and transmission scheduling decisions is essential for improving performance as well as energy and spectral efficiency in multichannel wireless systems. This problem has been a subject of extensive study in…
Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax…
We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes \emph{preference-based feedback} suggesting that one…
Recent literature on online learning has focused on developing adaptive algorithms that take advantage of a regularity of the sequence of observations, yet retain worst-case performance guarantees. A complementary direction is to develop…
We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret…
We consider the setting of online logistic regression and consider the regret with respect to the 2-ball of radius B. It is known (see [Hazan et al., 2014]) that any proper algorithm which has logarithmic regret in the number of samples…
We study online conformal prediction for non-stationary data streams subject to unknown distribution drift. While most prior work studied this problem under adversarial settings and/or assessed performance in terms of gaps of time-averaged…
We study an online mixed discrete and continuous optimization problem where a decision maker interacts with an unknown environment for a number of $T$ rounds. At each round, the decision maker needs to first jointly choose a discrete and a…
We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). Our first algorithm is computationally efficient…
This paper considers two fundamental sequential decision-making problems: the problem of prediction with expert advice and the multi-armed bandit problem. We focus on stochastic regimes in which an adversary may corrupt losses, and we…
In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits. While existing literature tends to focus on optimism-based algorithms--which have been shown to…
We study online prediction where regret of the algorithm is measured against a benchmark defined via evolving constraints. This framework captures online prediction on graphs, as well as other prediction problems with combinatorial…
We propose a novel approach for analyzing dynamic regret of first-order constrained online convex optimization algorithms for strongly convex and Lipschitz-smooth objectives. Crucially, we provide a general analysis that is applicable to a…
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent is unaware of the environment…
Practical online learning tasks are often naturally defined on unconstrained domains, where optimal algorithms for general convex losses are characterized by the notion of comparator adaptivity. In this paper, we design such algorithms in…
In this paper, we consider the problem of scheduling jobs on parallel identical machines, where the processing times of jobs are uncertain: only interval bounds of processing times are known. The optimality criterion of a schedule is the…