Related papers: Solving optimization problems with Blackwell appro…
We develop new parameter-free and scale-free algorithms for solving convex-concave saddle-point problems. Our results are based on a new simple regret minimizer, the Conic Blackwell Algorithm$^+$ (CBA$^+$), which attains $O(1/\sqrt{T})$…
Blackwell approachability is a framework for reasoning about repeated games with vector-valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the next payoff vector is given, and the decision maker tries to…
In this paper, we introduce the first algorithmic framework for Blackwell approachability on the sequence-form polytope, the class of convex polytopes capturing the strategies of players in extensive-form games (EFGs). This leads to a new…
Stochastic optimization finds a wide range of applications in operations research and management science. However, existing stochastic optimization techniques usually require the information of random samples (e.g., demands in the…
This paper considers the problem of minimizing an expectation function over a closed convex set, coupled with a functional or expectation constraint on either decision variables or problem parameters. We first present a new…
In this paper, we consider constrained optimization problems with convex, smooth objective and constraints. We propose a new stochastic gradient algorithm, called the Stochastic Moving Ball Approximation (SMBA) method, to solve this class…
In this paper, we propose consensus-based optimization for saddle point problems (CBO-SP), a novel multi-particle metaheuristic derivative-free optimization method capable of provably finding global Nash equilibria. Following the idea of…
We study the regret performance of Sample Average Approximation (SAA) for data-driven newsvendor problems with general convex inventory costs. In the literature, the optimality of SAA has not been fully established under both $\alpha$-global…
In recent years, significant attention has been directed towards learning average-reward Markov Decision Processes (MDPs). However, existing algorithms either suffer from sub-optimal regret guarantees or computational inefficiencies. In…
We study the online saddle point problem, an online learning problem where at each iteration a pair of actions needs to be chosen without knowledge of the current and future (convex-concave) payoff functions. The objective is to minimize the…
Blackwell approachability, regret minimization and calibration are three criteria evaluating a strategy (or an algorithm) in different sequential decision problems, or repeated games between a player and Nature. Although they have at first…
Bandit Convex Optimization is a fundamental class of sequential decision-making problems, where the learner selects actions from a continuous domain and observes a loss (but not its gradient) at only one point per round. We study this…
We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and have been more recently shown to guarantee…
Blackwell's celebrated approachability theory provides a general framework for a variety of learning problems, including regret minimization. However, Blackwell's proof and implicit algorithm measure approachability using the $\ell_2$…
An elegant characterization of the complexity of constraint satisfaction problems has emerged in the form of the algebraic dichotomy conjecture of [BKJ00]. Roughly speaking, the characterization asserts that a CSP $\Lambda$ is tractable…
Approachability theory, introduced by Blackwell (1956), provides fundamental results on repeated games with vector-valued payoffs, and has been usefully applied since in the theory of learning in games and to learning algorithms in the…
A new algorithm for regret minimization in online convex optimization is described. The regret of the algorithm after $T$ time periods is $O(\sqrt{T \log T})$ - which is the minimum possible up to a logarithmic term. In addition, the new…
We study online learning with bandit feedback (i.e., the learner has access to only a zeroth-order oracle) where cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e., $f_t(w) = \mathrm{loss}_t(\mathrm{pred}_t(w))$ where the output of $\mathrm{pred}_t$ is…
The notion of approachability in repeated games with vector payoffs was introduced by Blackwell in the 1950s, along with geometric conditions for approachability and corresponding strategies that rely on computing steering directions…
Projection-based algorithms for Constrained Online Convex Optimization (COCO) achieve optimal $\mathcal{O}(T^{1/2})$ regret guarantees but face scalability challenges due to the computational complexity of projections. To circumvent this,…