Ruichen Jiang
We study online linear optimization with matrix variables constrained by the operator norm, a setting where the geometry renders designing data-dependent and efficient adaptive algorithms challenging. The best-known adaptive regret bounds…
A recent breakthrough in nonconvex optimization is the online-to-nonconvex conversion framework of [Cutkosky et al., 2023], which reformulates the task of finding an $\varepsilon$-first-order stationary point as an online learning problem.…
In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper-…
In this paper, we explore the non-asymptotic global convergence rates of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method implemented with exact line search. Notably, due to Dixon's equivalence result, our findings are also applicable to…
The performance of an optimizer on large-scale deep learning models depends critically on fine-tuning the learning rate, often requiring an extensive grid search over base learning rates, schedules, and other hyperparameters. In this paper,…
Adaptive gradient methods, such as AdaGrad, are among the most successful optimization algorithms for neural network training. While these methods are known to achieve better dimensional dependence than stochastic gradient descent (SGD) for…
In this paper, we present the first explicit and non-asymptotic global convergence rates of the BFGS method when implemented with an inexact line search scheme satisfying the Armijo-Wolfe conditions. We show that BFGS achieves a global…
We study the problem of finding an $\epsilon$-first-order stationary point (FOSP) of a smooth function, given access only to gradient information. The best-known gradient query complexity for this task, assuming both the gradient and…
We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving…
Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the…
In this paper, we propose a quasi-Newton method for solving smooth and monotone nonlinear equations, including unconstrained minimization and minimax optimization as special cases. For the strongly monotone setting, we establish two global…
In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel…
Bilevel optimization is an important class of optimization problems where one optimization problem is nested within another. While various methods have emerged to address unconstrained general bilevel optimization problems, there has been a…
The optimistic gradient method has seen increasing popularity for solving convex-concave saddle point problems. To analyze its iteration complexity, a recent work [arXiv:1906.01115] proposed an interesting perspective that interprets this…
Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and…
In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic…
Quasi-Newton algorithms are among the most popular iterative methods for solving unconstrained minimization problems, largely due to their favorable superlinear convergence property. However, existing results for these algorithms are…
In this paper, we propose an accelerated quasi-Newton proximal extragradient (A-QPNE) method for solving unconstrained smooth convex optimization problems. With access only to the gradients of the objective, we prove that our method can…
In this paper, we study a class of bilevel optimization problems, also known as simple bilevel optimization, where we minimize a smooth objective function over the optimal solution set of another convex constrained optimization problem.…
One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes the mismatch between the training and testing data distribution and hence domain generalization error. To overcome, we propose…