Related papers: A Split-Client Approach to Second-Order Optimizati…
We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is a provably convergent, second order incremental algorithm for solving large-scale partially separable optimization problems. The algorithm is based on a local…
We establish a local $\mathcal{O}(k^{-2})$ rate for the gradient update $x^{k+1}=x^k-\nabla f(x^k)/\sqrt{H\|\nabla f(x^k)\|}$ under a $2H$-Hessian--Lipschitz assumption. Regime detection relies on Hessian--vector products, avoiding Hessian…
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training. Adam and its variants have been state-of-the-art for years,…
Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks like federated learning and split learning. While substituting BP with zeroth-order optimization can…
We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the…
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions. Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$,…
Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…
We propose a novel second-order optimization framework for training the emerging deep continuous-time models, specifically the Neural Ordinary Differential Equations (Neural ODEs). Since their training already involves expensive gradient…
In this work we derive a second-order approach to bilevel optimization, a type of mathematical programming in which the solution to a parameterized optimization problem (the "lower" problem) is itself to be optimized (in the "upper"…
Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, that involve second derivatives and/or second…
First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian--vector products and preconditions gradients only in that…
In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new technique of automatic differentiation. This technique relies on the computation of the {\em curvature}, a second order…
In this paper, we present a new Hyperfast Second-Order Method with convergence rate $O(N^{-5})$ up to a logarithmic factor for the convex function with Lipshitz the third derivative. This method based on two ideas. The first comes from the…
Heavy-tailed noise is pervasive in modern machine learning applications, arising from data heterogeneity, outliers, and non-stationary stochastic environments. While second-order methods can significantly accelerate convergence in…
In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference…
This paper explores second-order optimization methods in Federated Learning (FL), addressing the critical challenges of slow convergence and the excessive communication rounds required to achieve optimal performance from the global model.…
Accelerating the convergence of second-order optimization, particularly Newton-type methods, remains a pivotal challenge in algorithmic research. In this paper, we extend previous work on the \textbf{Quadratic Gradient (QG)} and rigorously…
A class of second-order algorithms is proposed for minimizing smooth nonconvex functions that alternates between regularized Newton and negative curvature steps in an iteration-dependent subspace. In most cases, the Hessian matrix is…
We here adapt an extended version of the adaptive cubic regularisation method with dynamic inexact Hessian information for nonconvex optimisation in [3] to the stochastic optimisation setting. While exact function evaluations are still…