Related papers: Block Mean Approximation for Efficient Second Orde…

A Full Adagrad algorithm with O(Nd) operations

A novel approach is given to overcome the computational challenges of the full-matrix Adaptive Gradient algorithm (Full AdaGrad) in stochastic optimization. By developing a recursive method that estimates the inverse of the square root of…

Statistics Theory · Mathematics 2025-02-28 Antoine Godichon-Baggioni, Wei Lu, Bruno Portier

Second-order Information in First-order Optimization Methods

In this paper, we try to uncover the second-order essence of several first-order optimization methods. For Nesterov Accelerated Gradient, we rigorously prove that the algorithm makes use of the difference between past and current gradients,…

Machine Learning · Computer Science 2019-12-23 Yuzheng Hu, Licong Lin, Shange Tang

Scalable Second Order Optimization for Deep Learning

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, that involve second derivatives and/or second…

Machine Learning · Computer Science 2021-03-08 Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Inexact Successive Quadratic Approximation for Regularized Optimization

Successive quadratic approximations, or second-order proximal methods, are useful for minimizing functions that are a sum of a smooth part and a convex, possibly nonsmooth part that promotes regularization. Most analyses of iteration…

Optimization and Control · Mathematics 2019-01-25 Ching-pei Lee, Stephen J. Wright

Approximate Newton Methods

Many machine learning models involve solving optimization problems. Thus, it is important to deal with a large-scale optimization problem in big data applications. Recently, subsampled Newton methods have emerged to attract much attention…

Numerical Analysis · Computer Science 2020-03-24 Haishan Ye, Luo Luo, Zhihua Zhang

Accelerated Dual Descent for Network Optimization

Dual descent methods are commonly used to solve network optimization problems because their implementation can be distributed through the network. However, their convergence rates are typically very slow. This paper introduces a family of…

Optimization and Control · Mathematics 2011-04-07 M. Zargham, A. Ribeiro, A. Jadbabaie, A. Ozdaglar

Computing the Best Approximation Over the Intersection of a Polyhedral Set and the Doubly Nonnegative Cone

This paper introduces an efficient algorithm for computing the best approximation of a given matrix onto the intersection of linear equalities, inequalities and the doubly nonnegative cone (the cone of all positive semidefinite matrices…

Optimization and Control · Mathematics 2018-03-20 Ying Cui, Defeng Sun, Kim-Chuan Toh

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Scalable Adaptive Stochastic Optimization Using Random Projections

Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by…

Machine Learning · Statistics 2016-11-22 Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen

Memory-Usage Advantageous Block Recursive Matrix Inverse

The inversion of extremely high order matrices has been a challenging task because of the limited processing and memory capacity of conventional computers. In a scenario in which the data does not fit in memory, it is worth to consider…

Numerical Analysis · Mathematics 2018-05-08 Iria C. S. Cosme, Isaac F. Fernandes, João L. de Carvalho, Samuel Xavier-de-Souza

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter…

Machine Learning · Computer Science 2018-04-13 Noam Shazeer, Mitchell Stern

MARS: A second-order reduction algorithm for high-dimensional sparse precision matrices estimation

Estimation of the precision matrix (or inverse covariance matrix) is of great importance in statistical data analysis and machine learning. However, as the number of parameters scales quadratically with the dimension $p$, computation…

Computation · Statistics 2022-11-02 Qian LI, Binyan Jiang, Defeng Sun

First Demonstration of Second-order Training of Deep Neural Networks with In-memory Analog Matrix Computing

Second-order optimization methods, which leverage curvature information, offer faster and more stable convergence than first-order methods such as stochastic gradient descent (SGD) and Adam. However, their practical adoption is hindered by…

Emerging Technologies · Computer Science 2025-12-08 Saitao Zhang, Yubiao Luo, Shiqing Wang, Pushen Zuo, Yongxiang Li, Lunshuai Pan, Zheng Miao, Zhong Sun

AdaSub: Stochastic Optimization Using Second-Order Information in Low-Dimensional Subspaces

We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace that is defined adaptively based on available current and past information. Compared…

Optimization and Control · Mathematics 2023-11-08 João Victor Galvão da Mata, Martin S. Andersen

Beyond Newton: a new root-finding fixed-point iteration for nonlinear equations

Finding roots of equations is at the heart of most computational science. A well-known and widely used iterative algorithm is the Newton's method. However, its convergence depends heavily on the initial guess, with poor choices often…

Numerical Analysis · Mathematics 2020-04-09 Ankush Aggarwal, Sanjay Pant

Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix

We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently…

Machine Learning · Computer Science 2017-09-18 Sébastien M. R. Arnold, Chunming Wang

Improving the accuracy of the fast inverse square root algorithm

We present improved algorithms for fast calculation of the inverse square root for single-precision floating-point numbers. The algorithms are much more accurate than the famous fast inverse square root algorithm and have the same or…

Numerical Analysis · Computer Science 2018-02-22 Cezary J. Walczyk, Leonid V. Moroz, Jan L. Cieśliński

A computationally efficient arc-search interior-point algorithm for nonlinear constrained optimization

This paper proposes an arc-search interior-point algorithm for the nonlinear constrained optimization problem. The proposed algorithm uses the second-order derivatives to construct a search arc that approaches the optimizer. Because the arc…

Optimization and Control · Mathematics 2025-06-13 Yaguang Yang

Polynomial Approximation to the Inverse of a Large Matrix

The inverse of a large matrix can often be accurately approximated by a polynomial of degree significantly lower than the order of the matrix. The iteration polynomial generated by a run of the GMRES algorithm is a good candidate, and its…

Numerical Analysis · Mathematics 2025-02-26 Mark Embree, Joel A. Henningsen, Jordan Jackson, Ronald B. Morgan

An Iterative Block Matrix Inversion (IBMI) Algorithm for Symmetric Positive Definite Matrices with Applications to Covariance Matrices

Obtaining the inverse of a large symmetric positive definite matrix $\mathcal{A}\in\mathbb{R}^{p\times p}$ is a continual challenge across many mathematical disciplines. The computational complexity associated with direct methods can be…

Numerical Analysis · Mathematics 2025-09-03 Ann Paterson, Jennifer Pestana, Victorita Dolean