Related papers: Block Mean Approximation for Efficient Second Orde…

200 papers

A novel approach is given to overcome the computational challenges of the full-matrix Adaptive Gradient algorithm (Full AdaGrad) in stochastic optimization. By developing a recursive method that estimates the inverse of the square root of…

Statistics Theory · Mathematics · 2025-02-28 · Antoine Godichon-Baggioni, Wei Lu, Bruno Portier

In this paper, we try to uncover the second-order essence of several first-order optimization methods. For Nesterov Accelerated Gradient, we rigorously prove that the algorithm makes use of the difference between past and current gradients,…

Machine Learning · Computer Science · 2019-12-23 · Yuzheng Hu, Licong Lin, Shange Tang

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second…

Machine Learning · Computer Science · 2021-03-08 · Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Successive quadratic approximations, or second-order proximal methods, are useful for minimizing functions that are a sum of a smooth part and a convex, possibly nonsmooth part that promotes regularization. Most analyses of iteration…

Optimization and Control · Mathematics · 2019-01-25 · Ching-pei Lee, Stephen J. Wright

Many machine learning models involve solving optimization problems. Thus, it is important to deal with large-scale optimization problems in big data applications. Recently, subsampled Newton methods have attracted much attention…

Numerical Analysis · Computer Science · 2020-03-24 · Haishan Ye, Luo Luo, Zhihua Zhang

Dual descent methods are commonly used to solve network optimization problems because their implementation can be distributed through the network. However, their convergence rates are typically very slow. This paper introduces a family of…

Optimization and Control · Mathematics · 2011-04-07 · M. Zargham, A. Ribeiro, A. Jadbabaie, A. Ozdaglar

This paper introduces an efficient algorithm for computing the best approximation of a given matrix onto the intersection of linear equalities, inequalities and the doubly nonnegative cone (the cone of all positive semidefinite matrices…

Optimization and Control · Mathematics · 2018-03-20 · Ying Cui, Defeng Sun, Kim-Chuan Toh

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science · 2017-12-21 · Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by…

The inversion of extremely high-order matrices has been a challenging task because of the limited processing and memory capacity of conventional computers. In a scenario in which the data does not fit in memory, it is worth considering…

Numerical Analysis · Mathematics · 2018-05-08 · Iria C. S. Cosme, Isaac F. Fernandes, João L. de Carvalho, Samuel Xavier-de-Souza

In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter…

Machine Learning · Computer Science · 2018-04-13 · Noam Shazeer, Mitchell Stern
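
The scaling rule named in this abstract (updates divided by the square root of an exponential moving average of squared gradients) can be sketched as a single RMSProp-style step. This is a minimal illustration over plain Python lists, not the method proposed in the paper; the function name `rmsprop_step`, the default hyperparameters, and the placement of `eps` outside the square root are assumptions chosen for the example.

```python
import math


def rmsprop_step(params, grads, v, lr=1e-3, beta2=0.999, eps=1e-8):
    """One RMSProp-style update, applied in place to plain Python lists.

    v[i] is an exponential moving average of squared gradients for
    parameter i, so one extra value is stored per parameter.
    """
    for i, g in enumerate(grads):
        # Update the moving average of the squared gradient.
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        # Scale the step by the inverse square root of that average.
        params[i] -= lr * g / (math.sqrt(v[i]) + eps)
    return params, v


# Usage: two parameters, one step, starting from an all-zero accumulator.
params, v = rmsprop_step([1.0, -2.0], [0.5, -4.0], [0.0, 0.0], lr=0.01)
print(params, v)
```

The accumulator `v` has exactly as many entries as there are parameters, which is the per-parameter state that such optimizers must keep in memory.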

Estimation of the precision matrix (or inverse covariance matrix) is of great importance in statistical data analysis and machine learning. However, as the number of parameters scales quadratically with the dimension $p$, computation…

Computation · Statistics · 2022-11-02 · Qian LI, Binyan Jiang, Defeng Sun

Second-order optimization methods, which leverage curvature information, offer faster and more stable convergence than first-order methods such as stochastic gradient descent (SGD) and Adam. However, their practical adoption is hindered by…

Emerging Technologies · Computer Science · 2025-12-08 · Saitao Zhang, Yubiao Luo, Shiqing Wang, Pushen Zuo, Yongxiang Li, Lunshuai Pan, Zheng Miao, Zhong Sun

We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace that is defined adaptively based on available current and past information. Compared…

Optimization and Control · Mathematics · 2023-11-08 · João Victor Galvão da Mata, Martin S. Andersen

Finding roots of equations is at the heart of most computational science. A well-known and widely used iterative algorithm is Newton's method. However, its convergence depends heavily on the initial guess, with poor choices often…

Numerical Analysis · Mathematics · 2020-04-09 · Ankush Aggarwal, Sanjay Pant
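
The dependence on the initial guess mentioned in this abstract is easy to reproduce with plain Newton iteration. The sketch below is a generic scalar Newton solver, not the authors' method; the function name `newton`, the tolerance, and the iteration cap are illustrative assumptions. With f(x) = arctan(x), whose only root is 0, starting at x0 = 1 converges, while starting at x0 = 2 overshoots by more on every step and diverges.

```python
import math


def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        d = fprime(x)
        # A zero or non-finite derivative means the step is undefined.
        if d == 0.0 or not math.isfinite(d):
            raise RuntimeError(f"derivative unusable at x={x!r}")
        x = x - fx / d
        if not math.isfinite(x):
            raise RuntimeError("iterate diverged")
    raise RuntimeError("no convergence within max_iter")


def datan(x):
    # Derivative of arctan(x).
    return 1.0 / (1.0 + x * x)


print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))  # approx. sqrt(2)
print(newton(math.atan, datan, 1.0))  # converges to the root at 0
try:
    newton(math.atan, datan, 2.0)
except RuntimeError as err:
    print("x0 = 2 failed:", err)
```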

We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently…

Machine Learning · Computer Science · 2017-09-18 · Sébastien M. R. Arnold, Chunming Wang

We present improved algorithms for fast calculation of the inverse square root for single-precision floating-point numbers. The algorithms are much more accurate than the famous fast inverse square root algorithm and have the same or…

Numerical Analysis · Computer Science · 2018-02-22 · Cezary J. Walczyk, Leonid V. Moroz, Jan L. Cieśliński
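
For context, the famous fast inverse square root that this abstract uses as its baseline can be sketched in Python as follows. It reinterprets the float32 bits as an integer, subtracts half of them from the magic constant 0x5F3759DF, reinterprets the result as a float, and refines it with one Newton-Raphson step. This is the classic baseline, not the improved algorithms of this paper. The function name is illustrative, and Python performs the refinement in double precision rather than float32, so results can differ slightly from a C implementation.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Classic bit-level approximation of 1/sqrt(x) for positive normal x."""
    # Reinterpret the IEEE 754 single-precision bits of x as an unsigned 32-bit int.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Shift to halve the exponent, then subtract from the magic constant.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits back as a float32 initial guess.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step for f(y) = 1/y**2 - x (computed here in float64).
    return y * (1.5 - 0.5 * x * y * y)


for x in (0.25, 1.0, 2.0, 100.0):
    print(x, fast_inv_sqrt(x), x ** -0.5)
```

With one Newton step, the classic method's relative error stays below about 0.2%, which is the accuracy that the improved algorithms are measured against.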

This paper proposes an arc-search interior-point algorithm for the nonlinear constrained optimization problem. The proposed algorithm uses the second-order derivatives to construct a search arc that approaches the optimizer. Because the arc…

Optimization and Control · Mathematics · 2025-06-13 · Yaguang Yang

The inverse of a large matrix can often be accurately approximated by a polynomial of degree significantly lower than the order of the matrix. The iteration polynomial generated by a run of the GMRES algorithm is a good candidate, and its…

Numerical Analysis · Mathematics · 2025-02-26 · Mark Embree, Joel A. Henningsen, Jordan Jackson, Ronald B. Morgan

Obtaining the inverse of a large symmetric positive definite matrix $\mathcal{A}\in\mathbb{R}^{p\times p}$ is a continual challenge across many mathematical disciplines. The computational complexity associated with direct methods can be…

Numerical Analysis · Mathematics · 2025-09-03 · Ann Paterson, Jennifer Pestana, Victorita Dolean