Related papers:
Block Mean Approximation for Efficient Second Orde…
A novel approach is given to overcome the computational challenges of the full-matrix Adaptive Gradient algorithm (Full AdaGrad) in stochastic optimization. By developing a recursive method that estimates the inverse of the square root of…
In this paper, we try to uncover the second-order essence of several first-order optimization methods. For Nesterov Accelerated Gradient, we rigorously prove that the algorithm makes use of the difference between past and current gradients,…
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, that involve second derivatives and/or second…
Successive quadratic approximations, or second-order proximal methods, are useful for minimizing functions that are a sum of a smooth part and a convex, possibly nonsmooth part that promotes regularization. Most analyses of iteration…
Many machine learning models involve solving optimization problems. Thus, handling large-scale optimization problems efficiently is important in big data applications. Recently, subsampled Newton methods have attracted much attention…
Dual descent methods are commonly used to solve network optimization problems because their implementation can be distributed through the network. However, their convergence rates are typically very slow. This paper introduces a family of…
This paper introduces an efficient algorithm for computing the best approximation of a given matrix onto the intersection of linear equalities, inequalities and the doubly nonnegative cone (the cone of all positive semidefinite matrices…
Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…
Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by…
The inversion of extremely high-order matrices has been a challenging task because of the limited processing and memory capacity of conventional computers. In a scenario in which the data does not fit in memory, it is worth considering…
In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter…
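The AdaGrad and RMSProp/Adam entries above both divide each gradient coordinate by the square root of a statistic of squared past gradients. They differ in how that statistic is kept: a running sum for diagonal AdaGrad, an exponential moving average for RMSProp. The sketch below assumes plain Python lists and hypothetical helpers `adagrad_step` and `rmsprop_step`. Adam's momentum term and bias corrections are deliberately omitted.

```python
import math

def adagrad_step(x, g, acc, lr=0.1, eps=1e-8):
    """Diagonal AdaGrad: acc holds the running *sum* of squared gradients per coordinate."""
    for i, gi in enumerate(g):
        acc[i] += gi * gi
        x[i] -= lr * gi / (math.sqrt(acc[i]) + eps)
    return x, acc

def rmsprop_step(x, g, v, lr=0.01, beta=0.9, eps=1e-8):
    """RMSProp-style step: v holds an exponential moving average of squared gradients."""
    for i, gi in enumerate(g):
        v[i] = beta * v[i] + (1.0 - beta) * gi * gi
        x[i] -= lr * gi / (math.sqrt(v[i]) + eps)
    return x, v

def grad(x):
    """Gradient of the badly scaled quadratic f(x) = 0.5 * (x0**2 + 100 * x1**2)."""
    return [x[0], 100.0 * x[1]]

x_a, acc = [1.0, 1.0], [0.0, 0.0]
x_r, v = [1.0, 1.0], [0.0, 0.0]
for _ in range(500):
    x_a, acc = adagrad_step(x_a, grad(x_a), acc)
    x_r, v = rmsprop_step(x_r, grad(x_r), v)
```

The per-coordinate scaling makes both methods nearly insensitive to the factor of 100 between the two curvatures. With a constant step size, though, RMSProp settles into a small oscillation around the minimizer rather than converging exactly.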
Estimation of the precision matrix (or inverse covariance matrix) is of great importance in statistical data analysis and machine learning. However, as the number of parameters scales quadratically with the dimension $p$, computation…
Second-order optimization methods, which leverage curvature information, offer faster and more stable convergence than first-order methods such as stochastic gradient descent (SGD) and Adam. However, their practical adoption is hindered by…
We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace that is defined adaptively based on available current and past information. Compared…
Finding roots of equations is at the heart of most computational science. A well-known and widely used iterative algorithm is Newton's method. However, its convergence depends heavily on the initial guess, with poor choices often…
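The entry above notes that Newton's method is sensitive to the initial guess, and this is easy to reproduce. The following is a minimal sketch of the scalar iteration $x_{k+1} = x_k - f(x_k)/f'(x_k)$ using a hypothetical helper `newton`. It uses the textbook example $f(x) = \arctan x$: the iteration converges to the root at 0 from $x_0 = 1$ but overshoots ever further from $x_0 = 2$.

```python
import math

def newton(f, df, x0, tol=1e-12, max_iter=100, blowup=1e12):
    """Scalar Newton iteration x <- x - f(x)/f'(x); raises if it fails to converge."""
    x = x0
    for _ in range(max_iter):
        d = df(x)
        if d == 0.0:
            raise ZeroDivisionError("f'(x) vanished; Newton step undefined")
        step = f(x) / d
        x -= step
        if abs(step) < tol:
            return x
        if not math.isfinite(x) or abs(x) > blowup:
            raise RuntimeError(f"iterates diverged (|x| = {abs(x):.3g})")
    raise RuntimeError("no convergence within max_iter")

def datan(x):
    """Derivative of arctan."""
    return 1.0 / (1.0 + x * x)

root = newton(math.atan, datan, 1.0)       # converges to the root at 0
try:
    newton(math.atan, datan, 2.0)          # each step lands farther from 0
except RuntimeError as err:
    print(err)
```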
We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently…
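The entry above builds an inverse-Hessian approximation from differences of parameters and gradients. A familiar single-machine relative is the L-BFGS two-loop recursion. It applies the implied inverse-Hessian approximation to a vector using only the stored pairs $s_i = x_{i+1} - x_i$ and $y_i = g_{i+1} - g_i$. This is a swapped-in, standard technique for illustration, not the distributed method the entry describes. The helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lbfgs_direction(grad, pairs):
    """L-BFGS two-loop recursion: returns H @ grad, where H is the inverse-Hessian
    approximation implied by the (s, y) pairs, stored oldest first."""
    q = list(grad)
    alphas = []
    for s, y in reversed(pairs):                         # newest to oldest
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append(alpha)
    s_last, y_last = pairs[-1]
    gamma = dot(s_last, y_last) / dot(y_last, y_last)    # initial scaling H0 = gamma * I
    r = [gamma * qi for qi in q]
    for (s, y), alpha in zip(pairs, reversed(alphas)):   # oldest to newest
        rho = 1.0 / dot(y, s)
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r

# Quadratic with Hessian diag(2, 5, 10); for a quadratic, y = A s holds exactly.
hess_diag = [2.0, 5.0, 10.0]
pairs = []
for i, h in enumerate(hess_diag):
    s = [1.0 if j == i else 0.0 for j in range(3)]
    pairs.append((s, [h * sj for sj in s]))

d = lbfgs_direction([1.0, 1.0, 1.0], pairs)   # approximately [0.5, 0.2, 0.1]
```

In this toy case, the pairs span the space along the eigenvectors of the Hessian, so the recursion recovers $A^{-1}g$ exactly. In general, it only satisfies the secant condition $H y_{\text{last}} = s_{\text{last}}$ for the most recent pair.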
We present improved algorithms for fast calculation of the inverse square root for single-precision floating-point numbers. The algorithms are much more accurate than the famous fast inverse square root algorithm and have the same or…
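For context, the "famous" algorithm referred to in the entry above is the bit-level trick with the magic constant `0x5F3759DF`, followed by one Newton step. The sketch below reinterprets float32 bits through Python's `struct` module. Because Python arithmetic is double precision, only the initial guess is computed at single precision here. The entry's improved constants are not reproduced.

```python
import math
import struct

MAGIC = 0x5F3759DF  # constant from the classic fast inverse square root

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive, normal float32 inputs."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the bits and subtracting from the magic constant yields a rough 1/sqrt(x).
    i = MAGIC - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton step on f(y) = 1/y**2 - x refines the guess.
    return y * (1.5 - 0.5 * x * y * y)

for v in (0.25, 1.0, 2.0, 10.0, 1234.5):
    approx, exact = fast_inv_sqrt(v), 1.0 / math.sqrt(v)
    print(f"{v:>8}: {approx:.6f} vs {exact:.6f} (rel. err {abs(approx - exact) / exact:.2e})")
```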
This paper proposes an arc-search interior-point algorithm for the nonlinear constrained optimization problem. The proposed algorithm uses the second-order derivatives to construct a search arc that approaches the optimizer. Because the arc…
The inverse of a large matrix can often be accurately approximated by a polynomial of degree significantly lower than the order of the matrix. The iteration polynomial generated by a run of the GMRES algorithm is a good candidate, and its…
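The entry above replaces $A^{-1}$ by a polynomial in $A$ of low degree, and that idea can be illustrated without GMRES. The sketch below swaps in the simpler truncated Neumann series $A^{-1}b \approx \alpha \sum_{k=0}^{d} (I - \alpha A)^k b$. This approximation is valid when every eigenvalue of $I - \alpha A$ has modulus below 1. It is not the GMRES iteration polynomial the entry studies. The $100 \times 100$ tridiagonal test matrix and the helper names are assumptions for illustration.

```python
def tridiag_matvec(v, diag=4.0, off=1.0):
    """y = A v for the symmetric tridiagonal matrix with `diag` on the diagonal and `off` beside it."""
    n = len(v)
    return [diag * v[i]
            + (off * v[i - 1] if i > 0 else 0.0)
            + (off * v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def neumann_apply(matvec, b, alpha, degree):
    """Return alpha * sum_{k=0}^{degree} (I - alpha*A)^k b, a degree-`degree` polynomial in A applied to b."""
    term = list(b)   # (I - alpha*A)^k b, starting at k = 0
    acc = list(b)    # running sum of the terms
    for _ in range(degree):
        a_term = matvec(term)
        term = [t - alpha * at for t, at in zip(term, a_term)]
        acc = [s + t for s, t in zip(acc, term)]
    return [alpha * s for s in acc]

n = 100
b = [1.0] * n
# Eigenvalues of A lie in (2, 6); alpha = 2 / (2 + 6) gives |1 - alpha * lambda| < 0.5.
x = neumann_apply(tridiag_matvec, b, alpha=0.25, degree=40)
residual = max(abs(ax - bi) for ax, bi in zip(tridiag_matvec(x), b))
```

The residual equals $-(I - \alpha A)^{41} b$, so here a degree-40 polynomial is enough for a $100 \times 100$ matrix. The required degree grows with the condition number, which is where better-chosen polynomials such as the GMRES one pay off.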
Obtaining the inverse of a large symmetric positive definite matrix $\mathcal{A}\in\mathbb{R}^{p\times p}$ is a continual challenge across many mathematical disciplines. The computational complexity associated with direct methods can be…
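As a baseline for the cost mentioned in the entry above, a direct method can be sketched as a Cholesky factorization $\mathcal{A} = LL^\top$ followed by $p$ pairs of triangular solves. This takes roughly $O(p^3)$ work overall. The pure-Python version below uses hypothetical helper names for illustration. Production code would call an optimized LAPACK routine instead.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T, for a symmetric positive definite list-of-lists matrix."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def spd_inverse(a):
    """Invert an SPD matrix by solving L L^T x = e_j for each unit vector e_j."""
    p = len(a)
    L = cholesky(a)
    inv_cols = []
    for j in range(p):
        # Forward substitution: L z = e_j.
        z = [0.0] * p
        for i in range(p):
            z[i] = ((1.0 if i == j else 0.0) - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        # Back substitution: L^T x = z.
        x = [0.0] * p
        for i in reversed(range(p)):
            x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, p))) / L[i][i]
        inv_cols.append(x)
    # inv_cols[j] is column j of the inverse; transpose into row-major form.
    return [[inv_cols[j][i] for j in range(p)] for i in range(p)]

A = [[4.0, 2.0, 0.6],
     [2.0, 5.0, 1.0],
     [0.6, 1.0, 3.0]]
A_inv = spd_inverse(A)
```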