Related papers: GPU Accelerated Sub-Sampled Newton's Method

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively-slow convergence, sensitivity to the settings of…

Optimization and Control · Mathematics 2018-02-19 Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney

Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs

Training deep neural networks consumes increasing computational resource shares in many compute centers. Often, a brute force approach to obtain hyperparameter values is employed. Our goal is (1) to enhance this by enabling second-order…

Machine Learning · Computer Science 2022-08-04 Severin Reiz, Tobias Neckel, Hans-Joachim Bungartz

Newton-ADMM: A Distributed GPU-Accelerated Optimizer for Multiclass Classification Problems

First-order optimization methods, such as stochastic gradient descent (SGD) and its variants, are widely used in machine learning applications due to their simplicity and low per-iteration costs. However, they often require larger numbers…

Machine Learning · Computer Science 2020-02-05 Chih-Hao Fang, Sudhir B Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama

Efficient Implementation Of Newton-Raphson Methods For Sequential Data Prediction

We investigate the problem of sequential linear data prediction for real life big data applications. The second order algorithms, i.e., Newton-Raphson Methods, asymptotically achieve the performance of the "best" possible linear data…

Data Structures and Algorithms · Computer Science 2017-01-20 Burak C. Civek, Suleyman S. Kozat

Nestrov's Acceleration For Second Order Method

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if…

Machine Learning · Computer Science 2017-10-25 Haishan Ye, Zhihua Zhang

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the…

Machine Learning · Computer Science 2019-09-26 Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang

Approximate Newton Methods

Many machine learning models involve solving optimization problems. Thus, it is important to deal with a large-scale optimization problem in big data applications. Recently, subsampled Newton methods have emerged to attract much attention…

Numerical Analysis · Computer Science 2020-03-24 Haishan Ye, Luo Luo, Zhihua Zhang

Second-order Information in First-order Optimization Methods

In this paper, we try to uncover the second-order essence of several first-order optimization methods. For Nesterov Accelerated Gradient, we rigorously prove that the algorithm makes use of the difference between past and current gradients,…

Machine Learning · Computer Science 2019-12-23 Yuzheng Hu, Licong Lin, Shange Tang

Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients

We show that, for finite-sum minimization problems, incorporating partial second-order information of the objective function can dramatically improve the robustness to mini-batch size of variance-reduced stochastic gradient methods, making…

Optimization and Control · Mathematics 2024-04-24 Sachin Garg, Albert S. Berahas, Michał Dereziński

Adaptive Regularized Newton Method with Inexact Hessian

Newton's method is the most widespread high-order method, demanding the gradient and the Hessian of the objective function. However, one of the main disadvantages of Newton's method is its lack of global convergence and high iteration cost.…

Optimization and Control · Mathematics 2025-12-10 Aleksandr Shestakov, Nail Bashirov, Andrei Semenov, Alexander Gasnikov, Martin Takáč, Aleksandr Beznosikov, Dmitry Kamzolov

Faster Differentially Private Convex Optimization via Second-Order Methods

Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than…

Machine Learning · Computer Science 2023-05-23 Arun Ganesh, Mahdi Haghifam, Thomas Steinke, Abhradeep Thakurta

Complexity reduction in online stochastic Newton methods with potential O(N d) total cost

Optimizing smooth convex functions in stochastic settings, where only noisy estimates of gradients and Hessians are available, is a fundamental problem in optimization. While first-order methods possess a low per-iteration cost, their…

Statistics Theory · Mathematics 2026-02-06 Antoine Godichon-Baggioni, Bruno Portier, Guillaume Sallé

Newton methods based convolution neural networks using parallel processing

Training of convolutional neural networks is a high dimensional and a non-convex optimization problem. At present, it is inefficient in situations where parametric learning rates can not be confidently set. Some past works have introduced…

Machine Learning · Computer Science 2023-04-06 Ujjwal Thakur, Anuj Sharma

Faster Optimization on Sparse Graphs via Neural Reparametrization

In mathematical optimization, second-order Newton's methods generally converge faster than first-order methods, but they require the inverse of the Hessian, hence are computationally expensive. However, we discover that on sparse graphs,…

Machine Learning · Computer Science 2022-05-30 Nima Dehmamy, Csaba Both, Jianzhi Long, Rose Yu

Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Incorporating second-order information has proven helpful in…

Optimization and Control · Mathematics 2025-04-30 Michał Dereziński

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Sub-Sampled Newton Methods II: Local Convergence Rates

Many data-fitting applications require the solution of an optimization problem involving a sum of large number of functions of high dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Nesterov's Acceleration For Approximate Newton

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if…

Machine Learning · Computer Science 2017-10-25 Haishan Ye, Zhihua Zhang

Frugality in second-order optimization: floating-point approximations for Newton's method

Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are…

Machine Learning · Computer Science 2025-11-25 Giuseppe Carrino, Elena Loli Piccolomini, Elisa Riccietti, Theo Mary