Related papers: GPU Accelerated Sub-Sampled Newton's Method

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively-slow convergence, sensitivity to the settings of…

Optimization and Control · Mathematics 2018-02-19 Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney

Training deep neural networks consumes increasing computational resource shares in many compute centers. Often, a brute force approach to obtain hyperparameter values is employed. Our goal is (1) to enhance this by enabling second-order…

Machine Learning · Computer Science 2022-08-04 Severin Reiz, Tobias Neckel, Hans-Joachim Bungartz

First-order optimization methods, such as stochastic gradient descent (SGD) and its variants, are widely used in machine learning applications due to their simplicity and low per-iteration costs. However, they often require larger numbers…

Machine Learning · Computer Science 2020-02-05 Chih-Hao Fang, Sudhir B Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama

We investigate the problem of sequential linear data prediction for real life big data applications. The second order algorithms, i.e., Newton-Raphson Methods, asymptotically achieve the performance of the "best" possible linear data…

Data Structures and Algorithms · Computer Science 2017-01-20 Burak C. Civek, Suleyman S. Kozat

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if…

Machine Learning · Computer Science 2017-10-25 Haishan Ye, Zhihua Zhang

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the…

Machine Learning · Computer Science 2019-09-26 Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang

Many machine learning models involve solving optimization problems. Thus, it is important to deal with a large-scale optimization problem in big data applications. Recently, subsampled Newton methods have emerged to attract much attention…

Numerical Analysis · Computer Science 2020-03-24 Haishan Ye, Luo Luo, Zhihua Zhang

In this paper, we try to uncover the second-order essence of several first-order optimization methods. For Nesterov Accelerated Gradient, we rigorously prove that the algorithm makes use of the difference between past and current gradients,…

Machine Learning · Computer Science 2019-12-23 Yuzheng Hu, Licong Lin, Shange Tang

We show that, for finite-sum minimization problems, incorporating partial second-order information of the objective function can dramatically improve the robustness to mini-batch size of variance-reduced stochastic gradient methods, making…

Optimization and Control · Mathematics 2024-04-24 Sachin Garg, Albert S. Berahas, Michał Dereziński

Newton's method is the most widespread high-order method, demanding the gradient and the Hessian of the objective function. However, one of the main disadvantages of Newton's method is its lack of global convergence and high iteration cost.…

Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than…

Machine Learning · Computer Science 2023-05-23 Arun Ganesh, Mahdi Haghifam, Thomas Steinke, Abhradeep Thakurta

Optimizing smooth convex functions in stochastic settings, where only noisy estimates of gradients and Hessians are available, is a fundamental problem in optimization. While first-order methods possess a low per-iteration cost, their…

Statistics Theory · Mathematics 2026-02-06 Antoine Godichon-Baggioni, Bruno Portier, Guillaume Sallé

Training of convolutional neural networks is a high dimensional and a non-convex optimization problem. At present, it is inefficient in situations where parametric learning rates can not be confidently set. Some past works have introduced…

Machine Learning · Computer Science 2023-04-06 Ujjwal Thakur, Anuj Sharma

In mathematical optimization, second-order Newton's methods generally converge faster than first-order methods, but they require the inverse of the Hessian, hence are computationally expensive. However, we discover that on sparse graphs,…

Machine Learning · Computer Science 2022-05-30 Nima Dehmamy, Csaba Both, Jianzhi Long, Rose Yu

Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Incorporating second-order information has proven helpful in…

Optimization and Control · Mathematics 2025-04-30 Michał Dereziński

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Many data-fitting applications require the solution of an optimization problem involving a sum of large number of functions of high dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are…

Machine Learning · Computer Science 2025-11-25 Giuseppe Carrino, Elena Loli Piccolomini, Elisa Riccietti, Theo Mary