Related papers: Newton methods based convolution neural networks u…

Newton Methods for Convolutional Neural Networks

Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been…

Machine Learning · Statistics 2018-11-16 Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin

Distributed Newton Methods for Deep Neural Networks

Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but…

Machine Learning · Statistics 2018-02-02 Chien-Chih Wang, Kent Loong Tan, Chun-Ting Chen, Yu-Hsiang Lin, S. Sathiya Keerthi, Dhruv Mahajan, S. Sundararajan, Chih-Jen Lin

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs

Training deep neural networks consumes increasing computational resource shares in many compute centers. Often, a brute force approach to obtain hyperparameter values is employed. Our goal is (1) to enhance this by enabling second-order…

Machine Learning · Computer Science 2022-08-04 Severin Reiz, Tobias Neckel, Hans-Joachim Bungartz

Revisiting Sub-sampled Newton Methods

Many machine learning models depend on solving a large scale optimization problem. Recently, sub-sampled Newton methods have emerged to attract much attention for optimization due to their efficiency at each iteration, rectified a weakness…

Optimization and Control · Mathematics 2016-09-06 Haishan Ye, Luo Luo, Zhihua Zhang

Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms

When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuously relax the objectives to provide…

Machine Learning · Computer Science 2024-10-28 Felix Petersen, Christian Borgelt, Tobias Sutter, Hilde Kuehne, Oliver Deussen, Stefano Ermon

GPU Accelerated Sub-Sampled Newton's Method

First order methods, which solely rely on gradient information, are commonly used in diverse machine learning (ML) and data analysis (DA) applications. This is attributed to the simplicity of their implementations, as well as low…

Machine Learning · Computer Science 2018-03-06 Sudhir B. Kylasa, Farbod Roosta-Khorasani, Michael W. Mahoney, Ananth Grama

Approximate Newton Methods

Many machine learning models involve solving optimization problems. Thus, it is important to deal with a large-scale optimization problem in big data applications. Recently, subsampled Newton methods have emerged to attract much attention…

Numerical Analysis · Computer Science 2020-03-24 Haishan Ye, Luo Luo, Zhihua Zhang

Sub-Sampled Newton Methods II: Local Convergence Rates

Many data-fitting applications require the solution of an optimization problem involving a sum of a large number of functions of a high-dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Subspace Quasi-Newton Method with Gradient Approximation

In recent years, various subspace algorithms have been developed to handle large-scale optimization problems. Although existing subspace Newton methods require fewer iterations to converge in practice, the matrix operations and full…

Optimization and Control · Mathematics 2024-06-05 Taisei Miyaishi, Ryota Nozawa, Pierre-Louis Poirion, Akiko Takeda

Second-order Neural Network Training Using Complex-step Directional Derivative

While the superior performance of second-order optimization methods such as Newton's method is well known, they are hardly used in practice for deep learning because neither assembling the Hessian matrix nor calculating its inverse is…

Machine Learning · Computer Science 2020-09-16 Siyuan Shen, Tianjia Shao, Kun Zhou, Chenfanfu Jiang, Feng Luo, Yin Yang

Adaptive Regularized Newton Method with Inexact Hessian

Newton's method is the most widespread high-order method, demanding the gradient and the Hessian of the objective function. However, one of the main disadvantages of Newton's method is its lack of global convergence and high iteration cost.…

Optimization and Control · Mathematics 2025-12-10 Aleksandr Shestakov, Nail Bashirov, Andrei Semenov, Alexander Gasnikov, Martin Takáč, Aleksandr Beznosikov, Dmitry Kamzolov

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Adaptive pruning-based Newton's method for distributed learning

Newton's method leverages curvature information to boost performance, and thus outperforms first-order methods for distributed learning problems. However, Newton's method is not practical in large-scale and heterogeneous learning…

Machine Learning · Computer Science 2024-12-18 Shuzhen Chen, Yuan Yuan, Youming Tao, Tianzhu Wang, Zhipeng Cai, Dongxiao Yu

Convexified Convolutional Neural Networks

We describe the class of convexified convolutional neural networks (CCNNs), which capture the parameter sharing of convolutional neural networks in a convex manner. By representing the nonlinear convolutional filters as vectors in a…

Machine Learning · Computer Science 2016-09-06 Yuchen Zhang, Percy Liang, Martin J. Wainwright

Newton-like method with diagonal correction for distributed optimization

We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods to solve these problems are the distributed gradient methods, which are…

Information Theory · Computer Science 2017-02-21 Dragana Bajovic, Dusan Jakovetic, Natasa Krejic, Natasa Krklec Jerinkic

Training CNNs faster with Dynamic Input and Kernel Downsampling

We reduce training time in convolutional networks (CNNs) with a method that, for some of the mini-batches: a) scales down the resolution of input images via downsampling, and b) reduces the forward pass operations via pooling on the…

Machine Learning · Computer Science 2019-10-16 Zissis Poulos, Ali Nouri, Andreas Moshovos

Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g.,…

Machine Learning · Computer Science 2018-06-12 Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

Hybrid Approach to Parallel Stochastic Gradient Descent

Stochastic Gradient Descent is used for large datasets to train models to reduce the training time. On top of that, data parallelism is widely used as a method to efficiently train neural networks using multiple worker nodes in parallel.…

Machine Learning · Computer Science 2024-07-02 Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel

Do Subsampled Newton Methods Work for High-Dimensional Data?

Subsampled Newton methods approximate Hessian matrices through subsampling techniques, alleviating the cost of forming Hessian matrices but using sufficient curvature information. However, previous results require $\Omega (d)$ samples to…

Machine Learning · Statistics 2019-05-07 Xiang Li, Shusen Wang, Zhihua Zhang