Related papers: Accelerating SGD for Distributed Deep-Learning Usi…

An Exact Distributed Newton Method for Reinforcement Learning

In this paper, we propose a distributed second-order method for reinforcement learning. Our approach is the fastest in the literature so far, as it outperforms state-of-the-art methods, including ADMM, by significant margins. We achieve this by…

Optimization and Control · Mathematics 2016-08-08 Rasul Tutunov, Haitham Bou-Ammar, Ali Jadbabaie

Distributed Newton Methods for Deep Neural Networks

Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but…

Machine Learning · Statistics 2018-02-02 Chien-Chih Wang, Kent Loong Tan, Chun-Ting Chen, Yu-Hsiang Lin, S. Sathiya Keerthi, Dhruv Mahajan, S. Sundararajan, Chih-Jen Lin

Newton-like method with diagonal correction for distributed optimization

We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods to solve these problems are the distributed gradient methods, which are…

Information Theory · Computer Science 2017-02-21 Dragana Bajovic, Dusan Jakovetic, Natasa Krejic, Natasa Krklec Jerinkic

A distributed semismooth Newton based augmented Lagrangian method for distributed optimization

This paper proposes a novel distributed semismooth Newton based augmented Lagrangian method for solving a class of optimization problems over networks, where the global objective is defined as the sum of locally held cost functions, and…

Optimization and Control · Mathematics 2026-03-02 Qihao Ma, Chengjing Wang, Peipei Tang, Dunbiao Niu, Aimin Xu

Research of Damped Newton Stochastic Gradient Descent Method for Neural Network Training

First-order methods like stochastic gradient descent (SGD) have recently become the popular optimization method for training deep neural networks (DNNs), but second-order methods are scarcely used because of the high computing cost of getting the…

Machine Learning · Computer Science 2021-04-01 Jingcheng Zhou, Wei Wei, Zhiming Zheng

Practical Newton-Type Distributed Learning using Gradient Based Approximations

We study distributed algorithms for expected loss minimization where the datasets are large and have to be stored on different machines. Often we deal with minimizing the average of a set of convex functions where each function is the…

Machine Learning · Computer Science 2019-07-24 Samira Sheikhi

DN-ADMM: Distributed Newton ADMM for Multi-agent Optimization

In a multi-agent network, we consider the problem of minimizing an objective function that is expressed as the sum of private convex and smooth functions, and a (possibly) non-differentiable convex regularizer. We propose a novel…

Optimization and Control · Mathematics 2021-09-30 Yichuan Li, Nikolaos M. Freris, Petros Voulgaris, Dusan Stipanovic

Distributed Hessian-Free Optimization for Deep Neural Network

Training deep neural networks is a high-dimensional and highly non-convex optimization problem. The stochastic gradient descent (SGD) algorithm and its variations are the current state-of-the-art solvers for this task. However, due to…

Machine Learning · Computer Science 2017-01-17 Xi He, Dheevatsa Mudigere, Mikhail Smelyanskiy, Martin Takáč

A Distributed Continuous-time Modified Newton-Raphson Algorithm

We propose a continuous-time second-order optimization algorithm for solving unconstrained convex optimization problems with bounded Hessian. We show that this alternative algorithm has a comparable convergence rate to that of the…

Optimization and Control · Mathematics 2021-05-21 Hossein Moradian, Solmaz S. Kia

Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation

In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation--namely, the need to compute or approximate Hessian…

Machine Learning · Computer Science 2026-04-02 Deyi Kong, Zaiwei Chen, Shuzhong Zhang, Shancong Mou

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Deep Reinforcement Learning via L-BFGS Optimization

Reinforcement Learning (RL) algorithms allow artificial agents to improve their action selections so as to increase rewarding experiences in their environments. Deep Reinforcement Learning algorithms require solving a nonconvex and…

Machine Learning · Computer Science 2019-04-18 Jacob Rafati, Roummel F. Marcia

Distributed Optimization Algorithm with Superlinear Convergence Rate

This paper considers distributed optimization problems, where each agent cooperatively minimizes the sum of local objective functions through the communication with its neighbors. The widely adopted distributed gradient method in solving…

Optimization and Control · Mathematics 2025-08-19 Yeming Xu, Ziyuan Guo, Kaihong Lu, Huanshui Zhang

A Distributed Second-Order Algorithm You Can Trust

Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years. While first-order methods seem to dominate the field, second-order methods are nevertheless attractive…

Machine Learning · Computer Science 2018-06-21 Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

Distributed Cross-Layer Optimization in Wireless Networks: A Second-Order Approach

Due to the rapidly growing scale and heterogeneity of wireless networks, the design of distributed cross-layer optimization algorithms has received significant interest from the networking research community. So far, the standard…

Networking and Internet Architecture · Computer Science 2016-11-18 Jia Liu, Cathy H. Xia, Ness B. Shroff, Hanif D. Sherali

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Exact and Inexact Subsampled Newton Methods for Optimization

The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how…

Optimization and Control · Mathematics 2016-09-28 Raghu Bollapragada, Richard Byrd, Jorge Nocedal

A Distributed Newton Method for Large Scale Consensus Optimization

In this paper, we propose a distributed Newton method for consensus optimization. Our approach outperforms state-of-the-art methods, including ADMM. The key idea is to exploit the sparsity of the dual Hessian and recast the computation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-22 Rasul Tutunov, Haitham Bou Ammar, Ali Jadbabaie

Accelerated Dual Descent for Network Optimization

Dual descent methods are commonly used to solve network optimization problems because their implementation can be distributed through the network. However, their convergence rates are typically very slow. This paper introduces a family of…

Optimization and Control · Mathematics 2011-04-07 M. Zargham, A. Ribeiro, A. Jadbabaie, A. Ozdaglar

Quadratic Gradient: A Unified Framework Bridging Gradient Descent and Newton-Type Methods by Synthesizing Hessians and Gradients

Accelerating the convergence of second-order optimization, particularly Newton-type methods, remains a pivotal challenge in algorithmic research. In this paper, we extend previous work on the Quadratic Gradient (QG) and rigorously…

Optimization and Control · Mathematics 2026-04-01 John Chiang