Related papers: A Distributed Second-Order Algorithm You Can Trust

Distributed Optimization Algorithm with Superlinear Convergence Rate

This paper considers distributed optimization problems, where each agent cooperatively minimizes the sum of local objective functions through communication with its neighbors. The widely adopted distributed gradient method in solving…

Optimization and Control · Mathematics 2025-08-19 Yeming Xu, Ziyuan Guo, Kaihong Lu, Huanshui Zhang

Distributed Cross-Layer Optimization in Wireless Networks: A Second-Order Approach

Due to the rapidly growing scale and heterogeneity of wireless networks, the design of distributed cross-layer optimization algorithms has received significant interest from the networking research community. So far, the standard…

Networking and Internet Architecture · Computer Science 2016-11-18 Jia Liu, Cathy H. Xia, Ness B. Shroff, Hanif D. Sherali

Distributed Optimization Methods for Multi-Robot Systems: Part II -- A Survey

Although the field of distributed optimization is well-developed, relevant literature focused on the application of distributed optimization to multi-robot problems is limited. This survey constitutes the second part of a two-part series on…

Robotics · Computer Science 2024-12-02 Ola Shorinwa, Trevor Halsted, Javier Yu, Mac Schwager

Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning

An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…

Optimization and Control · Mathematics 2026-05-11 Yunlang Zhu, Lingjun Guo, Zahra Khatti, Xiaoyi Qu, Chia-Yuan Wu, Lara Zebiane, Frank E. Curtis

Distributed Averaging Methods for Randomized Second Order Optimization

We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We develop unbiased parameter averaging methods for randomized second order optimization…

Machine Learning · Statistics 2020-02-18 Burak Bartan, Mert Pilanci

A distributed block coordinate descent method for training $l_1$ regularized linear classifiers

Distributed training of $l_1$ regularized classifiers has received great attention recently. Most existing methods approach this problem by taking steps obtained from approximating the objective by a quadratic approximation that is…

Machine Learning · Computer Science 2016-11-25 Dhruv Mahajan, S. Sathiya Keerthi, S. Sundararajan

Practical Newton-Type Distributed Learning using Gradient Based Approximations

We study distributed algorithms for expected loss minimization where the datasets are large and have to be stored on different machines. Often we deal with minimizing the average of a set of convex functions where each function is the…

Machine Learning · Computer Science 2019-07-24 Samira Sheikhi

Decentralized Inexact Cubic Newton Method with Consensus Procedure

Distributed optimization is widely used in large-scale and privacy-preserving machine learning, where each agent stores a local objective and communicates only with its neighbors in a connected network. We study decentralized second-order…

Optimization and Control · Mathematics 2026-05-22 Artem Agafonov, Anton Novitskii, Alexander Rogozin, Yury Sokolov, Dmitry Kamzolov, Alexander Dyakonov, Martin Takáč, Alexander Gasnikov

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

An efficient distributed learning algorithm based on effective local functional approximations

Scalable machine learning over big data is an important problem that has received a lot of attention in recent years. On popular distributed environments such as Hadoop running on a cluster of commodity machines, communication costs are…

Machine Learning · Computer Science 2015-03-18 Dhruv Mahajan, Nikunj Agrawal, S. Sathiya Keerthi, S. Sundararajan, Leon Bottou

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates

As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer…

Machine Learning · Computer Science 2024-07-10 Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt

Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence

Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some…

Machine Learning · Computer Science 2025-11-03 Matin Ansaripour, Shayan Talaei, Giorgi Nadiradze, Dan Alistarh

Linear Regression with Distributed Learning: A Generalization Error Perspective

Distributed learning provides an attractive framework for scaling the learning task by sharing the computational load over multiple nodes in a network. Here, we investigate the performance of distributed learning for large-scale linear…

Machine Learning · Statistics 2021-11-03 Martin Hellkvist, Ayça Özçelikkale, Anders Ahlén

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix

We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently…

Machine Learning · Computer Science 2017-09-18 Sébastien M. R. Arnold, Chunming Wang

Distributed Training of Structured SVM

Training structured prediction models is time-consuming. However, most existing approaches use only a single machine, so the computing power and larger data capacity of multiple machines have not been exploited.…

Machine Learning · Statistics 2016-02-16 Ching-pei Lee, Kai-Wei Chang, Shyam Upadhyay, Dan Roth

Learning Linear Models Using Distributed Iterative Hessian Sketching

This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and…

Optimization and Control · Mathematics 2021-12-09 Han Wang, James Anderson

Distributed Learning Systems with First-order Methods

Scalable and efficient distributed learning is one of the main driving forces behind the recent rapid advancement of machine learning and artificial intelligence. One prominent feature of this topic is that recent progress has been made…

Machine Learning · Computer Science 2021-04-13 Ji Liu, Ce Zhang

Distributed Newton Methods for Deep Neural Networks

Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but…

Machine Learning · Statistics 2018-02-02 Chien-Chih Wang, Kent Loong Tan, Chun-Ting Chen, Yu-Hsiang Lin, S. Sathiya Keerthi, Dhruv Mahajan, S. Sundararajan, Chih-Jen Lin

A Fully Stochastic Second-Order Trust Region Method

A stochastic second-order trust region method is proposed, which can be viewed as a second-order extension of the trust-region-ish (TRish) algorithm proposed by Curtis et al. (INFORMS J. Optim. 1(3) 200-220, 2019). In each iteration, a…

Optimization and Control · Mathematics 2019-11-19 Frank E. Curtis, Rui Shi