Related papers: Optimal Shrinkage for Distributed Second-Order Opt…

Distributed Averaging Methods for Randomized Second Order Optimization

We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We develop unbiased parameter averaging methods for randomized second order optimization…

Machine Learning · Statistics 2020-02-18 Burak Bartan, Mert Pilanci

Distributed estimation of the inverse Hessian by determinantal averaging

In distributed optimization and distributed numerical linear algebra, we often encounter an inversion bias: if we want to compute a quantity that depends on the inverse of a sum of distributed matrices, then the sum of the inverses does not…

Machine Learning · Computer Science 2019-05-29 Michał Dereziński, Michael W. Mahoney

Distributed Optimization Algorithm with Superlinear Convergence Rate

This paper considers distributed optimization problems, where each agent cooperatively minimizes the sum of local objective functions through the communication with its neighbors. The widely adopted distributed gradient method in solving…

Optimization and Control · Mathematics 2025-08-19 Yeming Xu, Ziyuan Guo, Kaihong Lu, Huanshui Zhang

Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization

In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data. However, the local estimates on each machine are typically biased, relative to…

Machine Learning · Computer Science 2020-07-06 Michał Dereziński, Burak Bartan, Mert Pilanci, Michael W. Mahoney

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible…

Machine Learning · Statistics 2026-04-24 Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

A Distributed Continuous-time Modified Newton-Raphson Algorithm

We propose a continuous-time second-order optimization algorithm for solving unconstrained convex optimization problems with bounded Hessian. We show that this alternative algorithm has a comparable convergence rate to that of the…

Optimization and Control · Mathematics 2021-05-21 Hossein Moradian, Solmaz S. Kia

Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds

We consider distributed optimization methods for problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We leverage randomized sketches for reducing the problem dimensions as well as…

Optimization and Control · Mathematics 2022-03-21 Burak Bartan, Mert Pilanci

Online estimation of the inverse of the Hessian for stochastic optimization with application to universal stochastic Newton algorithms

This paper addresses second-order stochastic optimization for estimating the minimizer of a convex function written as an expectation. A direct recursive estimation technique for the inverse Hessian matrix using a Robbins-Monro procedure is…

Optimization and Control · Mathematics 2025-03-11 Antoine Godichon-Baggioni, Wei Lu, Bruno Portier

Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness

We present a new accelerated stochastic second-order method that is robust to both gradient and Hessian inexactness, which occurs typically in machine learning. We establish theoretical lower bounds and prove that our algorithm achieves…

Optimization and Control · Mathematics 2024-05-28 Artem Agafonov, Dmitry Kamzolov, Alexander Gasnikov, Ali Kavis, Kimon Antonakopoulos, Volkan Cevher, Martin Takáč

A randomized algorithm for nonconvex minimization with inexact evaluations and complexity guarantees

We consider minimization of a smooth nonconvex function with inexact oracle access to gradient and Hessian (without assuming access to the function value) to achieve approximate second-order optimality. A novel feature of our method is that…

Optimization and Control · Mathematics 2024-03-27 Shuyao Li, Stephen J. Wright

The Practicality of Stochastic Optimization in Imaging Inverse Problems

In this work we investigate the practicality of stochastic gradient descent and recently introduced variants with variance-reduction techniques in imaging inverse problems. Such algorithms have been shown in the machine learning literature…

Optimization and Control · Mathematics 2021-01-26 Junqi Tang, Karen Egiazarian, Mohammad Golbabaee, Mike Davies

Fast, Accurate Second Order Methods for Network Optimization

Dual descent methods are commonly used to solve network flow optimization problems, since their implementation can be distributed over the network. These algorithms, however, often exhibit slow convergence rates. Approximate Newton methods…

Optimization and Control · Mathematics 2015-03-25 Rasul Tutunov, Haitham Bou Ammar, Ali Jadbabaie

Distributed Cross-Layer Optimization in Wireless Networks: A Second-Order Approach

Due to the rapidly growing scale and heterogeneity of wireless networks, the design of distributed cross-layer optimization algorithms has received significant interest from the networking research community. So far, the standard…

Networking and Internet Architecture · Computer Science 2016-11-18 Jia Liu, Cathy H. Xia, Ness B. Shroff, Hanif D. Sherali

Distributed Sensor Selection using a Truncated Newton Method

We propose a new distributed algorithm for computing a truncated Newton method, where the main diagonal of the Hessian is computed using belief propagation. As a case study for this approach, we examine the sensor selection problem, a…

Information Theory · Computer Science 2010-01-14 Danny Bickson, Danny Dolev

A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable…

Machine Learning · Computer Science 2019-12-16 Ching-pei Lee, Cong Han Lim, Stephen J. Wright

Stein Shrinkage and Second-Order Efficiency for semiparametric estimation of the shift

The problem of estimating the shift (or, equivalently, the center of symmetry) of an unknown symmetric and periodic function $f$ observed in Gaussian white noise is considered. Using the blockwise Stein method, a penalized profile…

Statistics Theory · Mathematics 2007-06-13 Arnak Dalalyan

Better SGD using Second-order Momentum

We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations. Our algorithm uses Hessian-vector…

Machine Learning · Computer Science 2021-07-13 Hoang Tran, Ashok Cutkosky

Adaptive system optimization using random directions stochastic approximation

We present novel algorithms for simulation optimization using random directions stochastic approximation (RDSA). These include first-order (gradient) as well as second-order (Newton) schemes. We incorporate both continuous-valued as well as…

Optimization and Control · Mathematics 2015-08-11 Prashanth L. A., Shalabh Bhatnagar, Michael Fu, Steve Marcus

A Homogeneous Second-Order Descent Method for Nonconvex Optimization

In this paper, we introduce a Homogeneous Second-Order Descent Method (HSODM) using the homogenized quadratic approximation to the original function. The merit of homogenization is that only the leftmost eigenvector of a gradient-Hessian…

Optimization and Control · Mathematics 2025-04-08 Chuwen Zhang, Dongdong Ge, Chang He, Bo Jiang, Yuntian Jiang, Chenyu Xue, Yinyu Ye