Related papers: Improving Computational Complexity in Statistical …

An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models

Using gradient descent (GD) with fixed or decaying step-size is a standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down as it…

Machine Learning · Statistics 2023-02-03 Nhat Ho, Tongzheng Ren, Sujay Sanghavi, Purnamrita Sarkar, Rachel Ward

Statistical Inference for Model Parameters in Stochastic Gradient Descent

The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function…

Machine Learning · Statistics 2023-11-02 Xi Chen, Jason D. Lee, Xin T. Tong, Yichen Zhang

New insights and perspectives on the natural gradient method

Natural gradient descent is an optimization method traditionally motivated from the perspective of information geometry, and works well for many applications as an alternative to stochastic gradient descent. In this paper we critically…

Machine Learning · Computer Science 2020-09-22 James Martens

Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation

In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation--namely, the need to compute or approximate Hessian…

Machine Learning · Computer Science 2026-04-02 Deyi Kong, Zaiwei Chen, Shuzhong Zhang, Shancong Mou

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Natural Gradient Methods: Perspectives, Efficient-Scalable Approximations, and Analysis

Natural Gradient Descent, a second-degree optimization method motivated by the information geometry, makes use of the Fisher Information Matrix instead of the Hessian which is typically used. However, in many cases, the Fisher Information…

Machine Learning · Computer Science 2023-03-10 Rajesh Shrestha

Reconstructing Deep Neural Networks: Unleashing the Optimization Potential of Natural Gradient Descent

Natural gradient descent (NGD) is a powerful optimization technique for machine learning, but the computational complexity of the inverse Fisher information matrix limits its application in training deep neural networks. To overcome this…

Machine Learning · Computer Science 2024-12-11 Weihua Liu, Said Boumaraf, Jianwu Li, Chaochao Lin, Xiabi Liu, Lijuan Niu, Naoufel Werghi

Stochastic Proximal Gradient Descent for Nuclear Norm Regularization

In this paper, we utilize stochastic optimization to reduce the space complexity of convex composite optimization with a nuclear norm regularizer, where the variable is a matrix of size $m \times n$. By constructing a low-rank estimate of…

Machine Learning · Computer Science 2015-12-08 Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou

Online stochastic gradient descent on non-convex losses from high-dimensional inference

Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively…

Machine Learning · Statistics 2023-06-23 Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the…

Machine Learning · Computer Science 2021-08-17 Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent

We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss…

Machine Learning · Computer Science 2021-10-18 Tongzheng Ren, Fuheng Cui, Alexia Atsidakou, Sujay Sanghavi, Nhat Ho

Making the Last Iterate of SGD Information Theoretically Optimal

Stochastic gradient descent (SGD) is one of the most widely used algorithms for large scale optimization problems. While classical theoretical analysis of SGD for convex problems studies (suffix) averages of iterates and obtains…

Optimization and Control · Mathematics 2019-05-30 Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli

Faster Differentially Private Convex Optimization via Second-Order Methods

Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than…

Machine Learning · Computer Science 2023-05-23 Arun Ganesh, Mahdi Haghifam, Thomas Steinke, Abhradeep Thakurta

Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies

Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations. Despite the huge efforts directed at the design of efficient stochastic PG-type algorithms, the…

Machine Learning · Computer Science 2023-11-09 Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He

Parameter-Agnostic Optimization under Relaxed Smoothness

Tuning hyperparameters, such as the stepsize, presents a major challenge of training machine learning models. To address this challenge, numerous adaptive optimization algorithms have been developed that achieve near-optimal complexities,…

Optimization and Control · Mathematics 2023-11-07 Florian Hübler, Junchi Yang, Xiang Li, Niao He

Thermodynamic Natural Gradient Descent

Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by…

Machine Learning · Computer Science 2024-05-24 Kaelan Donatella, Samuel Duffield, Maxwell Aifer, Denis Melanson, Gavin Crooks, Patrick J. Coles

A Novel Structured Natural Gradient Descent for Deep Learning

Natural gradient descent (NGD) provided deep insights and powerful tools to deep neural networks. However the computation of Fisher information matrix becomes more and more difficult as the network structure turns large and complex. This…

Machine Learning · Computer Science 2021-09-22 Weihua Liu, Xiabi Liu

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite…

Machine Learning · Statistics 2026-05-26 Jose Blanchet, Peter Glynn, Wenhao Yang

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix

This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM), named SOFIM, which can efficiently utilize the FIM to approximate the Hessian matrix for finding Newton's gradient update…

Machine Learning · Computer Science 2024-05-02 Mrinmay Sen, A. K. Qin, Gayathri C, Raghu Kishore N, Yen-Wei Chen, Balasubramanian Raman