Related papers: Gradient descent in matrix factorization: Understa…

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing

It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models. This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal…

Machine Learning · Computer Science 2023-01-30 Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

Numerous empirical evidences have corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through…

Machine Learning · Computer Science 2021-02-25 Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization

The nonconvex formulation of the matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient Descent (GD) is a simple yet efficient baseline…

Machine Learning · Statistics 2025-07-03 Daesung Kim, Hye Won Chung

Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing

Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor…

Optimization and Control · Mathematics 2023-10-25 Ziye Ma, Javad Lavaei, Somayeh Sojoudi

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

Recent empirical advances show that training deep models with large learning rate often improves generalization performance. However, theoretical justifications on the benefits of large learning rate are highly limited, due to challenges in…

Machine Learning · Computer Science 2022-03-01 Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao

Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization

Many problems encountered in science and engineering can be formulated as estimating a low-rank object (e.g., matrices and tensors) from incomplete, and possibly corrupted, linear measurements. Through the lens of matrix and tensor…

Machine Learning · Computer Science 2023-10-11 Cong Ma, Xingyu Xu, Tian Tong, Yuejie Chi

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

In deep learning, it is common to use more network parameters than training points. In such scenario of over-parameterization, there are usually multiple networks that achieve zero training error so that the training algorithm induces an…

Machine Learning · Computer Science 2023-08-22 Hung-Hsu Chou, Carsten Gieshoff, Johannes Maly, Holger Rauhut

Implicit Regularization in Deep Matrix Factorization

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity." We study the implicit…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo

Special Properties of Gradient Descent with Large Learning Rates

When training neural networks, it has been widely observed that a large step size is essential in stochastic gradient descent (SGD) for obtaining superior models. However, the effect of large step sizes on the success of SGD is not well…

Machine Learning · Computer Science 2023-02-17 Amirkeivan Mohtashami, Martin Jaggi, Sebastian Stich

Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent

Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and…

Machine Learning · Computer Science 2021-06-16 Tian Tong, Cong Ma, Yuejie Chi

Implicit Regularization Makes Overparameterized Asymmetric Matrix Sensing Robust to Perturbations

Several key questions remain unanswered regarding overparameterized learning models. It is unclear how (stochastic) gradient descent finds solutions that generalize well, and in particular the role of small random initializations. Matrix…

Machine Learning · Computer Science 2025-08-25 Johan S. Wind

Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparamerterized Matrix Factorization

Many theoretical studies on neural networks attribute their excellent empirical performance to the implicit bias or regularization induced by first-order optimization algorithms when training networks under certain initialization…

Machine Learning · Computer Science 2025-08-29 Hancheng Min, René Vidal

Global Convergence of Four-Layer Matrix Factorization under Random Initialization

Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well-established, no…

Optimization and Control · Mathematics 2025-11-20 Minrui Luo, Weihang Xu, Xiang Gao, Maryam Fazel, Simon Shaolei Du

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

This paper rigorously shows how over-parameterization changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear…

Machine Learning · Computer Science 2023-11-27 Nuoya Xiong, Lijun Ding, Simon S. Du

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning

Matrix factorization is a simple and natural test-bed to investigate the implicit regularization of gradient descent. Gunasekar et al. (2017) conjectured that Gradient Flow with infinitesimal initialization converges to the solution that…

Machine Learning · Computer Science 2021-04-13 Zhiyuan Li, Yuping Luo, Kaifeng Lyu

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how…

Machine Learning · Computer Science 2024-01-12 Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

On the Generalization Mystery in Deep Learning

The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of…

Machine Learning · Computer Science 2022-06-07 Satrajit Chatterjee, Piotr Zielinski

Depth Without the Magic: Inductive Bias of Natural Gradient Descent

In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing…

Machine Learning · Statistics 2021-11-24 Anna Kerekes, Anna Mészáros, Ferenc Huszár

A New Perspective of Accelerated Gradient Methods: The Controlled Invariant Manifold Approach

Gradient Descent (GD) is a ubiquitous algorithm for finding the optimal solution to an optimization problem. For reduced computational complexity, the optimal solution $\mathrm{x^*}$ of the optimization problem must be attained in a minimum…

Optimization and Control · Mathematics 2023-06-01 Revati Gunjal, Sushama Wagh, Syed Shadab Nayyer, Alex Stankovic, Navdeep M. Singh

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks

We study the gradient descent (GD) dynamics of a depth-2 linear neural network with a single input and output. We show that GD converges at an explicit linear rate to a global minimum of the training loss, even with a large stepsize --…

Machine Learning · Computer Science 2025-01-22 Pierfrancesco Beneventano, Blake Woodworth