English
Related papers

Related papers: Gradient descent in matrix factorization: Understa…

200 papers

It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models. This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal…

Machine Learning · Computer Science 2023-01-30 Jikai Jin , Zhiyuan Li , Kaifeng Lyu , Simon S. Du , Jason D. Lee

Numerous empirical evidences have corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through…

Machine Learning · Computer Science 2021-02-25 Tianyi Liu , Yan Li , Song Wei , Enlu Zhou , Tuo Zhao

The nonconvex formulation of the matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient Descent (GD) is a simple yet efficient baseline…

Machine Learning · Statistics 2025-07-03 Daesung Kim , Hye Won Chung

Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor…

Optimization and Control · Mathematics 2023-10-25 Ziye Ma , Javad Lavaei , Somayeh Sojoudi

Recent empirical advances show that training deep models with large learning rate often improves generalization performance. However, theoretical justifications on the benefits of large learning rate are highly limited, due to challenges in…

Machine Learning · Computer Science 2022-03-01 Yuqing Wang , Minshuo Chen , Tuo Zhao , Molei Tao

Many problems encountered in science and engineering can be formulated as estimating a low-rank object (e.g., matrices and tensors) from incomplete, and possibly corrupted, linear measurements. Through the lens of matrix and tensor…

Machine Learning · Computer Science 2023-10-11 Cong Ma , Xingyu Xu , Tian Tong , Yuejie Chi

In deep learning, it is common to use more network parameters than training points. In such scenarioof over-parameterization, there are usually multiple networks that achieve zero training error so that thetraining algorithm induces an…

Machine Learning · Computer Science 2023-08-22 Hung-Hsu Chou , Carsten Gieshoff , Johannes Maly , Holger Rauhut

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity." We study the implicit…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora , Nadav Cohen , Wei Hu , Yuping Luo

When training neural networks, it has been widely observed that a large step size is essential in stochastic gradient descent (SGD) for obtaining superior models. However, the effect of large step sizes on the success of SGD is not well…

Machine Learning · Computer Science 2023-02-17 Amirkeivan Mohtashami , Martin Jaggi , Sebastian Stich

Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and…

Machine Learning · Computer Science 2021-06-16 Tian Tong , Cong Ma , Yuejie Chi

Several key questions remain unanswered regarding overparameterized learning models. It is unclear how (stochastic) gradient descent finds solutions that generalize well, and in particular the role of small random initializations. Matrix…

Machine Learning · Computer Science 2025-08-25 Johan S. Wind

Many theoretical studies on neural networks attribute their excellent empirical performance to the implicit bias or regularization induced by first-order optimization algorithms when training networks under certain initialization…

Machine Learning · Computer Science 2025-08-29 Hancheng Min , René Vidal

Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well-established, no…

Optimization and Control · Mathematics 2025-11-20 Minrui Luo , Weihang Xu , Xiang Gao , Maryam Fazel , Simon Shaolei Du

This paper rigorously shows how over-parameterization changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear…

Machine Learning · Computer Science 2023-11-27 Nuoya Xiong , Lijun Ding , Simon S. Du

Matrix factorization is a simple and natural test-bed to investigate the implicit regularization of gradient descent. Gunasekar et al. (2017) conjectured that Gradient Flow with infinitesimal initialization converges to the solution that…

Machine Learning · Computer Science 2021-04-13 Zhiyuan Li , Yuping Luo , Kaifeng Lyu

Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how…

Machine Learning · Computer Science 2024-01-12 Haoyuan Sun , Khashayar Gatmiry , Kwangjun Ahn , Navid Azizan

The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of…

Machine Learning · Computer Science 2022-06-07 Satrajit Chatterjee , Piotr Zielinski

In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing…

Machine Learning · Statistics 2021-11-24 Anna Kerekes , Anna Mészáros , Ferenc Huszár

Gradient Descent (GD) is a ubiquitous algorithm for finding the optimal solution to an optimization problem. For reduced computational complexity, the optimal solution $\mathrm{x^*}$ of the optimization problem must be attained in a minimum…

Optimization and Control · Mathematics 2023-06-01 Revati Gunjal , Sushama Wagh , Syed Shadab Nayyer , Alex Stankovic , Navdeep M. Singh

We study the gradient descent (GD) dynamics of a depth-2 linear neural network with a single input and output. We show that GD converges at an explicit linear rate to a global minimum of the training loss, even with a large stepsize --…

Machine Learning · Computer Science 2025-01-22 Pierfrancesco Beneventano , Blake Woodworth
‹ Prev 1 2 3 10 Next ›