Related papers: Implicit Regularization in Over-Parameterized Supp…

Understanding Implicit Regularization in Over-Parameterized Single Index Model

In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we…

Machine Learning · Statistics 2021-11-18 Jianqing Fan, Zhuoran Yang, Mengxin Yu

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often…

Machine Learning · Computer Science 2020-06-09 Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen

A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization

Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from…

Machine Learning · Computer Science 2022-06-06 Ziyi Chen, Bhavya Kailkhura, Yi Zhou

Subgradient Regularization: A Descent-Oriented Subgradient Method for Nonsmooth Optimization

In nonsmooth optimization, a negative subgradient is not necessarily a descent direction, making the design of convergent descent methods based on zeroth-order and first-order information a challenging task. The well-studied bundle methods…

Optimization and Control · Mathematics 2025-05-13 Hanyang Li, Ying Cui

A geometric alternative to Nesterov's accelerated gradient descent

We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov's accelerated gradient descent. The new algorithm has a simple geometric…

Optimization and Control · Mathematics 2015-06-30 Sébastien Bubeck, Yin Tat Lee, Mohit Singh

Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent

In machine learning and statistical data analysis, we often run into objective function that is a summation: the number of terms in the summation possibly is equal to the sample size, which can be enormous. In such a setting, the stochastic…

Machine Learning · Statistics 2022-08-30 Yiling Luo, Xiaoming Huo, Yajun Mei

Stochastic Nested Variance Reduction for Nonconvex Optimization

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with…

Machine Learning · Computer Science 2020-10-20 Dongruo Zhou, Pan Xu, Quanquan Gu

High-Dimensional Linear Regression via Implicit Regularization

Many statistical estimators for high-dimensional linear regression are M-estimators, formed through minimizing a data-dependent square loss function plus a regularizer. This work considers a new class of estimators implicitly defined…

Statistics Theory · Mathematics 2022-02-15 Peng Zhao, Yun Yang, Qiao-Chu He

Learnable Descent Algorithm for Nonsmooth Nonconvex Image Reconstruction

We propose a general learning based framework for solving nonsmooth and nonconvex image reconstruction problems. We model the regularization function as the composition of the $l_{2,1}$ norm and a smooth but nonconvex feature mapping…

Computer Vision and Pattern Recognition · Computer Science 2022-09-07 Yunmei Chen, Hongcheng Liu, Xiaojing Ye, Qingchao Zhang

A Novel Learnable Gradient Descent Type Algorithm for Non-convex Non-smooth Inverse Problems

Optimization algorithms for solving nonconvex inverse problem have attracted significant interests recently. However, existing methods require the nonconvex regularization to be smooth or simple to ensure convergence. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2020-03-26 Qingchao Zhang, Xiaojing Ye, Hongcheng Liu, Yunmei Chen

A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks

When equipped with efficient optimization algorithms, the over-parameterized neural networks have demonstrated high level of performance even though the loss function is non-convex and non-smooth. While many works have been focusing on…

Machine Learning · Computer Science 2021-03-11 Zhiqi Bu, Shiyun Xu, Kan Chen

Tuning-Free Structured Sparse Recovery of Multiple Measurement Vectors using Implicit Regularization

Recovering jointly sparse signals in the multiple measurement vectors (MMV) setting is a fundamental problem in machine learning, but traditional methods often require careful parameter tuning or prior knowledge of the sparsity of the…

Machine Learning · Computer Science 2026-02-02 Lakshmi Jayalal, Sheetal Kalyani

Implicit Regularization in Over-parameterized Neural Networks

Over-parameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and…

Machine Learning · Computer Science 2019-03-07 Masayoshi Kubo, Ryotaro Banno, Hidetaka Manabe, Masataka Minoji

On Lower and Upper Bounds in Smooth Strongly Convex Optimization - A Unified Approach via Linear Iterative Methods

In this thesis we develop a novel framework to study smooth and strongly convex optimization algorithms, both deterministic and stochastic. Focusing on quadratic functions we are able to examine optimization algorithms as a recursive…

Optimization and Control · Mathematics 2014-10-24 Yossi Arjevani

Efficient Distributed Learning over Decentralized Networks with Convoluted Support Vector Machine

This paper addresses the problem of efficiently classifying high-dimensional data over decentralized networks. Penalized support vector machines (SVMs) are widely used for high-dimensional classification tasks. However, the double…

Machine Learning · Statistics 2025-03-11 Canyi Chen, Nan Qiao, Liping Zhu

Proximal gradient method for huberized support vector machine

The Support Vector Machine (SVM) has been used in a wide variety of classification problems. The original SVM uses the hinge loss function, which is non-differentiable and makes the problem difficult to solve in particular for regularized…

Machine Learning · Statistics 2015-12-01 Yangyang Xu, Ioannis Akrotirianakis, Amit Chakraborty

Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient…

Machine Learning · Statistics 2019-05-08 Andrei Kulunchakov, Julien Mairal

Explicit Regularization of Stochastic Gradient Methods through Duality

We consider stochastic gradient methods under the interpolation regime where a perfect fit can be obtained (minimum loss at each observation). While previous work highlighted the implicit regularization of such algorithms, we consider an…

Optimization and Control · Mathematics 2020-04-01 Anant Raj, Francis Bach

Stochastic Subspace Descent

We present two stochastic descent algorithms that apply to unconstrained optimization and are particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained…

Optimization and Control · Mathematics 2019-04-30 David Kozak, Stephen Becker, Alireza Doostan, Luis Tenorio

Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications

Deep neural networks with remarkably strong generalization performances are usually over-parameterized. Despite explicit regularization strategies are used for practitioners to avoid over-fitting, the impacts are often small. Some…

Computation and Language · Computer Science 2018-11-05 Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang