Related papers: Explicit Regularization via Regularizer Mirror Des…

The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models

Despite being highly over-parametrized, and having the ability to fully interpolate the training data, deep networks are known to generalize well to unseen data. It is now understood that part of the reason for this is that the training…

Machine Learning · Computer Science 2023-02-21 Danil Akhtiamov, Babak Hassibi

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how…

Machine Learning · Computer Science 2024-01-12 Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization

Most modern learning problems are highly overparameterized, meaning that there are many more parameters than the number of training data points, and as a result, the training loss may have infinitely many global minima (parameter vectors…

Machine Learning · Computer Science 2019-06-11 Navid Azizan, Sahin Lale, Babak Hassibi

Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent

In machine learning and statistical data analysis, we often run into an objective function that is a summation: the number of terms in the summation may equal the sample size, which can be enormous. In such a setting, the stochastic…

Machine Learning · Statistics 2022-08-30 Yiling Luo, Xiaoming Huo, Yajun Mei

Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization

Stochastic descent methods (of the gradient and mirror varieties) have become increasingly popular in optimization. In fact, it is now widely recognized that the success of deep learning is not only due to the special deep architecture of…

Machine Learning · Computer Science 2019-01-21 Navid Azizan, Babak Hassibi

Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications

Deep neural networks with remarkably strong generalization performance are usually over-parameterized. Although practitioners use explicit regularization strategies to avoid over-fitting, the impact is often small. Some…

Computation and Language · Computer Science 2018-11-05 Deren Lei, Zichen Sun, Yijun Xiao, William Yang Wang

Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes

We present new policy mirror descent (PMD) methods for solving reinforcement learning (RL) problems with either strongly convex or general convex regularizers. By exploring the structural properties of these overall highly nonconvex…

Machine Learning · Computer Science 2022-04-08 Guanghui Lan

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

Policy optimization, which finds the desired policy by maximizing value functions via optimization techniques, lies at the heart of reinforcement learning (RL). In addition to value maximization, other practical considerations arise as…

Machine Learning · Computer Science 2023-01-12 Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi

Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting

In this work, we present some applications of random matrix theory for the training of deep neural networks. Recently, random matrix theory (RMT) has been applied to the overfitting problem in deep learning. Specifically, it has been shown…

Machine Learning · Computer Science 2023-03-17 Yitzchak Shmalo, Jonathan Jenkins, Oleksii Krupchytskyi

Weight Rescaling: Effective and Robust Regularization for Deep Neural Networks with Batch Normalization

Weight decay is often used to ensure good generalization in the training practice of deep neural networks with batch normalization (BN-DNNs), where some convolution layers are invariant to weight rescaling due to the normalization. In this…

Machine Learning · Computer Science 2022-06-22 Ziquan Liu, Yufei Cui, Jia Wan, Yu Mao, Antoni B. Chan

On the Effect of Regularization in Policy Mirror Descent

Policy Mirror Descent (PMD) has emerged as a unifying framework in reinforcement learning (RL) by linking policy gradient methods with a first-order optimization method known as mirror descent. At its core, PMD incorporates two key…

Machine Learning · Computer Science 2025-07-14 Jan Felix Kleuker, Aske Plaat, Thomas Moerland

The Hidden Cost of Approximation in Online Mirror Descent

Online mirror descent (OMD) is a fundamental algorithmic paradigm that underlies many algorithms in optimization, machine learning and sequential decision-making. The OMD iterates are defined as solutions to optimization subproblems which,…

Machine Learning · Computer Science 2025-12-01 Ofir Schlisselberg, Uri Sherman, Tomer Koren, Yishay Mansour

Stochastic Mirror Descent in Average Ensemble Models

The stochastic mirror descent (SMD) algorithm is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD), as a special case. It utilizes a mirror potential to influence the implicit bias of…

Machine Learning · Computer Science 2022-10-28 Taylan Kargin, Fariborz Salehi, Babak Hassibi

Boosting Data-Driven Mirror Descent with Randomization, Equivariance, and Acceleration

Learning-to-optimize (L2O) is an emerging research area in large-scale optimization with applications in data science. Recently, researchers have proposed a novel L2O framework called learned mirror descent (LMD), based on the classical…

Optimization and Control · Mathematics 2024-05-13 Hong Ye Tan, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit…

Machine Learning · Computer Science 2021-12-10 Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality

Stochastic mirror descent (SMD) is a fairly new family of algorithms that has recently found a wide range of applications in optimization, machine learning, and control. It can be considered a generalization of the classical stochastic…

Optimization and Control · Mathematics 2019-04-04 Navid Azizan, Babak Hassibi

Training Structured Neural Networks Through Manifold Identification and Variance Reduction

This paper proposes an algorithm (RMDA) for training neural networks (NNs) with a regularization term for promoting desired structures. RMDA does not incur computation additional to proximal SGD with momentum, and achieves variance…

Machine Learning · Computer Science 2022-05-02 Zih-Syuan Huang, Ching-pei Lee

Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently

Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question. To this end, there has been substantial effort to…

Machine Learning · Computer Science 2023-06-27 Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan

Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness

Deep Neural Network (DNN) trained by the gradient descent method is known to be vulnerable to maliciously perturbed adversarial input, aka. adversarial attack. As one of the countermeasures against adversarial attack, increasing the model…

Computer Vision and Pattern Recognition · Computer Science 2019-05-31 Adnan Siraj Rakin, Zhezhi He, Li Yang, Yanzhi Wang, Liqiang Wang, Deliang Fan

Data augmentation instead of explicit regularization

Contrary to most machine learning models, modern deep artificial neural networks typically include multiple components that contribute to regularization. Despite the fact that some (explicit) regularization techniques, such as weight decay…

Computer Vision and Pattern Recognition · Computer Science 2020-11-13 Alex Hernández-García, Peter König