Related papers: Self-Regularized Learning Methods
A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…
When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…
In an attempt to better understand generalization in deep learning, we study several possible explanations. We show that implicit regularization induced by the optimization method is playing a key role in generalization and success of deep…
We develop a machine-learning framework to learn hyperparameter sequences for accelerated first-order methods (e.g., the step size and momentum sequences in accelerated gradient descent) to quickly solve parametric convex optimization…
Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may…
Over-parameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and…
The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-days overparameterized learning algorithms. This notion refers to the tendency of the…
We propose graph-dependent implicit regularisation strategies for distributed stochastic subgradient descent (Distributed SGD) for convex problems in multi-agent learning. Under the standard assumptions of convexity, Lipschitz continuity,…
Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient…
Regularization techniques are widely employed in optimization-based approaches for solving ill-posed inverse problems in data analysis and scientific computing. These methods are based on augmenting the objective with a penalty function,…
There is growing body of learning problems for which it is natural to organize the parameters into matrix, so as to appropriately regularize the parameters under some matrix norm (in order to impose some more sophisticated prior knowledge).…
We consider the problem of supervised learning with convex loss functions and propose a new form of iterative regularization based on the subgradient method. Unlike other regularization approaches, in iterative regularization no constraint…
Selecting the best regularization parameter in inverse problems is a classical and yet challenging problem. Recently, data-driven approaches have become popular to tackle this challenge. These approaches are appealing since they do require…
We propose \textit{Meta-Regularization}, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods. Our approach modifies the objective function by adding a regularization term on the learning…
Optimization in machine learning typically deals with the minimization of empirical objectives defined by training data. However, the ultimate goal of learning is to minimize the error on future data (test error), for which the training…
Self-paced learning (SPL) mimics the cognitive mechanism of humans and animals that gradually learns from easy to hard samples. One key issue in SPL is to obtain better weighting strategy that is determined by minimizer function. Existing…
Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity." We study the implicit…
We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect the model performance. We first derive the convergence rate…
Generalization is a central problem in Machine Learning. Most prediction methods require careful calibration of hyperparameters carried out on a hold-out \textit{validation} dataset to achieve generalization. The main goal of this paper is…
Learned Optimizers (LOs), a type of Meta-learning, have gained traction due to their ability to be parameterized and trained for efficient optimization. Traditional gradient-based methods incorporate explicit regularization techniques such…