Related papers: Orthogonal Over-Parameterized Training

Orthogonal Transforms in Neural Networks Amount to Effective Regularization

We consider applications of neural networks in nonlinear system identification and formulate a hypothesis that adjusting general network structure by incorporating frequency information or other known orthogonal transform, should result in…

Machine Learning · Computer Science 2025-01-24 Krzysztof Zając, Wojciech Sopot, Paweł Wachel

Subquadratic Overparameterization for Shallow Neural Networks

Overparameterization refers to the important phenomenon where the width of a neural network is chosen such that learning algorithms can provably attain zero loss in nonconvex training. The existing theory establishes such global convergence…

Machine Learning · Computer Science 2021-11-04 Chaehwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher

Nearly Minimal Over-Parametrization of Shallow Neural Networks

A recent line of work has shown that an overparametrized neural network can perfectly fit the training data, an otherwise often intractable nonconvex optimization problem. For (fully-connected) shallow networks, in the best case scenario,…

Machine Learning · Computer Science 2019-10-30 Armin Eftekhari, ChaeHwan Song, Volkan Cevher

Port-Hamiltonian Approach to Neural Network Training

Neural networks are discrete entities: subdivided into discrete layers and parametrized by weights which are iteratively optimized via difference equations. Recent work proposes networks with layer outputs which are no longer quantized but…

Neural and Evolutionary Computing · Computer Science 2019-09-09 Stefano Massaroli, Michael Poli, Federico Califano, Angela Faragasso, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

An Improved Analysis of Training Over-parameterized Deep Neural Networks

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou, Quanquan Gu

Learning through atypical "phase transitions" in overparameterized neural networks

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of…

Machine Learning · Computer Science 2022-07-27 Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, Gabriele Perugini, Riccardo Zecchina

Parameter Interpolation Adversarial Training for Robust Image Classification

Though deep neural networks exhibit superior performance on various tasks, they are still plagued by adversarial examples. Adversarial training has been demonstrated to be the most effective method to defend against adversarial attacks.…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Xin Liu, Yichen Yang, Kun He, John E. Hopcroft

Orthogonal Convolutional Neural Networks

Deep convolutional neural networks are hindered by training instability and feature redundancy towards further performance improvement. A promising solution is to impose orthogonality on convolutional filters. We develop an efficient…

Computer Vision and Pattern Recognition · Computer Science 2020-04-09 Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu

Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections

The problem of learning long-term dependencies in sequences using Recurrent Neural Networks (RNNs) is still a major challenge. Recent methods have been suggested to solve this problem by constraining the transition matrix to be unitary…

Machine Learning · Computer Science 2017-06-14 Zakaria Mhammedi, Andrew Hellicar, Ashfaqur Rahman, James Bailey

ORFit: One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares

While large machine learning models have shown remarkable performance in various domains, their training typically requires iterating for many passes over the training data. However, due to computational and memory constraints and potential…

Machine Learning · Computer Science 2025-11-04 Youngjae Min, Namhoon Cho, Navid Azizan

On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions

We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed…

Optimization and Control · Mathematics 2023-11-08 Simon Martin, Francis Bach, Giulio Biroli

Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks

Orthogonal matrix has shown advantages in training Recurrent Neural Networks (RNNs), but such matrix is limited to be square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrix to…

Machine Learning · Computer Science 2017-11-22 Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li

Towards moderate overparameterization: global convergence guarantees for training shallow neural networks

Many modern neural network architectures are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Sufficiently overparameterized neural network architectures in principle have the…

Machine Learning · Computer Science 2019-02-14 Samet Oymak, Mahdi Soltanolkotabi

NLPOpt-Net: A Learning Method for Nonlinear Optimization with Feasibility Guarantees

Nonlinear Parametric Optimization Network (NLPOpt-Net) is an unsupervised learning architecture to solve constrained nonlinear programs (NLP). Given the structure of an NLP, it learns the parametric solution maps with guaranteed constraint…

Machine Learning · Computer Science 2026-05-04 Bimol Nath Roy, Rahul Golder, MM Faruque Hasan

Controllable Orthogonalization in Training DNNs

Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1 and reduce redundancy in representation. This paper proposes a computationally efficient and…

Computer Vision and Pattern Recognition · Computer Science 2020-04-03 Lei Huang, Li Liu, Fan Zhu, Diwen Wan, Zehuan Yuan, Bo Li, Ling Shao

Orthogonal Stochastic Configuration Networks with Adaptive Construction Parameter for Data Analytics

As a randomized learner model, SCNs are remarkable that the random weights and biases are assigned employing a supervisory mechanism to ensure universal approximation and fast learning. However, the randomness makes SCNs more likely to…

Machine Learning · Computer Science 2022-05-27 Wei Dai, Chuanfeng Ning, Shiyu Pei, Song Zhu, Xuesong Wang

Implicit Regularization and Generalization in Overparameterized Neural Networks

Classical statistical learning theory predicts that overparameterized models should exhibit severe overfitting, yet modern deep neural networks with far more parameters than training samples consistently generalize well. This contradiction…

Machine Learning · Computer Science 2026-04-10 Zeran Johannsen

Learning with Hyperspherical Uniformity

Due to the over-parameterization nature, neural networks are a powerful tool for nonlinear function approximation. In order to achieve good generalization on unseen data, a suitable inductive bias is of great importance for neural networks.…

Machine Learning · Computer Science 2021-11-17 Weiyang Liu, Rongmei Lin, Zhen Liu, Li Xiong, Bernhard Schölkopf, Adrian Weller

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

The selection of initial parameter values for gradient-based optimization of deep neural networks is one of the most impactful hyperparameter choices in deep learning systems, affecting both convergence times and model performance. Yet…

Machine Learning · Computer Science 2020-01-17 Wei Hu, Lechao Xiao, Jeffrey Pennington

Neuro-Optimization: Learning Objective Functions Using Neural Networks

Mathematical optimization is widely used in various research fields. With a carefully-designed objective function, mathematical optimization can be quite helpful in solving many problems. However, objective functions are usually…

Machine Learning · Computer Science 2019-05-27 Younghan Jeon, Minsik Lee, Jin Young Choi