Related papers: Orthogonal Over-Parameterized Training

200 papers

We consider applications of neural networks in nonlinear system identification and formulate a hypothesis that adjusting the general network structure by incorporating frequency information or another known orthogonal transform should result in…

Machine Learning · Computer Science 2025-01-24 Krzysztof Zając, Wojciech Sopot, Paweł Wachel

Overparameterization refers to the important phenomenon where the width of a neural network is chosen such that learning algorithms can provably attain zero loss in nonconvex training. The existing theory establishes such global convergence…

Machine Learning · Computer Science 2021-11-04 Chaehwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher

A recent line of work has shown that an overparametrized neural network can perfectly fit the training data, an otherwise often intractable nonconvex optimization problem. For (fully-connected) shallow networks, in the best case scenario,…

Machine Learning · Computer Science 2019-10-30 Armin Eftekhari, ChaeHwan Song, Volkan Cevher

Neural networks are discrete entities: subdivided into discrete layers and parametrized by weights which are iteratively optimized via difference equations. Recent work proposes networks with layer outputs which are no longer quantized but…

Neural and Evolutionary Computing · Computer Science 2019-09-09 Stefano Massaroli, Michael Poli, Federico Califano, Angela Faragasso, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou, Quanquan Gu

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of…

Though deep neural networks exhibit superior performance on various tasks, they are still plagued by adversarial examples. Adversarial training has been demonstrated to be the most effective method to defend against adversarial attacks.…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Xin Liu, Yichen Yang, Kun He, John E. Hopcroft

Deep convolutional neural networks are hindered by training instability and feature redundancy towards further performance improvement. A promising solution is to impose orthogonality on convolutional filters. We develop an efficient…

Computer Vision and Pattern Recognition · Computer Science 2020-04-09 Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu

The problem of learning long-term dependencies in sequences using Recurrent Neural Networks (RNNs) is still a major challenge. Recent methods have been suggested to solve this problem by constraining the transition matrix to be unitary…

Machine Learning · Computer Science 2017-06-14 Zakaria Mhammedi, Andrew Hellicar, Ashfaqur Rahman, James Bailey

While large machine learning models have shown remarkable performance in various domains, their training typically requires iterating for many passes over the training data. However, due to computational and memory constraints and potential…

Machine Learning · Computer Science 2025-11-04 Youngjae Min, Namhoon Cho, Navid Azizan

We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed…

Optimization and Control · Mathematics 2023-11-08 Simon Martin, Francis Bach, Giulio Biroli

Orthogonal matrices have shown advantages in training Recurrent Neural Networks (RNNs), but such matrices are limited to being square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrices to…

Machine Learning · Computer Science 2017-11-22 Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li

Many modern neural network architectures are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Sufficiently overparameterized neural network architectures in principle have the…

Machine Learning · Computer Science 2019-02-14 Samet Oymak, Mahdi Soltanolkotabi

Nonlinear Parametric Optimization Network (NLPOpt-Net) is an unsupervised learning architecture to solve constrained nonlinear programs (NLP). Given the structure of an NLP, it learns the parametric solution maps with guaranteed constraint…

Machine Learning · Computer Science 2026-05-04 Bimol Nath Roy, Rahul Golder, MM Faruque Hasan

Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1 and reduce redundancy in representation. This paper proposes a computationally efficient and…

Computer Vision and Pattern Recognition · Computer Science 2020-04-03 Lei Huang, Li Liu, Fan Zhu, Diwen Wan, Zehuan Yuan, Bo Li, Ling Shao

As a randomized learner model, SCNs are remarkable in that the random weights and biases are assigned using a supervisory mechanism to ensure universal approximation and fast learning. However, the randomness makes SCNs more likely to…

Machine Learning · Computer Science 2022-05-27 Wei Dai, Chuanfeng Ning, Shiyu Pei, Song Zhu, Xuesong Wang

Classical statistical learning theory predicts that overparameterized models should exhibit severe overfitting, yet modern deep neural networks with far more parameters than training samples consistently generalize well. This contradiction…

Machine Learning · Computer Science 2026-04-10 Zeran Johannsen

Owing to their over-parameterized nature, neural networks are a powerful tool for nonlinear function approximation. In order to achieve good generalization on unseen data, a suitable inductive bias is of great importance for neural networks.…

Machine Learning · Computer Science 2021-11-17 Weiyang Liu, Rongmei Lin, Zhen Liu, Li Xiong, Bernhard Schölkopf, Adrian Weller

The selection of initial parameter values for gradient-based optimization of deep neural networks is one of the most impactful hyperparameter choices in deep learning systems, affecting both convergence times and model performance. Yet…

Machine Learning · Computer Science 2020-01-17 Wei Hu, Lechao Xiao, Jeffrey Pennington

Mathematical optimization is widely used in various research fields. With a carefully-designed objective function, mathematical optimization can be quite helpful in solving many problems. However, objective functions are usually…

Machine Learning · Computer Science 2019-05-27 Younghan Jeon, Minsik Lee, Jin Young Choi