Related papers: Spherical Perspective on Learning with Normalizati…

Learning with Hyperspherical Uniformity

Due to the over-parameterization nature, neural networks are a powerful tool for nonlinear function approximation. In order to achieve good generalization on unseen data, a suitable inductive bias is of great importance for neural networks.…

Machine Learning · Computer Science 2021-11-17 Weiyang Liu, Rongmei Lin, Zhen Liu, Li Xiong, Bernhard Schölkopf, Adrian Weller

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that compared with (stochastic) gradient descent, Adam can converge to a different solution with a…

Machine Learning · Computer Science 2021-08-26 Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes

A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their…

Machine Learning · Computer Science 2023-01-18 Maxim Kodryan, Ekaterina Lobacheva, Maksim Nakhodnov, Dmitry Vetrov

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets. Motivated by the long-held…

Machine Learning · Computer Science 2023-01-18 Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora

Deep Learning Based Sphere Decoding

In this paper, a deep learning (DL)-based sphere decoding algorithm is proposed, where the radius of the decoding hypersphere is learned by a deep neural network (DNN). The performance achieved by the proposed algorithm is very close to the…

Signal Processing · Electrical Eng. & Systems 2024-03-26 Mostafa Mohammadkarimi, Mehrtash Mehrabi, Masoud Ardakani, Yindi Jing

Geometry Perspective Of Estimating Learning Capability Of Neural Networks

The paper uses statistical and differential geometric motivation to acquire prior information about the learning capability of an artificial neural network on a given dataset. The paper considers a broad class of neural networks with…

Machine Learning · Computer Science 2020-12-02 Ankan Dutta, Arnab Rakshit

Learning and Generalization in Overparameterized Normalizing Flows

In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and…

Machine Learning · Computer Science 2022-03-24 Kulin Shah, Amit Deshpande, Navin Goyal

Narrowing the Focus: Learned Optimizers for Pretrained Models

In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of…

Machine Learning · Computer Science 2024-10-08 Gus Kristiansen, Mark Sandler, Andrey Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov

Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold

When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon. More specifically, for the output features of the penultimate…

Machine Learning · Computer Science 2023-03-09 Can Yaras, Peng Wang, Zhihui Zhu, Laura Balzano, Qing Qu

Online Learning for the Random Feature Model in the Student-Teacher Framework

Deep neural networks are widely used prediction algorithms whose performance often improves as the number of weights increases, leading to over-parametrization. We consider a two-layered neural network whose first layer is frozen while the…

Machine Learning · Computer Science 2023-04-10 Roman Worschech, Bernd Rosenow

New Interpretations of Normalization Methods in Deep Learning

In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), group normalization (GN), etc. However,…

Machine Learning · Computer Science 2020-06-17 Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li

Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach

Neural networks, a central tool in machine learning, have demonstrated remarkable, high fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high dimensional…

Machine Learning · Statistics 2023-02-08 Grant M. Rotskoff, Eric Vanden-Eijnden

Scale-Regularized Filter Learning

We start out by demonstrating that an elementary learning task, corresponding to the training of a single linear neuron in a convolutional neural network, can be solved for feature spaces of very high dimensionality. In a second step,…

Computer Vision and Pattern Recognition · Computer Science 2017-07-11 Marco Loog, François Lauze

A Mean Field View of the Landscape of Two-Layers Neural Networks

Multi-layer neural networks are among the most powerful models in machine learning, yet the fundamental reasons for this success defy mathematical understanding. Learning a neural network requires to optimize a non-convex high-dimensional…

Machine Learning · Statistics 2022-06-08 Song Mei, Andrea Montanari, Phan-Minh Nguyen

The Neural Differential Manifold: An Architecture with Explicit Geometric Structure

This paper introduces the Neural Differential Manifold (NDM), a novel neural network architecture that explicitly incorporates geometric structure into its fundamental design. Departing from conventional Euclidean parameter spaces, the NDM…

Machine Learning · Computer Science 2025-10-30 Di Zhang

Theory of Deep Convolutional Neural Networks II: Spherical Analysis

Deep learning based on deep neural networks of various structures and architectures has been powerful in many practical applications, but it lacks enough theoretical verifications. In this paper, we consider a family of deep convolutional…

Machine Learning · Computer Science 2020-07-29 Zhiying Fang, Han Feng, Shuo Huang, Ding-Xuan Zhou

On Learning Sets of Symmetric Elements

Learning from unordered sets is a fundamental learning setup, recently attracting increasing attention. Research in this area has focused on the case where elements of the set are represented by feature vectors, and far less emphasis has…

Machine Learning · Computer Science 2020-12-01 Haggai Maron, Or Litany, Gal Chechik, Ethan Fetaya

Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD

In this work, we comprehensively reveal the learning dynamics of neural network with normalization, weight decay (WD), and SGD (with momentum), named as Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on "effective…

Machine Learning · Statistics 2020-11-30 Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun

Block-Normalized Gradient Method: An Empirical Study for Training Deep Neural Network

In this paper, we propose a generic and simple strategy for utilizing stochastic gradient information in optimization. The technique essentially contains two consecutive steps in each iteration: 1) computing and normalizing each block…

Machine Learning · Computer Science 2018-04-24 Adams Wei Yu, Lei Huang, Qihang Lin, Ruslan Salakhutdinov, Jaime Carbonell