Related papers: On Infinite-Width Hypernetworks

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks

Deep neural networks' remarkable ability to correctly fit training data when optimized by gradient-based algorithms is yet to be fully understood. Recent theoretical results explain the convergence for ReLU networks that are wider than…

Machine Learning · Computer Science 2021-02-09 Asaf Noy, Yi Xu, Yonathan Aflalo, Lihi Zelnik-Manor, Rong Jin

HyperNetworks

This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the…

Machine Learning · Computer Science 2016-12-02 David Ha, Andrew Dai, Quoc V. Le

Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization

While deep learning is successful in a number of applications, it is not yet well understood theoretically. A satisfactory theoretical characterization of deep learning, however, is beginning to emerge. It covers the following questions: 1)…

Machine Learning · Computer Science 2019-08-27 Tomaso Poggio, Andrzej Banburski, Qianli Liao

Demystifying the Global Convergence Puzzle of Learning Over-parameterized ReLU Nets in Very High Dimensions

This theoretical paper is devoted to developing a rigorous theory for demystifying the global convergence phenomenon in a challenging scenario: learning over-parameterized Rectified Linear Unit (ReLU) nets for very high dimensional dataset…

Machine Learning · Computer Science 2022-06-08 Peng He

Most ReLU Networks Admit Identifiable Parameters

We study the realization map of deep ReLU networks, focusing on when a function determines its parameters up to scaling and permutation. To analyze hidden redundancies beyond these standard symmetries, we introduce a framework based on…

Machine Learning · Computer Science 2026-05-21 Moritz Grillo, Guido Montúfar

Global convergence of ResNets: From finite to infinite width using linear parameterization

Overparametrization is a key factor in the absence of convexity to explain global convergence of gradient descent (GD) for neural networks. Besides the well-studied lazy regime, infinite width (mean field) analysis has been developed for…

Neural and Evolutionary Computing · Computer Science 2023-02-07 Raphaël Barboni, Gabriel Peyré, François-Xavier Vialard

Convex Geometry and Duality of Over-parameterized Neural Networks

We develop a convex analytic approach to analyze finite width two-layer ReLU networks. We first prove that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set, where simple…

Machine Learning · Computer Science 2021-09-01 Tolga Ergen, Mert Pilanci

Over-parametrized neural networks as under-determined linear systems

We draw connections between simple neural networks and under-determined linear systems to comprehensively explore several interesting theoretical questions in the study of neural networks. First, we emphatically show that it is unsurprising…

Numerical Analysis · Mathematics 2020-11-02 Austin R. Benson, Anil Damle, Alex Townsend

A Brief Review of Hypernetworks in Deep Learning

Hypernetworks, or hypernets for short, are neural networks that generate weights for another neural network, known as the target network. They have emerged as a powerful deep learning technique that allows for greater flexibility,…

Machine Learning · Computer Science 2025-01-03 Vinod Kumar Chauhan, Jiandong Zhou, Ping Lu, Soheila Molaei, David A. Clifton

ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models

Neural networks often operate in the overparameterized regime, in which there are far more parameters than training samples, allowing the training data to be fit perfectly. That is, training the network effectively learns an interpolating…

Machine Learning · Computer Science 2025-03-19 Suzanna Parkinson, Greg Ongie, Rebecca Willett

Towards moderate overparameterization: global convergence guarantees for training shallow neural networks

Many modern neural network architectures are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Sufficiently overparameterized neural network architectures in principle have the…

Machine Learning · Computer Science 2019-02-14 Samet Oymak, Mahdi Soltanolkotabi

Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks

We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global…

Optimization and Control · Mathematics 2025-01-13 David A. R. Robin, Kevin Scaman, Marc Lelarge

Deep Convolutional Framelets: A General Deep Learning Framework for Inverse Problems

Recently, deep learning approaches with various network architectures have achieved significant performance improvement over existing iterative reconstruction methods in various imaging problems. However, it is still unclear why these deep…

Machine Learning · Statistics 2018-01-26 Jong Chul Ye, Yoseob Han, Eunju Cha

How Infinitely Wide Neural Networks Can Benefit from Multi-task Learning -- an Exact Macroscopic Characterization

In practice, multi-task learning (through learning features shared among tasks) is an essential property of deep neural networks (NNs). While infinite-width limits of NNs can provide good intuition for their generalization behavior, the…

Machine Learning · Computer Science 2022-10-21 Jakob Heiss, Josef Teichmann, Hanna Wutte

Understanding the role of depth in the neural tangent kernel for overparameterized neural networks

Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of…

Machine Learning · Computer Science 2025-11-11 William St-Arnaud, Margarida Carvalho, Golnoosh Farnadi

On the Expressiveness and Generalization of Hypergraph Neural Networks

This extended abstract describes a framework for analyzing the expressiveness, learning, and (structural) generalization of hypergraph neural networks (HyperGNNs). Specifically, we focus on how HyperGNNs can learn from finite datasets and…

Machine Learning · Computer Science 2023-03-10 Zhezheng Luo, Jiayuan Mao, Joshua B. Tenenbaum, Leslie Pack Kaelbling

Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models

Large pre-trained models, or foundation models, have shown impressive performance when adapted to a variety of downstream tasks, often out-performing specialized models. Hypernetworks, neural networks that generate some or all of the…

Machine Learning · Computer Science 2025-03-04 Jeffrey Gu, Serena Yeung-Levy

On the growth of the parameters of approximating ReLU neural networks

This work focuses on the analysis of fully connected feed forward ReLU neural networks as they approximate a given, smooth function. In contrast to conventionally studied universal approximation properties under increasing architectures,…

Machine Learning · Computer Science 2024-06-24 Erion Morina, Martin Holler

Quasi-Equivariant Metanetworks

Metanetworks are neural architectures designed to operate directly on pretrained weights to perform downstream tasks. However, the parameter space serves only as a proxy for the underlying function class, and the parameter-function mapping…

Machine Learning · Computer Science 2026-04-28 Viet-Hoang Tran, An Nguyen, Benoît Guérand, Thieu N. Vo, Tan M. Nguyen

On the optimization and generalization of overparameterized implicit neural networks

Implicit neural networks have become increasingly attractive in the machine learning community since they can achieve competitive performance while using far fewer computational resources. Recently, a line of theoretical works established the…

Machine Learning · Computer Science 2022-10-03 Tianxiang Gao, Hongyang Gao