Related papers: Every Model Learned by Gradient Descent Is Approxi…

Neural Networks can Learn Representations with Gradient Descent

Significant theoretical work has established that in specific regimes, neural networks trained by gradient descent behave like kernel methods. However, in practice, it is known that neural networks strongly outperform their associated…

Machine Learning · Computer Science · 2022-07-01 · Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

Deep Equals Shallow for ReLU Networks in Kernel Regimes

Deep networks are often considered to be more expressive than shallow ones in terms of approximation. Indeed, certain functions can be approximated by deep networks provably more efficiently than by shallow ones; however, no tractable…

Machine Learning · Statistics · 2021-08-27 · Alberto Bietti, Francis Bach

Adaptive Deep Kernel Learning

Deep kernel learning provides an elegant and principled framework for combining the structural properties of deep learning algorithms with the flexibility of kernel methods. By means of a deep neural network, we learn a parametrized kernel…

Machine Learning · Computer Science · 2020-12-14 · Prudencio Tossou, Basile Dura, Francois Laviolette, Mario Marchand, Alexandre Lacoste

Layer-wise training of deep networks using kernel similarity

Deep learning has shown promising results in many machine learning applications. The hierarchical feature representation built by deep networks enables compact and precise encoding of the data. A kernel analysis of the trained deep networks…

Machine Learning · Computer Science · 2017-03-22 · Mandar Kulkarni, Shirish Karande

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization

An open question in the Deep Learning community is why neural networks trained with Gradient Descent generalize well on real datasets even though they are capable of fitting random data. We propose an approach to answering this question…

Machine Learning · Computer Science · 2020-02-26 · Satrajit Chatterjee

Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features

In recent years neural networks have achieved impressive results on many technological and scientific tasks. Yet, the mechanism through which these models automatically select features, or patterns in data, for prediction remains unclear.…

Machine Learning · Computer Science · 2023-05-11 · Adityanarayanan Radhakrishnan, Daniel Beaglehole, Parthe Pandit, Mikhail Belkin

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error…

Machine Learning · Computer Science · 2021-03-02 · Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro

How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks

The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works…

Machine Learning · Computer Science · 2024-11-06 · Mo Zhou, Rong Ge

Gradient Kernel Regression

In this article, a surprising result is demonstrated using the neural tangent kernel. This kernel is defined as the inner product of the vector of the gradient of an underlying model evaluated at training points. This kernel is used to…

Artificial Intelligence · Computer Science · 2021-04-14 · Matt Calder

Approximation and Gradient Descent Training with Neural Networks

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training…

Machine Learning · Computer Science · 2024-05-21 · G. Welper

The Connection Between Approximation, Depth Separation and Learnability in Neural Networks

Several recent works have shown separation results between deep neural networks, and hypothesis classes with inferior approximation capacity such as shallow networks or kernel classes. On the other hand, the fact that deep networks can…

Machine Learning · Computer Science · 2021-07-20 · Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

Iteratively reweighted kernel machines efficiently learn sparse functions

The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are…

Machine Learning · Statistics · 2025-10-06 · Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

One of the most important parts of Artificial Neural Networks is minimizing the loss function, which tells us how good or bad our model is. To minimize this loss, we need to tune the weights and biases. Also, to calculate the minimum value…

Machine Learning · Computer Science · 2021-01-08 · Kaustubh Yadav

Learning Explicit Deep Representations from Deep Kernel Networks

Deep kernel learning aims at designing nonlinear combinations of multiple standard elementary kernels by training deep networks. This scheme has proven to be effective but becomes intractable when handling large-scale datasets, especially when the…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-01 · Mingyuan Jiu, Hichem Sahbi

Learning Longer Memory in Recurrent Neural Networks

A recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due…

Neural and Evolutionary Computing · Computer Science · 2015-04-20 · Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model

Deep learning algorithms demonstrate a surprising ability to learn high-dimensional tasks from limited examples. This is commonly attributed to the depth of neural networks, enabling them to build a hierarchy of abstract, low-dimensional…

Machine Learning · Computer Science · 2024-07-04 · Francesco Cagnetta, Leonardo Petrini, Umberto M. Tomasini, Alessandro Favero, Matthieu Wyart

Neural networks with differentiable structure

While gradient descent has proven highly successful in learning connection weights for neural networks, the actual structure of these networks is usually determined by hand, or by other optimization algorithms. Here we describe a simple…

Neural and Evolutionary Computing · Computer Science · 2016-08-09 · Thomas Miconi

Deep Multiple Kernel Learning

Deep learning methods have predominantly been applied to large artificial neural networks. Despite their state-of-the-art performance, these large networks typically do not generalize well to datasets with limited sample sizes. In this…

Machine Learning · Statistics · 2016-11-17 · Eric Strobl, Shyam Visweswaran

Gradients as Features for Deep Representation Learning

We address the challenging problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…

Machine Learning · Computer Science · 2020-04-14 · Fangzhou Mu, Yingyu Liang, Yin Li

Universality of Gradient Descent Neural Network Training

It has been observed that design choices of neural networks are often crucial for their successful optimization. In this article, we therefore discuss whether it is always possible to redesign a neural network so that it trains well…

Machine Learning · Computer Science · 2020-07-28 · G. Welper