Related papers: A Simple Algorithm For Scaling Up Kernel Methods
Recent research shows that the following two models are equivalent: (a) infinitely wide neural networks (NNs) trained under l2 loss by gradient descent with infinitesimally small learning rate (b) kernel regression with respect to so-called…
In this work, we investigate the generalization properties of random feature methods. Our analysis extends prior results for Tikhonov regularization to a broad class of spectral regularization techniques and further generalizes the setting…
Random feature approximation is arguably one of the most widely used techniques for kernel methods in large-scale learning algorithms. In this work, we analyze the generalization properties of random feature methods, extending previous…
The Neural Tangent Kernel (NTK) has discovered connections between deep neural networks and kernel methods with insights of optimization and generalization. Motivated by this, recent works report that NTK can achieve better performances…
The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide neural networks trained under least squares loss by gradient descent. Recent works also report that NTK regression can outperform finitely-wide neural networks…
Neural kernels have drastically increased performance on diverse and nonstandard data modalities but require significantly more compute, which previously limited their application to smaller datasets. In this work, we address this by…
Kernel methods provide a flexible and theoretically grounded approach to nonlinear and nonparametric learning. While memory and run-time requirements hinder their applicability to large datasets, many low-rank kernel approximations, such as…
Kernel methods are powerful and flexible approach to solve many problems in machine learning. Due to the pairwise evaluations in kernel methods, the complexity of kernel computation grows as the data size increases; thus the applicability…
Random feature approximation is arguably one of the most popular techniques to speed up kernel methods in large scale algorithms and provides a theoretical approach to the analysis of deep neural networks. We analyze generalization…
How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its width --- namely, number of channels in convolutional layers, and number of nodes in fully-connected internal…
The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent. However, despite its importance, the super-quadratic runtime of kernel methods limits the use of…
We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural…
Random features is one of the most popular techniques to speed up kernel methods in large-scale problems. Related works have been recognized by the NeurIPS Test-of-Time award in 2017 and the ICML Best Paper Finalist in 2019. The body of…
A seminal work [Jacot et al., 2018] demonstrated that training a neural network under specific parameterization is equivalent to performing a particular kernel method as width goes to infinity. This equivalence opened a promising direction…
We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the…
Kernel methods offer the flexibility to learn complex relationships in modern, large data sets while enjoying strong theoretical guarantees on quality. Unfortunately, these methods typically require cubic running time in the data set size,…
Quantum machine learning (QML) is the spearhead of quantum computer applications. In particular, quantum neural networks (QNN) are actively studied as the method that works both in near-term quantum computers and fault-tolerant quantum…
Recently the use of Noisy Intermediate Scale Quantum (NISQ) devices for machine learning tasks has been proposed. The propositions often perform poorly due to various restrictions. However, the quantum devices should perform well in…
As the size and richness of available datasets grow larger, the opportunities for solving increasingly challenging problems with algorithms learning directly from data grow at the same pace. Consequently, the capability of learning…
Kernel methods augmented with random features give scalable algorithms for learning from big data. But it has been computationally hard to sample random features according to a probability distribution that is optimized for the data, so as…