Related papers: Analyzing the discrepancy principle for kernelized…
We investigate the discrepancy principle for choosing smoothing parameters for kernel density estimation. The method is based on the distance between the empirical and estimated distribution functions. We prove some new positive and…
In this paper, we study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS) in the nonparametric regression framework. In particular, we work with the gradient descent and (iterative)…
In this paper, we propose an adaptive stopping rule for kernel-based gradient descent (KGD) algorithms. We introduce the empirical effective dimension to quantify the increments of iterations in KGD and derive an implementable early…
In recent years, new regularization methods based on (deep) neural networks have shown very promising empirical performance for the numerical solution of ill-posed problems, e.g., in medical imaging and imaging science. Due to the…
The strategy of early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy…
Stochastic gradient descent (SGD) is a promising numerical method for solving large-scale inverse problems. However, its theoretical properties remain largely underexplored in the lens of classical regularization theory. In this note, we…
We consider linear recurrent neural networks, which have become a key building block of sequence modeling due to their ability for stable and effective long-range modeling. In this paper, we aim at characterizing this ability on a simple…
Early stopping of iterative algorithms is an algorithmic regularization method to avoid over-fitting in estimation and classification. In this paper, we show that early stopping can also be applied to obtain the minimax optimal testing in a…
A simple proof of the convergence of the variational regularization, with the regularization parameter, chosen by the discrepancy principle, is given for linear operators under suitable assumptions. It is shown that the discrepancy…
Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection…
We provide improved error bounds for kernel-based numerical differentiation in terms of growth functions when kernels are of a finite smoothness, such as polyharmonic splines, thin plate splines or Wendland kernels. In contrast to existing…
There is a recent surge of interest in designing deep architectures based on the update steps in traditional algorithms, or learning neural networks to improve and replace traditional algorithms. While traditional algorithms have certain…
We study nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD) in this paper. We show that, if the neural network is trained by GD with early stopping, then the trained network renders a…
This paper focuses on parameter selection issues of kernel ridge regression (KRR). Due to special spectral properties of KRR, we find that delicate subdivision of the parameter interval shrinks the difference between two successive KRR…
Most normative models in computational neuroscience describe the task of learning as the optimisation of a cost function with respect to a set of parameters. However, learning as optimisation fails to account for a time varying environment…
We study the statistical-computational trade-offs for learning with exact invariances (or symmetries) using kernel regression. Traditional methods, such as data augmentation, group averaging, canonicalization, and frame-averaging, either…
We establish a general form of explicit, input-dependent, measure-valued warpings for learning nonstationary kernels. While stationary kernels are ubiquitous and simple to use, they struggle to adapt to functions that vary in smoothness…
In this note we consider spectral cut-off estimators to solve a statistical linear inverse problem under arbitrary white noise. The truncation level is determined with a recently introduced adaptive method based on the classical discrepancy…
Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of…
This paper investigates the convergence properties of spectral algorithms -- a class of regularization methods originating from inverse problems -- under covariate shift. In this setting, the marginal distributions of inputs differ between…