Related papers: Universal Approximation Using Shuffled Linear Mode…
One of the theoretical pillars that sustain certain machine learning models are universal approximation theorems, which prove that they can approximate all functions from a function class to arbitrary precision. Independently, classical…
Motivated by the Central Limit Theorem, in this paper, we study both universal and non-universal simulations of random variables with an arbitrary target distribution $Q_{Y}$ by general mappings, not limited to linear ones (as in the…
The linear model uses the space defined by the input to project the target or desired signal and find the optimal set of model parameters. When the problem is nonlinear, the adaption requires nonlinear models for good performance, but it…
Spatial generalized linear mixed models (SGLMMs) are popular and flexible models for non-Gaussian spatial data. They are useful for spatial interpolations as well as for fitting regression models that account for spatial dependence, and are…
We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions…
The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated using a subset of points called…
Data driven models of dynamical systems help planners and controllers to provide more precise and accurate motions. Most model learning algorithms will try to minimize a loss function between the observed data and the model's predictions.…
Linear mixed models (LMMs), which incorporate fixed and random effects, are key tools for analyzing heterogeneous data, such as in personalized medicine. Nowadays, this type of data is increasingly wide, sometimes containing thousands of…
The generalized Gauss-Newton (GGN) approximation is often used to make practical Bayesian deep learning approaches scalable by replacing a second order derivative with a product of first order derivatives. In this paper we argue that the…
Empirical economic research frequently applies maximum likelihood estimation in cases where the likelihood function is analytically intractable. Most of the theoretical literature focuses on maximum simulated likelihood (MSL) estimators,…
Stochastic gradient descent type methods are ubiquitous in machine learning, but they are only applicable to the optimization of differentiable functions. Proximal algorithms are more general and applicable to nonsmooth functions. We…
Gaussian process models are commonly used as emulators for computer experiments. However, developing a Gaussian process emulator can be computationally prohibitive when the number of experimental samples is even moderately large. Local…
This work presents a unified framework that combines global approximations with locally built models to handle challenging nonconvex and nonsmooth composite optimization problems, including cases involving extended real-valued functions. We…
Matrix approximation is a common tool in machine learning for building accurate prediction models for recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the…
Language models have emerged as a critical area of focus in artificial intelligence, particularly with the introduction of groundbreaking innovations like ChatGPT. Large-scale Transformer networks have quickly become the leading approach…
Feature selection has been widely used to alleviate compute requirements during training, elucidate model interpretability, and improve model generalizability. We propose SLM -- Sparse Learnable Masks -- a canonical approach for end-to-end…
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an…
Most machine learning methods assume fixed probability distributions, limiting their applicability in nonstationary real-world scenarios. While continual learning methods address this issue, current approaches often rely on black-box models…
We obtain a new universal approximation theorem for continuous (possibly nonlinear) operators on arbitrary Banach spaces using the Leray-Schauder mapping. Moreover, we introduce and study a method for operator learning in Banach spaces…
The universal approximation property of various machine learning models is currently only understood on a case-by-case basis, limiting the rapid development of new theoretically justified neural network architectures and blurring our…