Related papers: Self-Attention through Kernel-Eigen Pair Sparse Va…

Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation

Recently, a new line of works has emerged to understand and improve self-attention in Transformers by treating it as a kernel machine. However, existing works apply the methods for symmetric kernels to the asymmetric self-attention,…

Machine Learning · Computer Science · 2023-12-06 · Yingyi Chen, Qinghua Tao, Francesco Tonin, Johan A. K. Suykens

Revisiting Kernel Attention with Correlated Gaussian Process Representation

Transformers have increasingly become the de facto method to model sequential data with state-of-the-art performance. Due to their widespread use, being able to estimate and calibrate their modeling uncertainty is important to understand and…

Machine Learning · Computer Science · 2025-03-03 · Long Minh Bui, Tho Tran Huu, Duy Dinh, Tan Minh Nguyen, Trong Nghia Hoang

Calibrating Transformers via Sparse Gaussian Processes

Transformer models have achieved profound success in prediction tasks in a wide range of applications in natural language processing, speech recognition and computer vision. Extending Transformer's success to safety-critical domains…

Machine Learning · Computer Science · 2025-09-11 · Wenlong Chen, Yingzhen Li

Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior

Choosing a proper set of kernel functions is an important problem in learning Gaussian Process (GP) models since each kernel structure has different model complexity and data fitness. Recently, automatic kernel composition methods provide…

Machine Learning · Computer Science · 2021-02-25 · Anh Tong, Toan Tran, Hung Bui, Jaesik Choi

Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN)

Kolmogorov-Arnold Networks have emerged as interpretable alternatives to traditional multi-layer perceptrons. However, standard implementations lack principled uncertainty quantification capabilities essential for many scientific…

Machine Learning · Computer Science · 2025-12-10 · Y. Sungtaek Ju

Sparse Gaussian Process Variational Autoencoders

Large, multi-dimensional spatio-temporal datasets are omnipresent in modern science and engineering. An effective framework for handling such data is the class of Gaussian process deep generative models (GP-DGMs), which employ GP priors over the latent…

Machine Learning · Statistics · 2020-10-26 · Matthew Ashman, Jonathan So, Will Tebbutt, Vincent Fortuin, Michael Pearce, Richard E. Turner

SEEK: Self-adaptive Explainable Kernel For Nonstationary Gaussian Processes

Gaussian processes (GPs) are powerful probabilistic models that define flexible priors over functions, offering strong interpretability and uncertainty quantification. However, GP models often rely on simple, stationary kernels which can…

Machine Learning · Computer Science · 2025-05-20 · Nima Negarandeh, Carlos Mora, Ramin Bostanabad

A Generalized Stochastic Variational Bayesian Hyperparameter Learning Framework for Sparse Spectrum Gaussian Process Regression

While much research effort has been dedicated to scaling up sparse Gaussian process (GP) models based on inducing variables for big data, little attention is afforded to the other less explored class of low-rank GP approximations that…

Machine Learning · Statistics · 2016-11-21 · Quang Minh Hoang, Trong Nghia Hoang, Kian Hsiang Low

Scalable and Interpretable Scientific Discovery via Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN)

Kolmogorov-Arnold Networks (KANs) offer a promising alternative to Multi-Layer Perceptron (MLP) by placing learnable univariate functions on network edges, enhancing interpretability. However, standard KANs lack probabilistic outputs,…

Machine Learning · Computer Science · 2025-12-02 · Y. Sungtaek Ju

Sparse Variational Contaminated Noise Gaussian Process Regression with Applications in Geomagnetic Perturbations Forecasting

Gaussian Processes (GP) have become popular machine-learning methods for kernel-based learning on datasets with complicated covariance structures. In this paper, we present a novel extension to the GP framework using a contaminated normal…

Machine Learning · Computer Science · 2024-07-03 · Daniel Iong, Matthew McAnear, Yuezhou Qu, Shasha Zou, Gabor Toth, Yang Chen

EigenGP: Sparse Gaussian process models with data-dependent eigenfunctions

Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost and it is difficult to design nonstationary GP priors in practice. In this paper, we propose…

Machine Learning · Computer Science · 2013-03-15 · Yuan Qi, Bo Dai, Yao Zhu

Scalable Variational Bayesian Kernel Selection for Sparse Gaussian Process Regression

This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest…

Machine Learning · Computer Science · 2019-12-06 · Tong Teng, Jie Chen, Yehong Zhang, Kian Hsiang Low

Adaptive Kernel Selection for Stein Variational Gradient Descent

A central challenge in Bayesian inference is efficiently approximating posterior distributions. Stein Variational Gradient Descent (SVGD) is a popular variational inference method which transports a set of particles to approximate a target…

Machine Learning · Statistics · 2025-12-05 · Moritz Melcher, Simon Weissmann, Ashia C. Wilson, Jakob Zech

Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel

Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a…

Machine Learning · Computer Science · 2025-01-17 · Richard Cornelius Suwandi, Zhidi Lin, Feng Yin, Zhiguo Wang, Sergios Theodoridis

Inverse-Free Sparse Variational Gaussian Processes

Gaussian processes (GPs) offer appealing properties but are costly to train at scale. Sparse variational GP (SVGP) approximations reduce cost yet still rely on Cholesky decompositions of kernel matrices, ill-suited to low-precision,…

Machine Learning · Statistics · 2026-04-02 · Stefano Cortinovis, Laurence Aitchison, Stefanos Eleftheriadis, Mark van der Wilk

Variable sigma Gaussian processes: An expectation propagation perspective

Gaussian processes (GPs) provide a probabilistic nonparametric representation of functions in regression, classification, and other problems. Unfortunately, exact learning with GPs is intractable for large datasets. A variety of approximate…

Machine Learning · Computer Science · 2010-02-23 · Yuan Qi, Ahmed H. Abdel-Gawad, Thomas P. Minka

SIKA-GP: Accelerating Gaussian Process Inference with Sparse Inducing Kernel Approximations for Bayesian Deep Learning

Gaussian processes (GPs) provide a principled Bayesian framework for uncertainty estimation, but their computational complexity severely limits scalability to large datasets. We propose SIKA-GP, which accelerates GP inference using sparse…

Machine Learning · Computer Science · 2026-05-27 · Wenyuan Zhao, Rui Tuo, Chao Tian

Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels

A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. This success is largely attributed to the GP's analytical tractability, robustness, non-parametric…

Machine Learning · Statistics · 2022-05-19 · Marcus M. Noack, Harinarayan Krishnan, Mark D. Risser, Kristofer G. Reyes

Fully Bayesian Differential Gaussian Processes through Stochastic Differential Equations

Deep Gaussian process models typically employ discrete hierarchies, but recent advancements in differential Gaussian processes (DiffGPs) have extended these models to infinite depths. However, existing DiffGP approaches often overlook the…

Machine Learning · Computer Science · 2025-12-16 · Jian Xu, Zhiqi Lin, Min Chen, Junmei Yang, Delu Zeng, John Paisley

Connections and Equivalences between the Nyström Method and Sparse Variational Gaussian Processes

We investigate the connections between sparse approximation methods for making kernel methods and Gaussian processes (GPs) scalable to large-scale data, focusing on the Nyström method and the Sparse Variational Gaussian Processes (SVGP).…

Machine Learning · Statistics · 2023-02-09 · Veit Wild, Motonobu Kanagawa, Dino Sejdinovic