Transformer Neural Processes - Kernel Regression
Abstract
Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. Originally developed as a scalable alternative to Gaussian Processes (GPs), which are limited by runtime complexity, the most accurate modern NPs can often rival GPs but still suffer from an bottleneck due to their attention mechanism. We introduce the Transformer Neural Process - Kernel Regression (TNP-KR), a scalable NP featuring: (1) a Kernel Regression Block (KRBlock), a simple, extensible, and parameter efficient transformer block with complexity , where and are the number of context and test points, respectively; (2) a kernel-based attention bias; and (3) two novel attention mechanisms: scan attention (SA), a memory-efficient scan-based attention that when paired with a kernel-based bias can make TNP-KR translation invariant, and deep kernel attention (DKA), a Performer-style attention that implicitly incoporates a distance bias and further reduces complexity to . These enhancements enable both TNP-KR variants to perform inference with 100K context points on over 1M test points in under a minute on a single 24GB GPU. On benchmarks spanning meta regression, Bayesian optimization, image completion, and epidemiology, TNP-KR with DKA outperforms its Performer counterpart on nearly every benchmark, while TNP-KR with SA achieves state-of-the-art results.
Cite
@article{arxiv.2411.12502,
title = {Transformer Neural Processes - Kernel Regression},
author = {Daniel Jenson and Jhonathan Navott and Mengyan Zhang and Makkunda Sharma and Elizaveta Semenova and Seth Flaxman},
journal= {arXiv preprint arXiv:2411.12502},
year = {2026}
}
Comments
This was superseded by 'Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes' (arXiv:2506.09163)