Related papers: On Learning the Transformer Kernel

Kernel Transform Learning

This work proposes kernel transform learning. The idea of dictionary learning is well known; it is a synthesis formulation where a basis is learnt along with the coefficients so as to generate or synthesize the data. Transform learning is…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-10 · Jyoti Maggu, Angshul Majumdar

Transfer Learning with Kernel Methods

Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple machine learning models that are competitive on a variety of tasks, it…

Machine Learning · Computer Science · 2022-11-02 · Adityanarayanan Radhakrishnan, Max Ruiz Luyten, Neha Prasad, Caroline Uhler

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel

Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the…

Machine Learning · Computer Science · 2019-11-13 · Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

Automated Spectral Kernel Learning

The generalization performance of kernel methods is largely determined by the kernel, but common kernels are stationary, and thus input-independent and output-independent, which limits their application to complicated tasks. In this paper, we…

Machine Learning · Computer Science · 2023-08-30 · Jian Li, Yong Liu, Weiping Wang

Adaptive Deep Kernel Learning

Deep kernel learning provides an elegant and principled framework for combining the structural properties of deep learning algorithms with the flexibility of kernel methods. By means of a deep neural network, we learn a parametrized kernel…

Machine Learning · Computer Science · 2020-12-14 · Prudencio Tossou, Basile Dura, Francois Laviolette, Mario Marchand, Alexandre Lacoste

Spectraformer: A Unified Random Feature Framework for Transformer

Linearization of attention using various kernel approximation and kernel learning techniques has shown promise. Past methods used a subset of combinations of component functions and weight matrices within the random feature paradigm. We…

Machine Learning · Computer Science · 2025-09-24 · Duke Nguyen, Du Yin, Aditya Joshi, Flora Salim

Metric and Kernel Learning using a Linear Transformation

Metric and kernel learning are important in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are…

Machine Learning · Computer Science · 2009-11-02 · Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon

Deep Kernel Learning

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a…

Machine Learning · Computer Science · 2015-11-09 · Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing

A la Carte - Learning Fast Kernels

Kernel methods have great promise for learning rich statistical representations of large modern datasets. However, compared to neural networks, kernel methods have been perceived as lacking in scalability and flexibility. We introduce a…

Machine Learning · Computer Science · 2014-12-22 · Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson

KernelNet: A Data-Dependent Kernel Parameterization for Deep Generative Modeling

Learning with kernels is an important concept in machine learning. Standard approaches for kernel methods often use predefined kernels that require careful selection of hyperparameters. To mitigate this burden, we propose in this paper a…

Machine Learning · Computer Science · 2020-06-26 · Yufan Zhou, Changyou Chen, Jinhui Xu

Linear Self-Attention Approximation via Trainable Feedforward Kernel

In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches -- models attaining sub-quadratic attention complexity can utilize a notion of sparsity or a low-rank approximation of inputs to reduce…

Machine Learning · Computer Science · 2022-11-09 · Uladzislau Yorsh, Alexander Kovalenko

Kernels for sequentially ordered data

We present a novel framework for kernel learning with sequential data of any kind, such as time series, sequences of graphs, or strings. Our approach is based on signature features which can be seen as an ordered variant of sample…

Machine Learning · Statistics · 2016-02-01 · Franz J Király, Harald Oberhauser

Transductive Kernels for Gaussian Processes on Graphs

Kernels on graphs have had limited options for node-level problems. To address this, we present a novel, generalized kernel for graphs with node feature data for semi-supervised learning. The kernel is derived from a regularization…

Machine Learning · Computer Science · 2022-11-29 · Yin-Cong Zhi, Felix L. Opolka, Yin Cheng Ng, Pietro Liò, Xiaowen Dong

Convolutional Spectral Kernel Learning

Recently, non-stationary spectral kernels have drawn much attention, owing to their powerful feature representation ability in revealing long-range correlations and input-dependent characteristics. However, non-stationary spectral kernels are…

Machine Learning · Computer Science · 2020-03-02 · Jian Li, Yong Liu, Weiping Wang

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for…

Machine Learning · Computer Science · 2024-06-06 · Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov

Multiple Kernel Representation Learning on Networks

Learning representations of nodes in a low dimensional space is a crucial task with numerous interesting applications in network analysis, including link prediction, node classification, and visualization. Two popular approaches for this…

Social and Information Networks · Computer Science · 2022-08-10 · Abdulkadir Celikkanat, Yanning Shen, Fragkiskos D. Malliaros

Kernelized Classification in Deep Networks

We propose a kernelized classification layer for deep networks. Although conventional deep networks introduce an abundance of nonlinearity for representation (feature) learning, they almost universally use a linear classifier on the learned…

Machine Learning · Computer Science · 2021-03-22 · Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines

Despite their ubiquity in core AI fields like natural language processing, the mechanics of deep attention-based neural networks like the Transformer model are not fully understood. In this article, we present a new perspective towards…

Machine Learning · Computer Science · 2021-06-04 · Matthew A. Wright, Joseph E. Gonzalez

Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity

This paper introduces a diagonal adaptive kernel model that dynamically learns kernel eigenvalues and output coefficients simultaneously during training. Unlike fixed-kernel methods tied to the neural tangent kernel theory, the diagonal…

Machine Learning · Computer Science · 2025-01-16 · Yicheng Li, Qian Lin

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to…

Machine Learning · Computer Science · 2021-02-26 · Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan