Related papers: On Learning the Transformer Kernel

200 papers

This work proposes kernel transform learning. The idea of dictionary learning is well known; it is a synthesis formulation where a basis is learnt along with the coefficients so as to generate or synthesize the data. Transform learning is…

Computer Vision and Pattern Recognition · Computer Science 2020-08-10 Jyoti Maggu, Angshul Majumdar

Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple machine learning models that are competitive on a variety of tasks, it…

Machine Learning · Computer Science 2022-11-02 Adityanarayanan Radhakrishnan, Max Ruiz Luyten, Neha Prasad, Caroline Uhler

Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the…

Machine Learning · Computer Science 2019-11-13 Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

The generalization performance of kernel methods is largely determined by the kernel, but common kernels are stationary and thus input-independent and output-independent, which limits their application to complicated tasks. In this paper, we…

Machine Learning · Computer Science 2023-08-30 Jian Li, Yong Liu, Weiping Wang

Deep kernel learning provides an elegant and principled framework for combining the structural properties of deep learning algorithms with the flexibility of kernel methods. By means of a deep neural network, we learn a parametrized kernel…

Machine Learning · Computer Science 2020-12-14 Prudencio Tossou, Basile Dura, Francois Laviolette, Mario Marchand, Alexandre Lacoste

Linearization of attention using various kernel approximation and kernel learning techniques has shown promise. Past methods used a subset of combinations of component functions and weight matrices within the random feature paradigm. We…

Machine Learning · Computer Science 2025-09-24 Duke Nguyen, Du Yin, Aditya Joshi, Flora Salim

Metric and kernel learning are important in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are…

Machine Learning · Computer Science 2009-11-02 Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a…

Machine Learning · Computer Science 2015-11-09 Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing

Kernel methods have great promise for learning rich statistical representations of large modern datasets. However, compared to neural networks, kernel methods have been perceived as lacking in scalability and flexibility. We introduce a…

Machine Learning · Computer Science 2014-12-22 Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson

Learning with kernels is an important concept in machine learning. Standard approaches for kernel methods often use predefined kernels that require careful selection of hyperparameters. To mitigate this burden, we propose in this paper a…

Machine Learning · Computer Science 2020-06-26 Yufan Zhou, Changyou Chen, Jinhui Xu

In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches -- models attaining sub-quadratic attention complexity can utilize a notion of sparsity or a low-rank approximation of inputs to reduce…

Machine Learning · Computer Science 2022-11-09 Uladzislau Yorsh, Alexander Kovalenko

We present a novel framework for kernel learning with sequential data of any kind, such as time series, sequences of graphs, or strings. Our approach is based on signature features which can be seen as an ordered variant of sample…

Machine Learning · Statistics 2016-02-01 Franz J Király, Harald Oberhauser

Kernels on graphs have had limited options for node-level problems. To address this, we present a novel, generalized kernel for graphs with node feature data for semi-supervised learning. The kernel is derived from a regularization…

Machine Learning · Computer Science 2022-11-29 Yin-Cong Zhi, Felix L. Opolka, Yin Cheng Ng, Pietro Liò, Xiaowen Dong

Recently, non-stationary spectral kernels have drawn much attention, owing to their powerful feature representation ability in revealing long-range correlations and input-dependent characteristics. However, non-stationary spectral kernels are…

Machine Learning · Computer Science 2020-03-02 Jian Li, Yong Liu, Weiping Wang

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for…

Learning representations of nodes in a low dimensional space is a crucial task with numerous interesting applications in network analysis, including link prediction, node classification, and visualization. Two popular approaches for this…

Social and Information Networks · Computer Science 2022-08-10 Abdulkadir Celikkanat, Yanning Shen, Fragkiskos D. Malliaros

We propose a kernelized classification layer for deep networks. Although conventional deep networks introduce an abundance of nonlinearity for representation (feature) learning, they almost universally use a linear classifier on the learned…

Machine Learning · Computer Science 2021-03-22 Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Despite their ubiquity in core AI fields like natural language processing, the mechanics of deep attention-based neural networks like the Transformer model are not fully understood. In this article, we present a new perspective towards…

Machine Learning · Computer Science 2021-06-04 Matthew A. Wright, Joseph E. Gonzalez

This paper introduces a diagonal adaptive kernel model that dynamically learns kernel eigenvalues and output coefficients simultaneously during training. Unlike fixed-kernel methods tied to the neural tangent kernel theory, the diagonal…

Machine Learning · Computer Science 2025-01-16 Yicheng Li, Qian Lin

We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to…

Machine Learning · Computer Science 2021-02-26 Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan