Related papers: Designing Robust Transformers using Robust Kernel …

Robust Kernel Density Estimation

We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical…

Machine Learning · Statistics 2011-09-07 JooSeuk Kim, Clayton D. Scott

Revisiting Transformers with Insights from Image Filtering and Boosting

The self-attention mechanism, a cornerstone of Transformer-based state-of-the-art deep learning architectures, is largely heuristic-driven and fundamentally challenging to interpret. Establishing a robust theoretical foundation to explain…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Laziz U. Abdullaev, Maksim Tkachenko, Tan M. Nguyen

KDEformer: Accelerating Transformers via Kernel Density Estimation

Dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformer) for sequence modeling, however, naïve exact computation of this model incurs quadratic time and memory complexities in sequence length,…

Machine Learning · Computer Science 2023-06-30 Amir Zandieh, Insu Han, Majid Daliri, Amin Karbasi

Linear Self-Attention Approximation via Trainable Feedforward Kernel

In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches -- models attaining sub-quadratic attention complexity can utilize a notion of sparsity or a low-rank approximation of inputs to reduce…

Machine Learning · Computer Science 2022-11-09 Uladzislau Yorsh, Alexander Kovalenko

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel

Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the…

Machine Learning · Computer Science 2019-11-13 Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

Robust Kernel Density Estimation by Scaling and Projection in Hilbert Space

While robust parameter estimation has been well studied in parametric density estimation, there has been little investigation into robust density estimation in the nonparametric setting. We present a robust version of the popular kernel…

Machine Learning · Statistics 2014-11-18 Robert A. Vandermeulen, Clayton D. Scott

Robust Kernel Density Estimation with Median-of-Means principle

In this paper, we introduce a robust nonparametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). This estimator is shown to achieve robustness to any kind of anomalous…

Statistics Theory · Mathematics 2020-07-01 Pierre Humbert, Batiste Le Bars, Ludovic Minvielle, Nicolas Vayatis

Understanding The Robustness in Vision Transformers

Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In…

Computer Vision and Pattern Recognition · Computer Science 2022-11-09 Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez

TraDE: Transformers for Density Estimation

We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data. Our model is trained using a penalized maximum likelihood objective, which ensures that samples from the…

Machine Learning · Computer Science 2020-10-16 Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola

Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention

The key to a Transformer model is the self-attention mechanism, which allows the model to analyze an entire sequence in a computationally efficient manner. Recent work has suggested the possibility that general attention mechanisms used by…

Machine Learning · Computer Science 2020-01-01 Thomas Dowdell, Hongyu Zhang

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis

The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most…

Machine Learning · Computer Science 2024-11-01 Rachel S. Y. Teo, Tan M. Nguyen

Robustness Verification for Transformers

Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding model behavior and obtaining safety guarantees. However, previous methods can usually only…

Machine Learning · Computer Science 2020-12-24 Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh

Fine-Tuning Pre-trained Transformers into Decaying Fast Weights

Autoregressive Transformers are strong language models but incur O(T) complexity during per-token generation due to the self-attention mechanism. Recent work proposes kernel-based methods to approximate causal self-attention by replacing it…

Machine Learning · Computer Science 2022-10-11 Huanru Henry Mao

Robust Stochastic Configuration Networks with Kernel Density Estimation

Neural networks have been widely used as predictive models to fit data distribution, and they could be implemented through learning a collection of samples. In many applications, however, the given dataset may contain noisy samples or…

Neural and Evolutionary Computing · Computer Science 2017-05-30 Dianhui Wang, Ming Li

Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention

Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP) due to their exceptional performance. However, recent studies show their vulnerability to textual adversarial attacks where the…

Computation and Language · Computer Science 2023-12-01 Lujia Shen, Yuwen Pu, Shouling Ji, Changjiang Li, Xuhong Zhang, Chunpeng Ge, Ting Wang

Transformer Meets Twicing: Harnessing Unattended Residual Information

Transformer-based deep learning models have achieved state-of-the-art performance across numerous language and vision tasks. While the self-attention mechanism, a core component of transformers, has proven capable of handling complex data…

Machine Learning · Computer Science 2025-08-05 Laziz Abdullaev, Tan M. Nguyen

Efficient Content-Based Sparse Attention with Routing Transformers

Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic compute and memory requirements with respect to sequence length. Successful approaches…

Machine Learning · Computer Science 2020-10-27 Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier

Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

Kernel density estimation (KDE) stands out as a challenging task in machine learning. The problem is defined in the following way: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n \} \subset \mathbb{R}^d$, we…

Machine Learning · Computer Science 2024-02-15 Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

A Tutorial on Kernel Density Estimation and Recent Advances

This tutorial provides a gentle introduction to kernel density estimation (KDE) and recent advances regarding confidence bands and geometric/topological features. We begin with a discussion of basic properties of KDE: the convergence rate…

Methodology · Statistics 2017-09-13 Yen-Chi Chen

Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers

Much of recent Deep Reinforcement Learning success is owed to the neural architecture's potential to learn and use effective internal representations of the world. While many current algorithms access a simulator to train with a large…

Artificial Intelligence · Computer Science 2022-02-03 Amir Ardalan Kalantari, Mohammad Amini, Sarath Chandar, Doina Precup