Related papers: Designing Robust Transformers using Robust Kernel …

200 papers

We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical…

Machine Learning · Statistics 2011-09-07 JooSeuk Kim, Clayton D. Scott

The self-attention mechanism, a cornerstone of Transformer-based state-of-the-art deep learning architectures, is largely heuristic-driven and fundamentally challenging to interpret. Establishing a robust theoretical foundation to explain…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Laziz U. Abdullaev, Maksim Tkachenko, Tan M. Nguyen

Dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformer) for sequence modeling, however, naïve exact computation of this model incurs quadratic time and memory complexities in sequence length,…

Machine Learning · Computer Science 2023-06-30 Amir Zandieh, Insu Han, Majid Daliri, Amin Karbasi

In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches -- models attaining sub-quadratic attention complexity can utilize a notion of sparsity or a low-rank approximation of inputs to reduce…

Machine Learning · Computer Science 2022-11-09 Uladzislau Yorsh, Alexander Kovalenko

Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the…

Machine Learning · Computer Science 2019-11-13 Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

While robust parameter estimation has been well studied in parametric density estimation, there has been little investigation into robust density estimation in the nonparametric setting. We present a robust version of the popular kernel…

Machine Learning · Statistics 2014-11-18 Robert A. Vandermeulen, Clayton D. Scott

In this paper, we introduce a robust nonparametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). This estimator is shown to achieve robustness to any kind of anomalous…

Statistics Theory · Mathematics 2020-07-01 Pierre Humbert, Batiste Le Bars, Ludovic Minvielle, Nicolas Vayatis

Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In…

Computer Vision and Pattern Recognition · Computer Science 2022-11-09 Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez

We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data. Our model is trained using a penalized maximum likelihood objective, which ensures that samples from the…

Machine Learning · Computer Science 2020-10-16 Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola

The key to a Transformer model is the self-attention mechanism, which allows the model to analyze an entire sequence in a computationally efficient manner. Recent work has suggested the possibility that general attention mechanisms used by…

Machine Learning · Computer Science 2020-01-01 Thomas Dowdell, Hongyu Zhang

The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most…

Machine Learning · Computer Science 2024-11-01 Rachel S. Y. Teo, Tan M. Nguyen

Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding model behavior and obtaining safety guarantees. However, previous methods can usually only…

Machine Learning · Computer Science 2020-12-24 Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh

Autoregressive Transformers are strong language models but incur O(T) complexity during per-token generation due to the self-attention mechanism. Recent work proposes kernel-based methods to approximate causal self-attention by replacing it…

Machine Learning · Computer Science 2022-10-11 Huanru Henry Mao

Neural networks have been widely used as predictive models to fit data distribution, and they could be implemented through learning a collection of samples. In many applications, however, the given dataset may contain noisy samples or…

Neural and Evolutionary Computing · Computer Science 2017-05-30 Dianhui Wang, Ming Li

Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP) due to their exceptional performance. However, recent studies show their vulnerability to textual adversarial attacks where the…

Computation and Language · Computer Science 2023-12-01 Lujia Shen, Yuwen Pu, Shouling Ji, Changjiang Li, Xuhong Zhang, Chunpeng Ge, Ting Wang

Transformer-based deep learning models have achieved state-of-the-art performance across numerous language and vision tasks. While the self-attention mechanism, a core component of transformers, has proven capable of handling complex data…

Machine Learning · Computer Science 2025-08-05 Laziz Abdullaev, Tan M. Nguyen

Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic compute and memory requirements with respect to sequence length. Successful approaches…

Machine Learning · Computer Science 2020-10-27 Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier

Kernel density estimation (KDE) stands out as a challenging task in machine learning. The problem is defined in the following way: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n \} \subset \mathbb{R}^d$, we…

Machine Learning · Computer Science 2024-02-15 Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

This tutorial provides a gentle introduction to kernel density estimation (KDE) and recent advances regarding confidence bands and geometric/topological features. We begin with a discussion of basic properties of KDE: the convergence rate…

Methodology · Statistics 2017-09-13 Yen-Chi Chen

Much of recent Deep Reinforcement Learning success is owed to the neural architecture's potential to learn and use effective internal representations of the world. While many current algorithms access a simulator to train with a large…

Artificial Intelligence · Computer Science 2022-02-03 Amir Ardalan Kalantari, Mohammad Amini, Sarath Chandar, Doina Precup