Related papers: KDEformer: Accelerating Transformers via Kernel De…

200 papers

Recent advances in Transformer architectures have empowered their empirical success in a variety of tasks across different domains. However, existing works mainly focus on predictive accuracy and computational cost, without considering…

Machine Learning · Computer Science 2023-11-09 Xing Han , Tongzheng Ren , Tan Minh Nguyen , Khai Nguyen , Joydeep Ghosh , Nhat Ho

Estimating probability density and its score from samples remains a core problem in generative modeling, Bayesian inference, and kinetic theory. Existing methods are bifurcated: classical kernel density estimators (KDE) generalize across…

Machine Learning · Computer Science 2026-05-29 Vasily Ilin , Peter Sushko , Ranjay Krishna

Since their introduction, Transformer architectures have emerged as the dominant architectures for both natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of "fully-attentive"…

Machine Learning · Computer Science 2023-03-16 Carmelo Scribano , Giorgia Franchini , Marco Prato , Marko Bertogna

This paper studies the use of kernel density estimation (KDE) for linear algebraic tasks involving the kernel matrix of a collection of $n$ data points in $\mathbb R^d$. In particular, we improve upon existing algorithms for computing the…

Data Structures and Algorithms · Computer Science 2026-03-05 Rikhav Shah , Sandeep Silwal , Haike Xu

Kernel density estimation (KDE) is one of the most widely used nonparametric density estimation methods. The fact that it is a memory-based method, i.e., it uses the entire training data set for prediction, makes it unsuitable for most…

Machine Learning · Computer Science 2022-08-08 Joseph A. Gallego , Juan F. Osorio , Fabio A. González

Convolutional Neural Networks (CNNs) have dominated computer vision for years, due to their ability to capture locality and translation invariance. Recently, many vision transformer architectures have been proposed and they show promising…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Pichao Wang , Xue Wang , Fan Wang , Ming Lin , Shuning Chang , Hao Li , Rong Jin

Transformers excel across domains, yet their quadratic attention complexity poses a barrier to scaling. Random-feature attention, as in Performers, can reduce this cost to linear in the sequence length by approximating the softmax kernel…

Machine Learning · Computer Science 2026-03-05 Amirhossein Farzam , Hossein Mobahi , Nolan Andrew Miller , Luke Sernau

Initially introduced as a machine translation model, the Transformer architecture has now become the foundation for modern deep learning architecture, with applications in a wide range of fields, from computer vision to natural language…

Computation and Language · Computer Science 2024-06-21 Martin Courtois , Malte Ostendorff , Leonhard Hennig , Georg Rehm

Machine learning models are increasingly used to predict material properties and accelerate atomistic simulations, but the reliability of their predictions depends on the representativeness of the training data. We present a scalable,…

Chemical Physics · Physics 2025-10-20 Daniel Willimetz , Lukáš Grajciar

Attention mechanisms have been crucial for image diffusion models; however, their quadratic computational complexity limits the sizes of images we can process within reasonable time and memory constraints. This paper investigates the…

Computer Vision and Pattern Recognition · Computer Science 2024-05-09 Ethan Smith , Nayan Saxena , Aninda Saha

Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has increased steadily reaching…

Hardware Architecture · Computer Science 2025-01-15 Rya Sanovar , Srikant Bharadwaj , Renee St. Amant , Victor Rühle , Saravan Rajmohan

The Transformer model has been pivotal in advancing fields such as natural language processing, speech recognition, and computer vision. However, a critical limitation of this model is its quadratic computational and memory complexity…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Firas Khader , Omar S. M. El Nahhas , Tianyu Han , Gustav Müller-Franzes , Sven Nebelung , Jakob Nikolas Kather , Daniel Truhn

The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original…

Machine Learning · Computer Science 2021-11-04 Shengjie Luo , Shanda Li , Tianle Cai , Di He , Dinglan Peng , Shuxin Zheng , Guolin Ke , Liwei Wang , Tie-Yan Liu

Transformers have become central to natural language processing and large language models, but their deployment at scale faces three major challenges. First, the attention mechanism requires massive matrix multiplications and frequent…

Hardware Architecture · Computer Science 2026-01-22 Xiaoxuan Yang , Peilin Chen , Tergel Molom-Ochir , Yiran Chen

Transformers have been successfully used in various fields and are becoming the standard tools in computer vision. However, self-attention, a core component of transformers, has a quadratic complexity problem, which limits the use of…

Computer Vision and Pattern Recognition · Computer Science 2022-06-02 Jiuk Hong , Chaehyeon Lee , Soyoun Bang , Heechul Jung

The quadratic complexity of dot-product attention introduced in Transformer remains a fundamental bottleneck impeding the progress of foundation models toward unbounded context lengths. Addressing this challenge, we introduce the Deep…

Machine Learning · Computer Science 2025-09-03 Yifan Zhang

Recently Transformers have provided state-of-the-art performance in sparse matching, crucial to realize high-performance 3D vision applications. Yet, these Transformers lack efficiency due to the quadratic computational complexity of their…

Computer Vision and Pattern Recognition · Computer Science 2022-04-25 Suwichaya Suwanwimolkul , Satoshi Komorita

The attention mechanism is the computational core of modern Transformer architectures, but its quadratic complexity in the input sequence length is the bottleneck for large-scale inference. This has motivated a rapidly growing body of work…

The Transformer architecture has been a very successful long runner in the field of Deep Learning (DL) and Large Language Models (LLM) because of its powerful attention-based learning and parallel architecture. As the models grow gigantic…

Machine Learning · Computer Science 2026-01-21 Phani Kumar , Nyshadham , Jyothendra Varma , Polisetty V R K , Aditya Rathore

Transformers have been successful in many vision tasks, thanks to their capability of capturing long-range dependency. However, their quadratic computational complexity poses a major obstacle for applying them to vision tasks requiring…

Computer Vision and Pattern Recognition · Computer Science 2022-03-25 Shitao Tang , Jiahui Zhang , Siyu Zhu , Ping Tan