Related papers: Key-Value Transformer

200 papers

While CNNs were long considered state of the art for image processing, the introduction of Transformer architectures has challenged this position. While achieving excellent results in image classification and segmentation, Transformers…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · DeShin Hwa, Tobias Holmes, Klaus Drechsler

Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, some pioneering works have recently been done on employing…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-07 · Yang Liu, Yao Zhang, Yixin Wang, Feng Hou, Jin Yuan, Jiang Tian, Yang Zhang, Zhongchao Shi, Jianping Fan, Zhiqiang He

The key-value (KV) cache is a primary memory bottleneck in Transformers. We propose Low-Rank Key-Value (LRKV) attention, which reduces KV cache memory by exploiting redundancy across attention heads, while being compute efficient. Each…

Machine Learning · Computer Science · 2026-04-09 · James O'Neill, Robert Clancy, Mariia Matskevichus, Fergal Reid

Massive transformer-based models face several challenges, including slow and computationally intensive pre-training and over-parametrization. This paper addresses these challenges by proposing a versatile method called GQKVA, which…

In this work, quantum transformers are designed and analysed in detail by extending the state-of-the-art classical transformer neural network architectures known to be very performant in natural language processing and image analysis.…

Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint. However, the design of hand-crafted windows, which is data-agnostic,…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-28 · Qiming Zhang, Jing Zhang, Yufei Xu, Dacheng Tao
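As a rough illustration of the complexity argument in the window-attention abstract, the sketch below counts query-key scores for global self-attention over an H x W token grid against attention restricted to fixed, non-overlapping win x win windows. The fixed partition is exactly the hand-crafted, data-agnostic design that abstract criticizes; it is assumed here only for counting, and the function names and grid sizes are illustrative.

```python
def global_attention_pairs(h: int, w: int) -> int:
    """Score computations when every token attends to every token."""
    n = h * w
    return n * n


def window_attention_pairs(h: int, w: int, win: int) -> int:
    """Score computations when tokens attend only within win x win windows.

    Assumes h and w are divisible by win (padding is not handled here).
    """
    if h % win or w % win:
        raise ValueError("grid must be divisible by the window size")
    num_windows = (h // win) * (w // win)
    tokens_per_window = win * win
    # Each window runs full attention among its own tokens only.
    return num_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    # Illustrative 56 x 56 token grid with 7 x 7 windows.
    print(global_attention_pairs(56, 56))     # 9834496
    print(window_attention_pairs(56, 56, 7))  # 153664
```

Global attention grows as (HW)^2, while the windowed count grows as HW times win^2, i.e. linearly in the number of tokens for a fixed window size.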

Transformers have emerged as the underpinning architecture for Large Language Models (LLMs). In generative language models, the inference process involves two primary phases: prompt processing and token generation. Token generation, which…

Machine Learning · Computer Science · 2024-04-09 · Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath
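The two inference phases named in this abstract can be sketched in a few lines: prompt processing fills a key-value cache for all prompt tokens at once, and each token-generation step appends one new key/value pair and attends over the whole cache. This is a single-head, pure-Python sketch that assumes the query/key/value projections were already applied; `KVCache`, `prefill`, `step` and `attend` are hypothetical names, not an API from the paper.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one query over cached keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


class KVCache:
    """Hypothetical single-head cache: one key and one value per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def prefill(self, keys, values):
        # Prompt processing: store keys/values for all prompt tokens in one pass.
        self.keys.extend(keys)
        self.values.extend(values)

    def step(self, query, key, value):
        # Token generation: append the new token's key/value, then attend over
        # the whole cache instead of recomputing keys/values for past tokens.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


if __name__ == "__main__":
    cache = KVCache()
    cache.prefill(keys=[[1.0, 0.0], [0.0, 1.0]], values=[[1.0], [2.0]])
    out = cache.step(query=[0.0, 0.0], key=[0.0, 0.0], value=[3.0])
    print(len(cache.keys), out)
```

Because every generation step reads all cached entries, both cache memory and per-step attention cost grow with the number of tokens processed so far.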

Transformers have been successfully used in various fields and are becoming the standard tools in computer vision. However, self-attention, a core component of transformers, has a quadratic complexity problem, which limits the use of…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-02 · Jiuk Hong, Chaehyeon Lee, Soyoun Bang, Heechul Jung

Auto-regressive inference of transformers benefits greatly from Key-Value (KV) caching, but the cache can become a major memory bottleneck as model size, batch size, and sequence length grow at scale. We introduce Multi-Layer Key-Value (MLKV) sharing,…

Machine Learning · Computer Science · 2024-10-16 · Zayd Muhammad Kawakibi Zuhri, Muhammad Farid Adilazuarda, Ayu Purwarianti, Alham Fikri Aji
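A back-of-the-envelope view of why the cache becomes a bottleneck: every layer that keeps its own cache stores two tensors (keys and values) of shape batch x sequence length x KV heads x head dimension. The sketch below uses that standard accounting and adds a hypothetical `layers_per_kv` factor to model cross-layer sharing in the spirit of MLKV (several layers reading one shared KV cache). The model dimensions are illustrative assumptions, not figures from the paper.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    bytes_per_elem: int = 2,  # e.g. fp16 / bf16
    layers_per_kv: int = 1,   # hypothetical: how many layers share one KV cache
) -> int:
    """Approximate KV cache size in bytes for autoregressive decoding."""
    if num_layers % layers_per_kv:
        raise ValueError("num_layers must be divisible by layers_per_kv")
    kv_layers = num_layers // layers_per_kv
    # Factor 2: one tensor for keys and one for values.
    return 2 * kv_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # Illustrative 32-layer model with 32 KV heads of width 128, in fp16.
    full = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=8)
    shared = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=8, layers_per_kv=4)
    print(full / 2**30, shared / 2**30)  # 16.0 4.0  (GiB)
```

The cache grows linearly in sequence length and batch size; sharing across layers divides it by the sharing factor, while reducing `num_kv_heads` (as in grouped-query attention) shrinks it along a different axis.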

Starting from first principles and a linguistic perspective centered on part-of-speech (POS) and syntactic analysis, this paper explores and derives the underlying essence of the Query-Key-Value (QKV) mechanism within the Transformer…

Artificial Intelligence · Computer Science · 2026-03-18 · Zhang Edward

A vision transformer (ViT) is the dominant model in the computer vision field. Despite numerous studies that mainly focus on dealing with inductive bias and complexity, there remains the problem of finding better transformer networks. For…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-01 · Jaesin Ahn, Jiuk Hong, Jeongwoo Ju, Heechul Jung

The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-11 · Axel Berg, Mark O'Connor, Miguel Tairum Cruz

Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory required to store the KV cache can become prohibitive at long sequence…

Machine Learning · Computer Science · 2024-05-22 · William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan Kelly

Transformers were initially introduced for natural language processing (NLP) tasks, but they were quickly adopted by most deep learning fields, including computer vision. They measure the relationships between pairs of input tokens (words in…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-22 · Robin Courant, Maika Edberg, Nicolas Dufour, Vicky Kalogeiton

The Key-Value (KV) cache is central to the efficiency of transformer-based large language models (LLMs), storing previously computed vectors to accelerate inference. Yet, as sequence length and batch size grow, the cache becomes a major…

Machine Learning · Computer Science · 2025-12-08 · Damien Lesens, Beheshteh T. Rakhshan, Guillaume Rabusseau

Transformers are a widespread and successful model architecture, particularly in Natural Language Processing (NLP) and Computer Vision (CV). The essential innovation of this architecture is the Attention Mechanism, which solves the problem…

Machine Learning · Computer Science · 2024-11-25 · Bernhard Bermeitinger, Tomas Hrycej, Massimo Pavone, Julianus Kath, Siegfried Handschuh

Convolutional Neural Networks (CNNs) have dominated computer vision for years due to their ability to capture locality and translation invariance. Recently, many vision transformer architectures have been proposed and they show promising…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-26 · Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Hao Li, Rong Jin

The dot product attention mechanism, originally designed for natural language processing tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity…

Disordered Systems and Neural Networks · Physics · 2025-01-14 · Riccardo Rende, Luciano Loris Viteritti
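In the standard Transformer, the similarity this abstract mentions is the scaled dot product between a query and each key, normalized with a softmax over the keys. The minimal single-head sketch below (no learned projections; the toy vectors are made up) returns the full matrix of pairwise attention weights:

```python
import math


def attention_weights(queries, keys):
    """Return the n x m matrix softmax(Q K^T / sqrt(d_k)), computed row by row.

    Entry [i][j] is how strongly token i attends to token j.
    """
    d_k = len(keys[0])
    matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        m = max(scores)  # shift by the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


if __name__ == "__main__":
    # Three toy token vectors used as both queries and keys (self-attention).
    tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    for row in attention_weights(tokens, tokens):
        print([round(w, 3) for w in row])
```

Each row sums to 1, and tokens whose vectors point in similar directions receive larger weights, which is the pairwise "semantic relationship" the dot product captures.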

Standard Transformer attention uses identical dimensionality for queries, keys, and values, yet these components serve different roles: queries and keys produce scalar attention weights (selection), while values carry rich representations…

Machine Learning · Computer Science · 2026-03-31 · Hengshuai Yao, Xing Chen, Ahmed Murtadha, Guan Wang
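Illustrative arithmetic for why the query/key width and the value width play different roles, assuming standard multi-head attention in which each head caches one key of width d_qk and one value of width d_v per token. The function names and numbers are hypothetical and do not describe the paper's actual configuration:

```python
def per_token_kv_elems(num_heads: int, d_qk: int, d_v: int) -> int:
    """Cached elements per token per layer: one key (d_qk) and one value (d_v) per head."""
    return num_heads * (d_qk + d_v)


def attention_flops(seq_len: int, num_heads: int, d_qk: int, d_v: int) -> tuple[int, int]:
    """Multiply-adds for the two matmuls of one attention layer (projections excluded).

    Scores Q K^T cost n * n * d_qk per head; the weighted sum A V costs n * n * d_v per head.
    """
    n = seq_len
    return num_heads * n * n * d_qk, num_heads * n * n * d_v


if __name__ == "__main__":
    # Symmetric baseline vs. a hypothetical narrower query/key width.
    print(per_token_kv_elems(8, 64, 64))  # 1024
    print(per_token_kv_elems(8, 16, 64))  # 640
    print(attention_flops(1024, 8, 16, 64))
```

Only the score computation and the key half of the cache depend on d_qk; the value path, and therefore the width of each head's output, is set by d_v alone.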

Since the Transformer was first published in 2017, several works have proposed ways to optimize it. However, the core structure of the Transformer remains unchanged, ignoring one of its main intrinsic limitations, which is the same static value…

Machine Learning · Computer Science · 2025-12-30 · Xiaowei Wang