Related papers: LAIT: Efficient Multi-Segment Encoding in Transfor…

200 papers

Adjusting the latency, power, and accuracy of natural language understanding models is a desirable objective of an efficient architecture. This paper proposes an efficient Transformer architecture that adjusts the inference computational…

Computation and Language · Computer Science 2024-09-20 Sajjad Kachuee, Mohammad Sharifkhani

Looped Transformers (LT) have emerged as a powerful architecture by iterating their layers multiple times before decoding the final token. However, pairing them with full attention retains quadratic complexity, making them computationally…

Machine Learning · Computer Science 2026-05-26 Chunyuan Deng, Yizhe Zhang, Rui-Jie Zhu, Yuanyuan Xu, Jiarui Liu, T. S. Eugene Ng, Hanjie Chen

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works…

Computer Vision and Pattern Recognition · Computer Science 2021-12-24 Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

We present FIT: a transformer-based architecture with efficient self-attention and adaptive computation. Unlike original transformers, which operate on a single sequence of data tokens, we divide the data tokens into groups, with each group…

Machine Learning · Computer Science 2023-05-26 Ting Chen, Lala Li

Transformer architectures have emerged as promising deep learning (DL) tools for modeling complex sequence-to-sequence interactions in channel decoding. However, current transformer-based decoders for error correction codes (ECCs)…

Signal Processing · Electrical Eng. & Systems 2025-07-22 Hongzhi Zhu, Wei Xu, Xiaohu You

Pre-trained Transformer models have achieved successes in a wide range of NLP tasks, but are inefficient when dealing with long input sequences. Existing studies try to overcome this challenge via segmenting the long sequence followed by…

Computation and Language · Computer Science 2022-03-16 Xiangyang Mou, Mo Yu, Bingsheng Yao, Lifu Huang

Deep Learning architectures, and in particular Transformers, are conventionally viewed as a composition of layers. These layers are actually often obtained as the sum of two contributions: a residual path that copies the input and the…

Although transformer architectures have achieved state-of-the-art performance across diverse domains, their quadratic computational complexity with respect to sequence length remains a significant bottleneck, particularly for…

Computation and Language · Computer Science 2025-11-05 Zeyu Liu, Souvik Kundu, Lianghao Jiang, Anni Li, Srikanth Ronanki, Sravan Bodapati, Gourav Datta, Peter A. Beerel

Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of…

The Efficient Adaptive Transformer (EAT) framework unifies three adaptive efficiency techniques - progressive token pruning, sparse attention, and dynamic early exiting - into a single, reproducible architecture for input-adaptive…

Computation and Language · Computer Science 2025-10-16 Jan Miller

Transformer-based language models utilize the attention mechanism for substantial performance improvements in almost all natural language processing (NLP) tasks. Similar attention structures are also extensively studied in several other…

Computation and Language · Computer Science 2023-05-17 Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç

Diffusion transformers (DiTs) achieve high generative quality but lock FLOPs to image resolution, limiting principled latency-quality trade-offs, and allocate computation uniformly across input spatial tokens, wasting resource allocation to…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Moayed Haji-Ali, Willi Menapace, Ivan Skorokhodov, Dogyun Park, Anil Kag, Michael Vasilkovsky, Sergey Tulyakov, Vicente Ordonez, Aliaksandr Siarohin

Transformer architectures are the backbone of the modern AI revolution. However, they are based on simply stacking the same blocks in dozens of layers and processing information sequentially from one block to another. In this paper, we…

Computation and Language · Computer Science 2024-12-24 Prateek Verma, Mert Pilanci

Incremental processing allows interactive systems to respond based on partial inputs, which is a desirable property e.g. in dialogue agents. The currently popular Transformer architecture inherently processes sequences as a whole,…

Computation and Language · Computer Science 2024-05-03 Patrick Kahardipraja, Brielen Madureira, David Schlangen

Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections…

Machine Learning · Computer Science 2022-12-13 Yaofeng Desmond Zhong, Tongtao Zhang, Amit Chakraborty, Biswadip Dey

Large language models (LLMs) face inherent performance bottlenecks under parameter constraints, particularly in processing critical tokens that demand complex reasoning. Empirical analysis reveals challenging tokens induce abrupt gradient…

Computation and Language · Computer Science 2025-02-25 Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang

In decoder-based LLMs, the representation of a given layer serves two purposes: as input to the next layer during the computation of the current token; and as input to the attention mechanism of future tokens. In this work, we show that the…

Computation and Language · Computer Science 2024-11-01 Amit Ben-Artzy, Roy Schwartz

Neural machine translation (NMT) takes deterministic sequences for source representations. However, either word-level or subword-level segmentations have multiple choices to split a source sequence with different word segmentors or…

Computation and Language · Computer Science 2019-06-05 Fengshun Xiao, Jiangtong Li, Hai Zhao, Rui Wang, Kehai Chen

While the Self-Attention mechanism in the Transformer model has proven to be effective in many domains, we observe that it is less effective in more diverse settings (e.g. multimodality) due to the varying granularity of each token and the…

Computer Vision and Pattern Recognition · Computer Science 2024-06-06 Wayner Barrios, SouYoung Jin

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT…
