Related papers: ETC: Encoding Long and Structured Inputs in Transf…

LongT5: Efficient Text-To-Text Transformer for Long Sequences

Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the…

Computation and Language · Computer Science · 2022-05-04 · Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang

Exploring Length Generalization For Transformer-based Speech Enhancement

Transformer network architecture has proven effective in speech enhancement. However, as its core module, self-attention suffers from quadratic complexity, making it infeasible for training on long speech utterances. In practical scenarios,…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-10 · Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

Rethinking Text Line Recognition Models

In this paper, we study the problem of text line recognition. Unlike most approaches targeting specific domains such as scene-text or handwritten documents, we investigate the general problem of developing a universal architecture that can…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-23 · Daniel Hernandez Diaz, Siyang Qin, Reeve Ingle, Yasuhisa Fujii, Alessandro Bissacco

An Empirical Evaluation of Encoder Architectures for Fast Real-Time Long Conversational Understanding

Analyzing long text data such as customer call transcripts is a cost-intensive and tedious task. Machine learning methods, namely Transformers, are leveraged to model agent-customer interactions. Unfortunately, Transformers adhere to…

Computation and Language · Computer Science · 2025-02-19 · Annamalai Senthilnathan, Kristjan Arumae, Mohammed Khalilia, Zhengzheng Xing, Aaron R. Colak

Contextually Structured Token Dependency Encoding for Large Language Models

Token representation strategies within large-scale neural architectures often rely on contextually refined embeddings, yet conventional approaches seldom encode structured relationships explicitly within token interactions. Self-attention…

Computation and Language · Computer Science · 2025-03-27 · James Blades, Frederick Somerfield, William Langley, Susan Everingham, Maurice Witherington

Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding

Built upon the Transformer, large language models (LLMs) have captured worldwide attention due to their remarkable abilities. Nevertheless, all Transformer-based models including LLMs suffer from a preset length limit and can hardly…

Computation and Language · Computer Science · 2024-10-08 · Liang Zhao, Xiachong Feng, Xiaocheng Feng, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin, Ting Liu

A Simple and Effective Positional Encoding for Transformers

Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works proposed variations of positional encodings with…

Computation and Language · Computer Science · 2021-11-04 · Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng

End-to-End Test-Time Training for Long Context

We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard architecture -- a Transformer with sliding-window attention. However, our model…

Machine Learning · Computer Science · 2026-01-01 · Arnuv Tandon, Karan Dalal, Xinhao Li, Daniel Koceja, Marcel Rød, Sam Buchanan, Xiaolong Wang, Jure Leskovec, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin, Jed McCaleb, Yejin Choi, Yu Sun

LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction

Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be…

Computation and Language · Computer Science · 2023-06-01 · Jeremiah Milbauer, Annie Louis, Mohammad Javad Hosseini, Alex Fabrikant, Donald Metzler, Tal Schuster

Randomized Positional Encodings Boost Length Generalization of Transformers

Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply…

Machine Learning · Computer Science · 2023-05-29 · Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness

How to Mask in Error Correction Code Transformer: Systematic and Double Masking

In communication and storage systems, error correction codes (ECCs) are pivotal in ensuring data reliability. As deep learning's applicability has broadened across diverse domains, there is a growing research focus on neural network-based…

Machine Learning · Computer Science · 2023-08-28 · Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Sunghwan Kim, Yongjune Kim, Jong-Seon No

Enhancing Transformers Through Conditioned Embedded Tokens

Transformers have transformed modern machine learning, driving breakthroughs in computer vision, natural language processing, and robotics. At the core of their success lies the attention mechanism, which enables the modeling of global…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Hemanth Saratchandran, Simon Lucey

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science · 2022-12-13 · Yuxuan Li, James L. McClelland

Efficient Long Sequence Encoding via Synchronization

Pre-trained Transformer models have achieved successes in a wide range of NLP tasks, but are inefficient when dealing with long input sequences. Existing studies try to overcome this challenge via segmenting the long sequence followed by…

Computation and Language · Computer Science · 2022-03-16 · Xiangyang Mou, Mo Yu, Bingsheng Yao, Lifu Huang

TransCoder: A Neural-Enhancement Framework for Channel Codes

Reliable communication over noisy channels requires the design of specialized error-correcting codes (ECCs) tailored to specific system requirements. Recently, neural network-based decoders have emerged as promising tools for enhancing ECC…

Information Theory · Computer Science · 2025-12-01 · Anastasiia Kurmukova, Selim F. Yilmaz, Emre Ozfatura, Deniz Gunduz

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

Transformer has become ubiquitous in the deep learning field. One of the key ingredients that destined its success is the self-attention mechanism, which allows fully-connected contextual encoding over input tokens. However, despite its…

Computation and Language · Computer Science · 2021-06-08 · Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu

Extended Mind Transformers

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al.,…

Machine Learning · Computer Science · 2024-06-05 · Phoebe Klett, Thomas Ahle

A Survey on Transformer Context Extension: Approaches and Evaluation

Large language models (LLMs) based on Transformer have been widely applied in the field of natural language processing (NLP), demonstrating strong performance, particularly in handling short text tasks. However, when it comes to long…

Computation and Language · Computer Science · 2025-07-09 · Yijun Liu, Jinzheng Yu, Yang Xu, Zhongyang Li, Qingfu Zhu

A Meta-Learning Perspective on Transformers for Causal Language Modeling

The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view…

Machine Learning · Computer Science · 2024-03-26 · Xinbo Wu, Lav R. Varshney

Condenser: a Pre-training Architecture for Dense Retrieval

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science · 2021-09-22 · Luyu Gao, Jamie Callan