English
Related papers

Related papers: Efficient Long Sequence Encoding via Synchronizati…

200 papers

Although dominant in natural language processing, transformer-based models remain challenged by the task of long-sequence processing, because the computational cost of self-attention operations in transformers swells quadratically with the…

Computation and Language · Computer Science 2024-07-08 Jiawen Xie , Pengyu Cheng , Xiao Liang , Yong Dai , Nan Du

Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured…

Long context inference scenarios have become increasingly important for large language models, yet they introduce significant computational latency. While prior research has optimized long-sequence inference through operators, model…

Computation and Language · Computer Science 2025-11-10 Wei Shao , Lingchao Zheng , Pengyu Wang , Peizhen Zheng , Jun Li , Yuwei Fan

Encoding long sequences in Natural Language Processing (NLP) is a challenging problem. Though recent pretraining language models achieve satisfying performances in many NLP tasks, they are still restricted by a pre-defined maximum length,…

Computation and Language · Computer Science 2023-05-16 Irene Li , Aosong Feng , Dragomir Radev , Rex Ying

Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply…

Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be…

Computation and Language · Computer Science 2023-06-01 Jeremiah Milbauer , Annie Louis , Mohammad Javad Hosseini , Alex Fabrikant , Donald Metzler , Tal Schuster

Large language models (LLMs) based on Transformer have been widely applied in the filed of natural language processing (NLP), demonstrating strong performance, particularly in handling short text tasks. However, when it comes to long…

Computation and Language · Computer Science 2025-07-09 Yijun Liu , Jinzheng Yu , Yang Xu , Zhongyang Li , Qingfu Zhu

Large language models produce powerful text embeddings, but their causal attention mechanism restricts the flow of information from later to earlier tokens, degrading representation quality. While recent methods attempt to solve this by…

Computation and Language · Computer Science 2025-11-20 Xueying Ding , Xingyue Huang , Mingxuan Ju , Liam Collins , Yozen Liu , Leman Akoglu , Neil Shah , Tong Zhao

Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of…

Efficient parallelization of Large Language Models (LLMs) with long sequences is essential but challenging due to their significant computational and memory demands, particularly stemming from communication bottlenecks in attention…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-31 Zongwu Wang , Fangxin Liu , Mingshuai Li , Li Jiang

We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This approach produces a unique decoder…

Computation and Language · Computer Science 2022-10-19 Hamish Ivison , Matthew E. Peters

Transformer-based sequence-to-sequence architectures, while achieving state-of-the-art results on a large number of NLP tasks, can still suffer from overfitting during training. In practice, this is usually countered either by applying…

Computation and Language · Computer Science 2022-01-04 Dušan Variš , Ondřej Bojar

Enabling large language models (LLMs) to effectively process and reason with graph-structured data remains a significant challenge despite their remarkable success in natural language tasks. Current approaches either convert graph…

Artificial Intelligence · Computer Science 2025-09-03 Yanbiao Ji , Chang Liu , Xin Chen , Dan Luo , Mei Li , Yue Ding , Wenqing Lin , Hongtao Lu

Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents. There are clear benefits to these approaches compared to the original Transformer in terms…

Computation and Language · Computer Science 2022-10-12 Ilias Chalkidis , Xiang Dai , Manos Fergadiotis , Prodromos Malakasiotis , Desmond Elliott

One of the challenges for current sequence to sequence (seq2seq) models is processing long sequences, such as those in summarization and document level machine translation tasks. These tasks require the model to reason at the token level as…

Computation and Language · Computer Science 2021-09-20 Tobias Rohde , Xiaoxia Wu , Yinhan Liu

Tokenization is a fundamental step in natural language processing, breaking text into units that computational models can process. While learned subword tokenizers have become the de-facto standard, they present challenges such as large…

Computation and Language · Computer Science 2025-01-22 Pit Neitemeier , Björn Deiseroth , Constantin Eichenberg , Lukas Balles

Token representation strategies within large-scale neural architectures often rely on contextually refined embeddings, yet conventional approaches seldom encode structured relationships explicitly within token interactions. Self-attention…

Computation and Language · Computer Science 2025-03-27 James Blades , Frederick Somerfield , William Langley , Susan Everingham , Maurice Witherington

Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study the memorization subtask from the point of view of the design…

Machine Learning · Computer Science 2020-02-03 Antonio Carta , Alessandro Sperduti , Davide Bacciu

In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering. Instead of using the original…

Computation and Language · Computer Science 2020-10-09 Shuohang Wang , Yuwei Fang , Siqi Sun , Zhe Gan , Yu Cheng , Jing Jiang , Jingjing Liu

As the demand for processing extended textual data grows, the ability to handle long-range dependencies and maintain computational efficiency is more critical than ever. One of the key issues for long-sequence modeling using attention-based…

Computation and Language · Computer Science 2025-05-26 Aosong Feng , Rex Ying , Leandros Tassiulas
‹ Prev 1 2 3 10 Next ›