Related papers: Length-Aware Multi-Kernel Transformer for Long Doc…

Investigating Length Issues in Document-level Machine Translation

Transformer architectures are increasingly effective at processing and generating very long chunks of texts, opening new perspectives for document-level machine translation (MT). In this work, we challenge the ability of MT systems to…

Computation and Language · Computer Science 2025-04-29 Ziqian Peng, Rachel Bawden, François Yvon

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study

Text classification is an area of research which has been studied over the years in Natural Language Processing (NLP). Adapting NLP to multiple domains has introduced many new challenges for text classification and one of them is long…

Computation and Language · Computer Science 2023-07-20 Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov

CoLT5: Faster Long-Range Transformers with Conditional Computation

Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to…

Computation and Language · Computer Science 2023-10-25 Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

Efficient Classification of Long Documents Using Transformers

Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a…

Computation and Language · Computer Science 2022-03-23 Hyunji Hayley Park, Yogarshi Vyas, Kashif Shah

Beyond Token Limits: Assessing Language Model Performance on Long Text Classification

The most widely used large language models in the social sciences (such as BERT, and its derivatives, e.g. RoBERTa) have a limitation on the input text length that they can process to produce predictions. This is a particularly pressing…

Computation and Language · Computer Science 2025-09-30 Miklós Sebők, Viktor Kovács, Martin Bánóczy, Daniel Møller Eriksen, Nathalie Neptune, Philippe Roussille

Hierarchical Neural Network Approaches for Long Document Classification

Text classification algorithms investigate the intricate relationships between words or phrases and attempt to deduce the document's interpretation. In the last few years, these algorithms have progressed tremendously. Transformer…

Computation and Language · Computer Science 2022-06-28 Snehal Khandve, Vedangi Wagh, Apurva Wani, Isha Joshi, Raviraj Joshi

Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models are limited to a maximum token limit of 512 tokens. Consequently, this makes it non-trivial to apply it in a practical setting…

Computation and Language · Computer Science 2023-11-01 Aman Jaiswal, Evangelos Milios

Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis

Recent advances in the area of long document matching have primarily focused on using transformer-based models for long document encoding and matching. There are two primary challenges associated with these models. Firstly, the performance…

Computation and Language · Computer Science 2023-02-09 Akshita Jha, Adithya Samavedhi, Vineeth Rakesh, Jaideep Chandrashekar, Chandan K. Reddy

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

Recent advancements in Large Language Models (LLMs) have pushed the boundaries of natural language processing, especially in long-context understanding. However, the evaluation of these models' long-context abilities remains a challenge due…

Computation and Language · Computer Science 2025-04-24 Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Xiangkun Hu, Zheng Zhang, Qian Wang, Yue Zhang

Understanding Long Documents with Different Position-Aware Attentions

Despite several successes in document understanding, the practical task for long document understanding is largely under-explored due to several challenges in computation and how to efficiently absorb long multimodal input. Most current…

Computation and Language · Computer Science 2022-08-18 Hai Pham, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

Long-context modeling is becoming a core capability of modern large vision-language models (LVLMs), enabling sustained context management across long-document understanding, video analysis, and multi-turn tool use in agentic workflows. Yet…

Computer Vision and Pattern Recognition · Computer Science 2026-05-14 Zhaowei Wang, Lishu Luo, Haodong Duan, Weiwei Liu, Sijin Wu, Ji Luo, Shen Yan, Shuai Peng, Sihang Yuan, Chaoyi Huang, Yi Lin, Yangqiu Song

Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching

Many natural language processing and information retrieval problems can be formalized as the task of semantic matching. Existing work in this area has been largely focused on matching between short texts (e.g., question answering), or…

Information Retrieval · Computer Science 2021-05-07 Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork

LAWCAT: Efficient Distillation from Quadratic to Linear Attention with Convolution across Tokens for Long Context Modeling

Although transformer architectures have achieved state-of-the-art performance across diverse domains, their quadratic computational complexity with respect to sequence length remains a significant bottleneck, particularly for…

Computation and Language · Computer Science 2025-11-05 Zeyu Liu, Souvik Kundu, Lianghao Jiang, Anni Li, Srikanth Ronanki, Sravan Bodapati, Gourav Datta, Peter A. Beerel

MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages

Multimodal Large Language Models (MLLMs) have achieved great success in Speech-to-Text Translation (S2TT) tasks. However, current research is constrained by two key challenges: language coverage and efficiency. Most of the popular S2TT…

Computation and Language · Computer Science 2026-04-14 Yexing Du, Kaiyuan Liu, Youcheng Pan, Bo Yang, Keqi Deng, Xie Chen, Yang Xiang, Ming Liu, Bing Qin, YaoWei Wang

Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval. Starting from an apparent link between text coherence and…

Computation and Language · Computer Science 2020-01-06 Goran Glavaš, Swapna Somasundaran

A Comparative Study of Pretrained Language Models for Long Clinical Text

Objective: Clinical knowledge enriched transformer models (e.g., ClinicalBERT) have state-of-the-art results on clinical NLP (natural language processing) tasks. One of the core limitations of these transformer models is the substantial…

Computation and Language · Computer Science 2023-01-30 Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang, Yuan Luo

Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer

Pre-trained Transformers currently dominate most NLP tasks. They impose, however, limits on the maximum input length (512 sub-words in BERT), which are too restrictive in the legal domain. Even sparse-attention models, such as Longformer…

Computation and Language · Computer Science 2022-11-11 Dimitris Mamakas, Petros Tsotsi, Ion Androutsopoulos, Ilias Chalkidis

Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification

Long document classification presents challenges in capturing both local and global dependencies due to their extensive content and complex structure. Existing methods often struggle with token limits and fail to adequately model…

Computation and Language · Computer Science 2024-10-07 Sudipta Singha Roy, Xindi Wang, Robert E. Mercer, Frank Rudzicz

The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Large, pre-trained transformer models like BERT have achieved state-of-the-art results on document understanding tasks, but most implementations can only consider 512 tokens at a time. For many real-world applications, documents can be much…

Computation and Language · Computer Science 2021-07-20 Allison Hegel, Marina Shah, Genevieve Peaslee, Brendan Roof, Emad Elwany

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification

Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents. There are clear benefits to these approaches compared to the original Transformer in terms…

Computation and Language · Computer Science 2022-10-12 Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott