Related papers: Document Structure in Long Document Transformers

Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning

Recent approaches in literature have exploited the multi-modal information in documents (text, layout, image) to serve specific downstream document tasks. However, they are limited by their (i) inability to learn cross-modal…

Computation and Language · Computer Science · 2022-01-06 · Subhojeet Pramanik, Shashank Mujumdar, Hima Patel

StructuralLM: Structural Pre-training for Form Understanding

Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, they almost exclusively focus on text-only representation, while neglecting cell-level layout information that is important…

Computation and Language · Computer Science · 2021-05-25 · Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

A Survey on Long Text Modeling with Transformers

Modeling long texts has been an essential technique in the field of natural language processing (NLP). With the ever-growing number of long documents, it is important to develop effective modeling methods that can process and analyze such…

Computation and Language · Computer Science · 2025-06-11 · Zican Dong, Tianyi Tang, Junyi Li, Wayne Xin Zhao

StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training

Most state-of-the-art techniques for Language Models (LMs) today rely on transformer-based architectures and their ubiquitous attention mechanism. However, the exponential growth in computational requirements with longer input sequences…

Computation and Language · Computer Science · 2024-11-26 · Kaustubh Ponkshe, Venkatapathy Subramanian, Natwar Modani, Ganesh Ramakrishnan

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study

Text classification is an area of research which has been studied over the years in Natural Language Processing (NLP). Adapting NLP to multiple domains has introduced many new challenges for text classification and one of them is long…

Computation and Language · Computer Science · 2023-07-20 · Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while…

Computation and Language · Computer Science · 2020-06-17 · Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

Efficient Classification of Long Documents Using Transformers

Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a…

Computation and Language · Computer Science · 2022-03-23 · Hyunji Hayley Park, Yogarshi Vyas, Kashif Shah

Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning

Pretraining on large, semantically rich datasets is key for developing language models. Surprisingly, recent studies have shown that even synthetic data, generated procedurally through simple semantic-free algorithms, can yield some of the…

Machine Learning · Computer Science · 2025-05-29 · Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Anton van den Hengel, Damien Teney

Long-Range Transformer Architectures for Document Understanding

Since their release, Transformers have revolutionized many fields from Natural Language Understanding to Computer Vision. Document Understanding (DU) was not left behind, with the first Transformer-based models for DU dating from late 2019.…

Computation and Language · Computer Science · 2023-09-12 · Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas

Learning to Search in Long Documents Using Document Structure

Reading comprehension models are based on recurrent neural networks that sequentially process the document tokens. As interest turns to answering more complex questions over longer documents, sequential reading of large portions of text…

Computation and Language · Computer Science · 2018-09-11 · Mor Geva, Jonathan Berant

Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling

Transformer is important for text modeling. However, it has difficulty in handling long documents due to the quadratic complexity with input text length. In order to handle this problem, we propose a hierarchical interactive Transformer…

Computation and Language · Computer Science · 2021-12-10 · Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Neural Natural Language Processing for Long Texts: A Survey on Classification and Summarization

The adoption of Deep Neural Networks (DNNs) has greatly benefited Natural Language Processing (NLP) during the past decade. However, the demands of long document analysis are quite different from those of shorter texts, while the ever…

Computation and Language · Computer Science · 2024-03-18 · Dimitrios Tsirmpas, Ioannis Gkionis, Georgios Th. Papadopoulos, Ioannis Mademlis

In-context Pretraining: Language Modeling Beyond Document Boundaries

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining…

Computation and Language · Computer Science · 2024-06-25 · Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

A Comparative Study of Pretrained Language Models for Long Clinical Text

Objective: Clinical knowledge enriched transformer models (e.g., ClinicalBERT) have state-of-the-art results on clinical NLP (natural language processing) tasks. One of the core limitations of these transformer models is the substantial…

Computation and Language · Computer Science · 2023-01-30 · Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang, Yuan Luo

Learning Structured Text Representations

In this paper, we focus on learning structure-aware document representations from data without recourse to a discourse parser or additional annotations. Drawing inspiration from recent efforts to empower neural networks with a structural…

Computation and Language · Computer Science · 2018-02-06 · Yang Liu, Mirella Lapata

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

Structured document understanding has attracted considerable attention and made significant progress recently, owing to its crucial role in intelligent document processing. However, most existing related models can only deal with the…

Computation and Language · Computer Science · 2022-03-01 · Jiapeng Wang, Lianwen Jin, Kai Ding

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has…

Machine Learning · Computer Science · 2023-09-20 · Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

Structured Attention Matters to Multimodal LLMs in Document Understanding

Document understanding remains a significant challenge for multimodal large language models (MLLMs). While previous research has primarily focused on locating evidence pages through precise multimodal queries, our work investigates a…

Computation and Language · Computer Science · 2025-06-30 · Chang Liu, Hongkai Chen, Yujun Cai, Hang Wu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang

Language Model Pre-training for Hierarchical Document Representations

Hierarchical neural architectures are often used to capture long-distance dependencies and have been applied to many document-level tasks such as summarization, document segmentation, and sentiment analysis. However, effective usage of such…

Computation and Language · Computer Science · 2019-01-29 · Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin

The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Large, pre-trained transformer models like BERT have achieved state-of-the-art results on document understanding tasks, but most implementations can only consider 512 tokens at a time. For many real-world applications, documents can be much…

Computation and Language · Computer Science · 2021-07-20 · Allison Hegel, Marina Shah, Genevieve Peaslee, Brendan Roof, Emad Elwany