Related papers: LAMBERT: Layout-Aware (Language) Modeling for info…

Improving Information Extraction on Business Documents with Specific Pre-Training Tasks

Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training…

Computation and Language · Computer Science 2023-09-12 Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas

Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics.…

Computation and Language · Computer Science 2021-07-13 Rafał Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michał Pietruszka, Gabriela Pałka

LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding

Document layout comprises both structural and visual (e.g., font sizes) information that is vital but often ignored by machine learning models. The few existing models which do use layout information only consider textual contents, and…

Computation and Language · Computer Science 2021-04-20 Te-Lin Wu, Cheng Li, Mingyang Zhang, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky

Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

Building document-grounded dialogue systems has received growing interest as documents convey a wealth of human knowledge and commonly exist in enterprises. Wherein, how to comprehend and retrieve information from documents is a…

Computation and Language · Computer Science 2022-07-15 Zhenyu Zhang, Bowen Yu, Haiyang Yu, Tingwen Liu, Cheng Fu, Jingyang Li, Chengguang Tang, Jian Sun, Yongbin Li

LaTr: Layout-Aware Transformer for Scene-Text VQA

We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to reason over different modalities. Thus, we first investigate the impact…

Computer Vision and Pattern Recognition · Computer Science 2021-12-28 Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha

Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs

This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model…

Computation and Language · Computer Science 2026-02-04 Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst

Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models

Many business documents processed in modern NLP and IR pipelines are visually rich: in addition to text, their semantics can also be captured by visual traits such as layout, format, and fonts. We study the problem of information extraction…

Computation and Language · Computer Science 2020-05-25 Mengxi Wei, Yifan He, Qiong Zhang

Information Extraction from Visually Rich Documents with Font Style Embeddings

Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications. Current state-of-the-art methods focus on scanned documents with approaches combining computer vision, natural language…

Computation and Language · Computer Science 2022-08-16 Ismail Oussaid, William Vanhuffel, Pirashanth Ratnamogan, Mhamed Hajaiej, Alexis Mathey, Thomas Gilles

DocLLM: A layout-aware generative language model for multimodal document understanding

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a…

Computation and Language · Computer Science 2024-01-03 Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

Sort by Structure: Language Model Ranking as Dependency Probing

Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored. The field of Computer Vision has begun to tackle encoder ranking, with promising forays…

Computation and Language · Computer Science 2022-06-13 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Spatial Information Integration in Small Language Models for Document Layout Generation and Classification

Document layout understanding is a field of study that analyzes the spatial arrangement of information in a document, hoping to understand its structure and layout. Models such as LayoutLM (and its subsequent iterations) can understand…

Computation and Language · Computer Science 2025-01-13 Pablo Melendez, Clemens Havas

Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents

This paper introduces a deep learning model tailored for document information analysis, emphasizing document classification, entity relation extraction, and document visual question answering. The proposed model leverages transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2023-10-26 Tofik Ali, Partha Pratim Roy

LAME: Layout Aware Metadata Extraction Approach for Research Articles

The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to…

Machine Learning · Computer Science 2021-12-24 Jongyun Choi, Hyesoo Kong, Hwamook Yoon, Heung-Seon Oh, Yuchul Jung

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding

Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence as it captures the rich structure semantics within documents. Previous works typically formulated layout reading order as a…

Computation and Language · Computer Science 2024-10-01 Chong Zhang, Yi Tu, Yixi Zhao, Chenshu Yuan, Huan Chen, Yue Zhang, Mingxu Chai, Ya Guo, Huijia Zhu, Qi Zhang, Tao Gui

Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation

Automated resume information extraction is critical for scaling talent acquisition, yet its real-world deployment faces three major challenges: the extreme heterogeneity of resume layouts and content, the high cost and latency of large…

Computation and Language · Computer Science 2025-10-14 Fanwei Zhu, Jinke Yu, Zulong Chen, Ying Zhou, Junhao Ji, Zhibo Yang, Yuxue Zhang, Haoyuan Hu, Zhenghao Liu

The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the…

Information Retrieval · Computer Science 2022-10-18 Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, Eric Gaussier

Enhancing Visually-Rich Document Understanding via Layout Structure Modeling

In recent years, the use of multi-modal pre-trained Transformers has led to significant advancements in visually-rich document understanding. However, existing models have mainly focused on features such as text and vision while neglecting…

Computation and Language · Computer Science 2023-08-16 Qiwei Li, Zuchao Li, Xiantao Cai, Bo Du, Hai Zhao

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

Structured document understanding has attracted considerable attention and made significant progress recently, owing to its crucial role in intelligent document processing. However, most existing related models can only deal with the…

Computation and Language · Computer Science 2022-03-01 Jiapeng Wang, Lianwen Jin, Kai Ding

DLAFormer: An End-to-End Transformer For Document Layout Analysis

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. However, previous studies have typically…

Computer Vision and Pattern Recognition · Computer Science 2024-05-21 Jiawei Wang, Kai Hu, Qiang Huo

BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models…

Computation and Language · Computer Science 2022-04-06 Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park