Related papers: Graph-based Document Structure Analysis

A Graphical Approach to Document Layout Analysis

Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert…

Machine Learning · Computer Science 2023-08-07 Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim Sokolov, Vadym Barda, Delphine Vendryes, Chris Tanner

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

Document structure analysis (aka document layout analysis) is crucial for understanding the physical layout and logical structure of documents, with applications in information retrieval, document summarization, knowledge extraction, etc.…

Computer Vision and Pattern Recognition · Computer Science 2024-03-29 Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo

Cross-Domain Document Layout Analysis Using Document Style Guide

The document layout analysis (DLA) aims to decompose document images into high-level semantic areas (i.e., figures, tables, texts, and background). Creating a DLA framework with strong generalization capabilities is a challenge due to…

Computer Vision and Pattern Recognition · Computer Science 2024-07-24 Xingjiao Wu, Luwei Xiao, Xiangcheng Du, Yingbin Zheng, Xin Li, Tianlong Ma, Cheng Jin, Liang He

UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis

Document structure analysis, aka document layout analysis, is crucial for understanding both the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc.…

Computer Vision and Pattern Recognition · Computer Science 2025-03-27 Jiawei Wang, Kai Hu, Qiang Huo

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Recognizing the layout of unstructured digital documents is crucial when parsing the documents into the structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis usually rely on computer…

Computer Vision and Pattern Recognition · Computer Science 2022-09-20 Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, Soyeon Caren Han

Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs

The automatic analysis of document layouts in digital-born PDF documents remains a challenging problem due to the heterogeneous arrangement of textual and nontextual elements and the imprecision of the textual metadata in the Portable…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Miguel Lopez-Duran, Julian Fierrez, Aythami Morales, Ruben Tolosana, Oscar Delgado-Mohatar, Alvaro Ortigosa

DLAFormer: An End-to-End Transformer For Document Layout Analysis

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. However, previous studies have typically…

Computer Vision and Pattern Recognition · Computer Science 2024-05-21 Jiawei Wang, Kai Hu, Qiang Huo

Enhancing Visually-Rich Document Understanding via Layout Structure Modeling

In recent years, the use of multi-modal pre-trained Transformers has led to significant advancements in visually-rich document understanding. However, existing models have mainly focused on features such as text and vision while neglecting…

Computation and Language · Computer Science 2023-08-16 Qiwei Li, Zuchao Li, Xiantao Cai, Bo Du, Hai Zhao

Document Layout Annotation: Database and Benchmark in the Domain of Public Affairs

Every day, thousands of digital documents are generated with useful information for companies, public organizations, and citizens. Given the impossibility of processing them manually, the automatic processing of these documents is becoming…

Information Retrieval · Computer Science 2023-09-06 Alejandro Peña, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Marcos Grande, Iñigo Puente, Jorge Cordova, Gonzalo Cordova

DocGraphLM: Documental Graph Language Model for Information Extraction

Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two tropes of architectures have emerged -- transformer-based models inspired by LLMs,…

Computation and Language · Computer Science 2024-01-08 Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah

The COTe score: A decomposable framework for evaluating Document Layout Analysis models

Document Layout Analysis (DLA) is the process by which a page is parsed into meaningful elements, often using machine learning models. Typically, the quality of a model is judged using general object detection metrics such as IoU, F1 or…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Jonathan Bourne, Mwiza Simbeye, Ishtar Govia

HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures

The problem of document structure reconstruction refers to converting digital or scanned documents into corresponding semantic structures. Most existing works mainly focus on splitting the boundary of each element in a single document page,…

Computation and Language · Computer Science 2023-03-27 Jiefeng Ma, Jun Du, Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Huihui Zhu, Cong Liu

DocBank: A Benchmark Dataset for Document Layout Analysis

Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high quality labeled datasets with both visual and textual information are…

Computation and Language · Computer Science 2020-11-12 Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Document parsing (DP) transforms unstructured or semi-structured documents into structured, machine-readable representations, enabling downstream applications such as knowledge base construction and retrieval-augmented generation (RAG).…

Multimedia · Computer Science 2026-04-07 Qintong Zhang, Bin Wang, Victor Shea-Jay Huang, Junyuan Zhang, Zhengren Wang, Hao Liang, Conghui He, Wentao Zhang

HybriDLA: Hybrid Generation for Document Layout Analysis

Conventional document layout analysis (DLA) traditionally depends on empirical priors or a fixed set of learnable queries executed in a single forward pass. While sufficient for early-generation documents with a small, predetermined number…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Yufan Chen, Omar Moured, Ruiping Liu, Junwei Zheng, Kunyu Peng, Jiaming Zhang, Rainer Stiefelhagen

PDFVQA: A New Dataset for Real-World VQA on PDF Documents

Document-based Visual Question Answering examines the document understanding of document images in conditions of natural language questions. We proposed a new document-based VQA dataset, PDF-VQA, to comprehensively examine the document…

Computer Vision and Pattern Recognition · Computer Science 2023-06-07 Yihao Ding, Siwen Luo, Hyunsuk Chung, Soyeon Caren Han

Multimodal Pre-training Based on Graph Attention Network for Document Understanding

Document intelligence as a relatively new research topic supports many business applications. Its main task is to automatically read, understand, and analyze documents. However, due to the diversity of formats (invoices, reports, forms,…

Computer Vision and Pattern Recognition · Computer Science 2022-10-25 Zhenrong Zhang, Jiefeng Ma, Jun Du, Licheng Wang, Jianshu Zhang

A Scalable Document-based Architecture for Text Analysis

Analyzing textual data is a very challenging task because of the huge volume of data generated daily. Fundamental issues in text analysis include the lack of structure in document datasets, the need for various preprocessing steps (e.g.,…

Databases · Computer Science 2016-12-20 Ciprian-Octavian Truică, Jérôme Darmont, Julien Velcin

RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

Before developing a Document Layout Analysis (DLA) model in real-world applications, conducting comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Yufan Chen, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ruiping Liu, Philip Torr, Rainer Stiefelhagen

Rethinking Graph-Based Document Classification: Learning Data-Driven Structures Beyond Heuristic Approaches

In document classification, graph-based models effectively capture document structure, overcoming sequence length limitations and enhancing contextual understanding. However, most existing graph document representations rely on heuristics,…

Computation and Language · Computer Science 2025-08-05 Margarita Bugueño, Gerard de Melo