Related papers: LayoutParser: A Unified Toolkit for Deep Learning …

Deep Learning in Dental Image Analysis: A Systematic Review of Datasets, Methodologies, and Emerging Challenges

Efficient analysis and processing of dental images are crucial for dentists to achieve accurate diagnosis and optimal treatment planning. However, dental imaging inherently poses several challenges, such as low contrast, metallic artifacts,…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-24 · Zhenhuan Zhou, Jingbo Zhu, Yuchen Zhang, Xiaohang Guan, Peng Wang, Tao Li

PubLayNet: largest dataset ever for document layout analysis

Recognizing the layout of unstructured digital documents is an important step when parsing the documents into structured machine-readable format for downstream applications. Deep neural networks that are developed for computer vision have…

Computation and Language · Computer Science · 2019-08-22 · Xu Zhong, Jianbin Tang, Antonio Jimeno Yepes

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing

Document parsing from scanned images into structured formats remains a significant challenge due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables. Existing supervised fine-tuning methods often…

Computation and Language · Computer Science · 2025-10-21 · Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

DLSIA: Deep Learning for Scientific Image Analysis

We introduce DLSIA (Deep Learning for Scientific Image Analysis), a Python-based machine learning library that empowers scientists and researchers across diverse scientific domains with a range of customizable convolutional neural network…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-29 · Eric J Roberts, Tanny Chavez, Alexander Hexemer, Petrus H. Zwart

Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis

Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques. One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting…

Computation and Language · Computer Science · 2023-08-31 · Sotirios Kastanas, Shaomu Tan, Yi He

LayoutReader: Pre-training of Text and Layout for Reading Order Detection

Reading order detection is the cornerstone to understanding visually-rich documents (e.g., receipts and forms). Unfortunately, no existing work took advantage of advanced deep learning models because it is too laborious to annotate a large…

Computation and Language · Computer Science · 2021-08-30 · Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei

dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

Document Layout Parsing serves as a critical gateway for Artificial Intelligence (AI) to access and interpret the world's vast stores of structured knowledge. This process, which encompasses layout detection, text recognition, and relational…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-18 · Yumeng Li, Guang Yang, Hao Liu, Bowen Wang, Colin Zhang

LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding

This paper proposes LayoutLLM, a more flexible document analysis method for understanding imaged documents. Visually Rich Document Understanding tasks, such as document image classification and information extraction, have gained…

Computation and Language · Computer Science · 2024-03-22 · Masato Fujitake

DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification

Deep learning (DL) has revolutionized the field of document image analysis, showcasing superhuman performance across a diverse set of tasks. However, the inherent black-box nature of deep learning models still presents a significant…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-08 · Saifullah Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed

OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning

Document AI has advanced rapidly and is attracting increasing attention. Yet, while most efforts have focused on document layout analysis (DLA), its generative counterpart, layout generation, remains underexplored. Distinct from traditional…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-25 · Hengrui Kang, Zhuangcheng Gu, Zhiyuan Zhao, Zichen Wen, Bin Wang, Weijia Li, Conghui He

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion

We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion, that can parse several types of popular document formats into a unified, richly structured representation. It is powered by…

Computation and Language · Computer Science · 2025-01-31 · Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Kasper Dinkla, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Peter W. J. Staar

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing

Automated parsing of scanned documents into richly structured, machine-readable formats remains a critical bottleneck in Document AI, as traditional multi-stage pipelines suffer from error propagation and limited adaptability to diverse…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-22 · Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

Deep Reader: Information extraction from Document images via relation extraction and Natural Language

Recent advancements in the area of Computer Vision with state-of-the-art Neural Networks have given a boost to Optical Character Recognition (OCR) accuracies. However, extracting characters/text alone is often insufficient for relevant…

Computer Vision and Pattern Recognition · Computer Science · 2018-12-17 · Vishwanath D, Rohit Rahul, Gunjan Sehgal, Swati, Arindam Chowdhury, Monika Sharma, Lovekesh Vig, Gautam Shroff, Ashwin Srinivasan

DocBank: A Benchmark Dataset for Document Layout Analysis

Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high quality labeled datasets with both visual and textual information are…

Computation and Language · Computer Science · 2020-11-12 · Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

READ: Recursive Autoencoders for Document Layout Generation

Layout is a fundamental component of any graphic design. Creating large varieties of plausible document layouts can be a tedious task, requiring numerous constraints to be satisfied, including local ones relating different semantic elements…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-20 · Akshay Gadi Patil, Omri Ben-Eliezer, Or Perel, Hadar Averbuch-Elor

UnSupDLA: Towards Unsupervised Document Layout Analysis

Document layout analysis is a key area in document research, involving techniques like text mining and visual analysis. Despite various methods developed to tackle layout analysis, a critical but frequently overlooked problem is the…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-11 · Talha Uddin Sheikh, Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal

RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-22 · Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while…

Computation and Language · Computer Science · 2020-06-17 · Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

DLAFormer: An End-to-End Transformer For Document Layout Analysis

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. However, previous studies have typically…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-21 · Jiawei Wang, Kai Hu, Qiang Huo

DoPTA: Improving Document Layout Analysis using Patch-Text Alignment

The advent of multimodal learning has brought a significant improvement in document AI. Documents are now treated as multimodal entities, incorporating both textual and visual information for downstream analysis. However, works in this…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Nikitha SR, Tarun Ram Menta, Mausoom Sarkar