Related papers: PubLayNet: largest dataset ever for document layou…

DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

Accurate document layout analysis is a key requirement for high-quality PDF document conversion. With the recent availability of public, large ground-truth datasets such as PubLayNet and DocBank, deep-learning models have proven to be very…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-18 · Birgit Pfitzmann, Christoph Auer, Michele Dolfi, Ahmed S Nassar, Peter W J Staar

RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-22 · Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh

DocBank: A Benchmark Dataset for Document Layout Analysis

Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high quality labeled datasets with both visual and textual information are…

Computation and Language · Computer Science · 2020-11-12 · Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods are using models pre-trained on natural scene images, our method Doc-UFCN relies on a U-shaped model trained…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-20 · Mélodie Boillet, Christopher Kermorvant, Thierry Paquet

M$^{6}$Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis

Document layout analysis is a crucial prerequisite for document understanding, including document retrieval and conversion. Most public datasets currently contain only PDF documents and lack realistic documents. Models trained on these…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Hiuyi Cheng, Peirong Zhang, Sihang Wu, Jiaxin Zhang, Qiyuan Zhu, Zecheng Xie, Jing Li, Kai Ding, Lianwen Jin

Image-based table recognition: data, model, and evaluation

Important information that relates to a specific topic in a document is often organized in tabular format to assist readers with information retrieval and comparison, which may be difficult to provide in natural language. However, tabular…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-05 · Xu Zhong, Elaheh ShafieiBavani, Antonio Jimeno Yepes

On The State of Data In Computer Vision: Human Annotations Remain Indispensable for Developing Deep Learning Models

High-quality labeled datasets play a crucial role in fueling the development of machine learning (ML), and in particular the development of deep learning (DL). However, since the emergence of the ImageNet dataset and the AlexNet model in…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-03 · Zeyad Emam, Andrew Kondrich, Sasha Harrison, Felix Lau, Yushi Wang, Aerin Kim, Elliot Branson

IndicDLP: A Foundational Dataset for Multi-Lingual and Multi-Domain Document Layout Parsing

Document layout analysis is essential for downstream tasks such as information retrieval, extraction, OCR, and digitization. However, existing large-scale datasets like PubLayNet and DocBank lack fine-grained region labels and multilingual…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-24 · Oikantik Nath, Sahithi Kukkala, Mitesh Khapra, Ravi Kiran Sarvadevabhatla

PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction

Document layout analysis is a critical preprocessing step in document intelligence, enabling the detection and localization of structural elements such as titles, text blocks, tables, and formulas. Despite its importance, existing layout…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-24 · Ting Sun, Cheng Cui, Yuning Du, Yi Liu

A Hybrid Approach for Document Layout Analysis in Document images

Document layout analysis involves understanding the arrangement of elements within a document. This paper navigates the complexities of understanding various elements within document images, such as text, images, tables, and headings. The…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-02 · Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal

SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

Document layout analysis is a known problem to the documents research community and has been vastly explored yielding a multitude of solutions ranging from text mining, and recognition to graph-based representation, visual feature…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-22 · Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-22 · Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li

Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis

Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques. One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting…

Computation and Language · Computer Science · 2023-08-31 · Sotirios Kastanas, Shaomu Tan, Yi He

DocumentNet: Bridging the Data Gap in Document Pre-Training

Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI. However, publicly available data have been…

Computation and Language · Computer Science · 2023-10-27 · Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander G. Hauptmann, Hanjun Dai, Wei Wei

Enhancing Visually-Rich Document Understanding via Layout Structure Modeling

In recent years, the use of multi-modal pre-trained Transformers has led to significant advancements in visually-rich document understanding. However, existing models have mainly focused on features such as text and vision while neglecting…

Computation and Language · Computer Science · 2023-08-16 · Qiwei Li, Zuchao Li, Xiantao Cai, Bo Du, Hai Zhao

UnSupDLA: Towards Unsupervised Document Layout Analysis

Document layout analysis is a key area in document research, involving techniques like text mining and visual analysis. Despite various methods developed to tackle layout analysis, a critical but frequently overlooked problem is the…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-11 · Talha Uddin Sheikh, Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal

LayoutReader: Pre-training of Text and Layout for Reading Order Detection

Reading order detection is the cornerstone to understanding visually-rich documents (e.g., receipts and forms). Unfortunately, no existing work took advantage of advanced deep learning models because it is too laborious to annotate a large…

Computation and Language · Computer Science · 2021-08-30 · Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei

Vision-Based Layout Detection from Scientific Literature using Recurrent Convolutional Neural Networks

We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD), a shared subtask of several information extraction problems. Scientific…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-23 · Huichen Yang, William H. Hsu

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while…

Computation and Language · Computer Science · 2020-06-17 · Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method

Recent deep learning approaches in table detection achieved outstanding performance and proved to be effective in identifying document layouts. Currently, available table detection benchmarks have many limitations, including the lack of…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-01 · Mrinal Haloi, Shashank Shekhar, Nikhil Fande, Siddhant Swaroop Dash, Sanjay G