Related papers: LayoutParser: A Unified Toolkit for Deep Learning …

200 papers

Efficient analysis and processing of dental images are crucial for dentists to achieve accurate diagnosis and optimal treatment planning. However, dental imaging inherently poses several challenges, such as low contrast, metallic artifacts,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-24 Zhenhuan Zhou, Jingbo Zhu, Yuchen Zhang, Xiaohang Guan, Peng Wang, Tao Li

Recognizing the layout of unstructured digital documents is an important step when parsing the documents into structured machine-readable format for downstream applications. Deep neural networks that are developed for computer vision have…

Computation and Language · Computer Science 2019-08-22 Xu Zhong, Jianbin Tang, Antonio Jimeno Yepes

Document parsing from scanned images into structured formats remains a significant challenge due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables. Existing supervised fine-tuning methods often…

Computation and Language · Computer Science 2025-10-21 Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

We introduce DLSIA (Deep Learning for Scientific Image Analysis), a Python-based machine learning library that empowers scientists and researchers across diverse scientific domains with a range of customizable convolutional neural network…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Eric J Roberts, Tanny Chavez, Alexander Hexemer, Petrus H. Zwart

Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques. One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting…

Computation and Language · Computer Science 2023-08-31 Sotirios Kastanas, Shaomu Tan, Yi He

Reading order detection is the cornerstone to understanding visually-rich documents (e.g., receipts and forms). Unfortunately, no existing work took advantage of advanced deep learning models because it is too laborious to annotate a large…

Computation and Language · Computer Science 2021-08-30 Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei

Document Layout Parsing serves as a critical gateway for Artificial Intelligence (AI) to access and interpret the world's vast stores of structured knowledge. This process, which encompasses layout detection, text recognition, and relational…

Computer Vision and Pattern Recognition · Computer Science 2025-12-18 Yumeng Li, Guang Yang, Hao Liu, Bowen Wang, Colin Zhang

This paper proposes LayoutLLM, a more flexible document analysis method for understanding imaged documents. Visually Rich Document Understanding tasks, such as document image classification and information extraction, have gained…

Computation and Language · Computer Science 2024-03-22 Masato Fujitake

Deep learning (DL) has revolutionized the field of document image analysis, showcasing superhuman performance across a diverse set of tasks. However, the inherent black-box nature of deep learning models still presents a significant…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Saifullah Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed

Document AI has advanced rapidly and is attracting increasing attention. Yet, while most efforts have focused on document layout analysis (DLA), its generative counterpart, layout generation, remains underexplored. Distinct from traditional…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Hengrui Kang, Zhuangcheng Gu, Zhiyuan Zhao, Zichen Wen, Bin Wang, Weijia Li, Conghui He

We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion, that can parse several types of popular document formats into a unified, richly structured representation. It is powered by…

Automated parsing of scanned documents into richly structured, machine-readable formats remains a critical bottleneck in Document AI, as traditional multi-stage pipelines suffer from error propagation and limited adaptability to diverse…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

Recent advancements in the area of Computer Vision with state-of-the-art Neural Networks have given a boost to Optical Character Recognition (OCR) accuracies. However, extracting characters/text alone is often insufficient for relevant…

Computer Vision and Pattern Recognition · Computer Science 2018-12-17 Vishwanath D, Rohit Rahul, Gunjan Sehgal, Swati, Arindam Chowdhury, Monika Sharma, Lovekesh Vig, Gautam Shroff, Ashwin Srinivasan

Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high quality labeled datasets with both visual and textual information are…

Computation and Language · Computer Science 2020-11-12 Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

Layout is a fundamental component of any graphic design. Creating large varieties of plausible document layouts can be a tedious task, requiring numerous constraints to be satisfied, including local ones relating different semantic elements…

Computer Vision and Pattern Recognition · Computer Science 2020-04-20 Akshay Gadi Patil, Omri Ben-Eliezer, Or Perel, Hadar Averbuch-Elor

Document layout analysis is a key area in document research, involving techniques like text mining and visual analysis. Despite various methods developed to tackle layout analysis, a critical but frequently overlooked problem is the…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Talha Uddin Sheikh, Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal

Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of…

Computer Vision and Pattern Recognition · Computer Science 2024-04-22 Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while…

Computation and Language · Computer Science 2020-06-17 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. However, previous studies have typically…

Computer Vision and Pattern Recognition · Computer Science 2024-05-21 Jiawei Wang, Kai Hu, Qiang Huo

The advent of multimodal learning has brought a significant improvement in document AI. Documents are now treated as multimodal entities, incorporating both textual and visual information for downstream analysis. However, works in this…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Nikitha SR, Tarun Ram Menta, Mausoom Sarkar