Related papers: READ: Recursive Autoencoders for Document Layout G…

DocSynthv2: A Practical Autoregressive Modeling for Document Generation

While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge. This paper delves into this advanced domain, proposing a novel…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-13 · Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing

Document parsing from scanned images into structured formats remains a significant challenge due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables. Existing supervised fine-tuning methods often…

Computation and Language · Computer Science · 2025-10-21 · Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

GRASS: Generative Recursive Autoencoders for Shape Structures

We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures. Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which…

Graphics · Computer Science · 2017-05-16 · Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Zhang, Leonidas Guibas

Synthetic Document Generator for Annotation-free Layout Recognition

Analyzing the layout of a document to identify headers, sections, tables, figures etc. is critical to understanding its content. Deep learning based approaches for detecting the layout structure of document images have been promising.…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-26 · Natraj Raman, Sameena Shah, Manuela Veloso

LayoutReader: Pre-training of Text and Layout for Reading Order Detection

Reading order detection is the cornerstone to understanding visually-rich documents (e.g., receipts and forms). Unfortunately, no existing work took advantage of advanced deep learning models because it is too laborious to annotate a large…

Computation and Language · Computer Science · 2021-08-30 · Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei

DREAM: Document Reconstruction via End-to-end Autoregressive Model

Document reconstruction constitutes a significant facet of document analysis and recognition, a field that has been progressively accruing interest within the scholarly community. A multitude of these researchers employ an array of document…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-09 · Xin Li, Mingming Gong, Yunfei Wu, Jianxin Dai, Antai Guo, Xinghua Jiang, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing

Automated parsing of scanned documents into richly structured, machine-readable formats remains a critical bottleneck in Document AI, as traditional multi-stage pipelines suffer from error propagation and limited adaptability to diverse…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-22 · Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

A Hybrid Approach for Document Layout Analysis in Document images

Document layout analysis involves understanding the arrangement of elements within a document. This paper navigates the complexities of understanding various elements within document images, such as text, images, tables, and headings. The…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-02 · Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal

Vision-Based Layout Detection from Scientific Literature using Recurrent Convolutional Neural Networks

We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD), a shared subtask of several information extraction problems. Scientific…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-23 · Huichen Yang, William H. Hsu

Enhancing Visually-Rich Document Understanding via Layout Structure Modeling

In recent years, the use of multi-modal pre-trained Transformers has led to significant advancements in visually-rich document understanding. However, existing models have mainly focused on features such as text and vision while neglecting…

Computation and Language · Computer Science · 2023-08-16 · Qiwei Li, Zuchao Li, Xiantao Cai, Bo Du, Hai Zhao

LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation

Controllable layout generation aims to create plausible visual arrangements of element bounding boxes within a graphic design according to certain optional constraints, such as the type or position of a specific component. While recent…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-04 · Yuxuan Wu, Le Wang, Sanping Zhou, Mengnan Liu, Gang Hua, Haoxiang Li

DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis

Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-07 · Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

Equipping Retrieval-Augmented Large Language Models with Document Structure Awareness

While large language models (LLMs) demonstrate impressive capabilities, their reliance on parametric knowledge often leads to factual inaccuracies. Retrieval-Augmented Generation (RAG) mitigates this by leveraging external documents, yet…

Computation and Language · Computer Science · 2025-10-07 · Lingnan Xu, Chong Feng, Kaiyuan Zhang, Liu Zhengyong, Wenqiang Xu, Fanqing Meng

Robust PDF Document Conversion Using Recurrent Neural Networks

The number of published PDF documents has increased exponentially in recent decades. There is a growing need to make their rich content discoverable to information retrieval tools. In this paper, we present a novel approach to document…

Machine Learning · Computer Science · 2021-02-19 · Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, Peter Staar

LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding

Question answering over visually rich documents (VRDs) requires reasoning not only over isolated content but also over documents' structural organization and cross-page dependencies. However, conventional retrieval-augmented generation…

Computation and Language · Computer Science · 2026-03-03 · Zhivar Sourati, Zheng Wang, Marianne Menglin Liu, Yazhe Hu, Mengqing Guo, Sujeeth Bharadwaj, Kyu Han, Tao Sheng, Sujith Ravi, Morteza Dehghani, Dan Roth

Spatial Information Integration in Small Language Models for Document Layout Generation and Classification

Document layout understanding is a field of study that analyzes the spatial arrangement of information in a document hoping to understand its structure and layout. Models such as LayoutLM (and its subsequent iterations) can understand…

Computation and Language · Computer Science · 2025-01-13 · Pablo Melendez, Clemens Havas

SG-VAE: Scene Grammar Variational Autoencoder to generate new indoor scenes

Deep generative models have been used in recent years to learn coherent latent representations in order to synthesize high-quality images. In this work, we propose a neural network to learn a generative model for sampling consistent indoor…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-24 · Pulak Purkait, Christopher Zach, Ian Reid

A Deep Generative Model for Graph Layout

Different layouts can characterize different aspects of the same graph. Finding a "good" layout of a graph is thus an important task for graph visualization. In practice, users often visualize a graph in multiple layouts by using different…

Social and Information Networks · Computer Science · 2019-10-16 · Oh-Hyun Kwon, Kwan-Liu Ma

LayerD: Decomposing Raster Graphic Designs into Layers

Designers craft and edit graphic designs in a layer representation, but layer-based editing becomes impossible once composited into a raster image. In this work, we propose LayerD, a method to decompose raster graphic designs into layers…

Graphics · Computer Science · 2025-09-30 · Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue, Kota Yamaguchi

Towards Book Cover Design via Layout Graphs

Book covers are intentionally designed and provide an introduction to a book. However, they typically require professional skills to design and produce the cover images. Thus, we propose a generative neural network that can produce book…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-16 · Wensheng Zhang, Yan Zheng, Taiga Miyazono, Seiichi Uchida, Brian Kenji Iwana