Related papers: Docling Technical Report

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion

We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion, that can parse several types of popular document formats into a unified, richly structured representation. It is powered by…

Computation and Language · Computer Science 2025-01-31 Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Kasper Dinkla, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Peter W. J. Staar

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Cong Yao

DocAgent: A Multi-Agent System for Automated Code Documentation Generation

High-quality code documentation is crucial for software development especially in the era of AI. However, generating it automatically using Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete,…

Software Engineering · Computer Science 2025-05-27 Dayu Yang, Antoine Simoulin, Xin Qian, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Grey Yang

Advanced Layout Analysis Models for Docling

This technical report documents the development of novel Layout Analysis models integrated into the Docling document-conversion pipeline. We trained several state-of-the-art object detectors based on the RT-DETR, RT-DETRv2 and DFINE…

Computer Vision and Pattern Recognition · Computer Science 2025-09-16 Nikolaos Livathinos, Christoph Auer, Ahmed Nassar, Rafael Teixeira de Lima, Maksym Lysak, Brown Ebouky, Cesar Berrospi, Michele Dolfi, Panagiotis Vagenas, Matteo Omenetti, Kasper Dinkla, Yusik Kim, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Tim Strohmeyer, A. Said Gurbuz, Peter W. J. Staar

PDFMathTranslate: Scientific Document Translation Preserving Layouts

Language barriers in scientific documents hinder the diffusion and development of science and technologies. However, prior efforts in translating such documents largely overlooked the information in layouts. To bridge the gap, we introduce…

Computation and Language · Computer Science 2025-09-23 Rongxin Ouyang, Chang Chu, Zhikuang Xin, Xiangyao Ma

DOCUEVAL: An LLM-based AI Engineering Tool for Building Customisable Document Evaluation Workflows

Foundation models, such as large language models (LLMs), have the potential to streamline evaluation workflows and improve their performance. However, practical adoption faces challenges, such as customisability, accuracy, and scalability.…

Information Retrieval · Computer Science 2025-11-11 Hao Zhang, Qinghua Lu, Liming Zhu

SciWING -- A Software Toolkit for Scientific Document Processing

We introduce SciWING, an open-source software toolkit which provides access to pre-trained models for scientific document processing tasks, inclusive of citation string parsing and logical structure recovery. SciWING enables researchers to…

Digital Libraries · Computer Science 2020-10-26 Abhinav Ramesh Kashyap, Min-Yen Kan

DocBank: A Benchmark Dataset for Document Layout Analysis

Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high quality labeled datasets with both visual and textual information are…

Computation and Language · Computer Science 2020-11-12 Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Document parsing (DP) transforms unstructured or semi-structured documents into structured, machine-readable representations, enabling downstream applications such as knowledge base construction and retrieval-augmented generation (RAG).…

Multimedia · Computer Science 2026-04-07 Qintong Zhang, Bin Wang, Victor Shea-Jay Huang, Junyuan Zhang, Zhengren Wang, Hao Liang, Conghui He, Wentao Zhang

Free and Customizable Code Documentation with LLMs: A Fine-Tuning Approach

Automated documentation of programming source code is a challenging task with significant practical and scientific implications for the developer community. We present a large language model (LLM)-based application that developers can use…

Software Engineering · Computer Science 2025-12-17 Sayak Chakrabarty, Souradip Pal

Document AI: Benchmarks, Models and Applications

Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents. It is an important research direction for natural language…

Computation and Language · Computer Science 2021-11-17 Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei

appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit

We present appjsonify, a Python-based PDF-to-JSON conversion toolkit for academic papers. It parses a PDF file using several visual-based document layout analysis models and rule-based text processing approaches. appjsonify is a flexible…

Computation and Language · Computer Science 2023-10-04 Atsuki Yamaguchi, Terufumi Morishita

DocRefine: An Intelligent Framework for Scientific Document Understanding and Content Optimization based on Multimodal Large Model Agents

The exponential growth of scientific literature in PDF format necessitates advanced tools for efficient and accurate document understanding, summarization, and content optimization. Traditional methods fall short in handling complex layouts…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Kun Qian, Wenjie Li, Tianyu Sun, Wenhong Wang, Wenhan Luo

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

Structured document understanding has attracted considerable attention and made significant progress recently, owing to its crucial role in intelligent document processing. However, most existing related models can only deal with the…

Computation and Language · Computer Science 2022-03-01 Jiapeng Wang, Lianwen Jin, Kai Ding

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

DocuBot: Generating financial reports using natural language interactions

The financial services industry perpetually processes an overwhelming amount of complex data. Digital reports are often created based on tedious manual analysis as well as visualization of the underlying trends and characteristics of data.…

Computation and Language · Computer Science 2021-02-03 Vineeth Ravi, Selim Amrouni, Andrea Stefanucci, Armineh Nourbakhsh, Prashant Reddy, Manuela Veloso

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion. Our model comprehensively processes entire pages by generating DocTags, a new universal markup format that captures all page elements…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Ahmed Nassar, Andres Marafioti, Matteo Omenetti, Maksym Lysak, Nikolaos Livathinos, Christoph Auer, Lucas Morin, Rafael Teixeira de Lima, Yusik Kim, A. Said Gurbuz, Michele Dolfi, Miquel Farré, Peter W. J. Staar

Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown

Academic documents stored in PDF format can be transformed into plain text structured markup languages to enhance accessibility and enable scalable digital library workflows. Markup languages allow for easier updates and customization,…

Multimedia · Computer Science 2025-12-23 Changxu Duan

From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering

Retrieval-Augmented Generation (RAG) systems depend critically on the quality of document preprocessing, yet no prior study has evaluated PDF processing frameworks by their impact on downstream question-answering accuracy. We address this…

Information Retrieval · Computer Science 2026-05-27 José Guilherme Marques dos Santos, Ricardo Yang, Rui Humberto Pereira, Alexandre Sousa, Brígida Mónica Faria, Henrique Lopes Cardoso, José Duarte, José Luís Reis, Luís Paulo Reis, Pedro Pimenta, José Paulo Marques dos Santos

Collage: Decomposable Rapid Prototyping for Information Extraction on Scientific PDFs

Recent years in NLP have seen the continued development of domain-specific information extraction tools for scientific documents, alongside the release of increasingly multimodal pretrained transformer models. While the opportunity for…

Computation and Language · Computer Science 2025-06-25 Sireesh Gururaja, Yueheng Zhang, Guannan Tang, Tianhao Zhang, Kevin Murphy, Yu-Tsen Yi, Junwon Seo, Anthony Rollett, Emma Strubell