Docling Technical Report

Computation and Language; Computer Vision and Pattern Recognition; Software Engineering. 2024-12-10 (v5)

Abstract

This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. Its code interface makes it easy to extend with new features and models.

Cite

@article{arxiv.2408.09869,
  title  = {Docling Technical Report},
  author = {Christoph Auer and Maksym Lysak and Ahmed Nassar and Michele Dolfi and Nikolaos Livathinos and Panos Vagenas and Cesar Berrospi Ramis and Matteo Omenetti and Fabian Lindlbauer and Kasper Dinkla and Lokesh Mishra and Yusik Kim and Shubham Gupta and Rafael Teixeira de Lima and Valery Weber and Lucas Morin and Ingmar Meijer and Viktor Kuropiatnyk and Peter W. J. Staar},
  journal= {arXiv preprint arXiv:2408.09869},
  year   = {2024}
}

Comments

Docling v1 report