
Self-Supervised Representation Learning on Document Images

Computer Vision and Pattern Recognition; Machine Learning; Image and Video Processing; Machine Learning
(v2, 2020-05-28)

Abstract

This work analyses the impact of self-supervised pre-training on document images in the context of document image classification. While previous work has explored the effect of self-supervision on natural images, we show that patch-based pre-training performs poorly on document images because of their different structural properties and poor intra-sample semantic information. We propose two context-aware alternatives that improve performance on the Tobacco-3482 image classification task. We also propose a novel self-supervision method that exploits the inherent multi-modality of documents (image and text). In document image classification scenarios with a limited amount of data, it outperforms other popular self-supervised methods as well as supervised ImageNet pre-training.
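The idea of using the text modality as a free supervisory signal for the image model can be illustrated with a minimal sketch. It assumes that the text side of each document has already been reduced to a probability distribution over K latent classes. For example, these could be topic weights estimated from OCR text by some text model, which is not shown and is an assumption here. It also assumes the image encoder outputs K logits. The pretext objective is then a soft-target cross-entropy that pushes the image model's prediction toward the text-derived distribution, so no manual labels are needed. The function names are hypothetical and this is not presented as the exact formulation used in the paper.

```python
import math
from typing import Sequence


def normalize(scores: Sequence[float]) -> list[float]:
    """Turn non-negative text-side scores into a probability distribution.

    Hypothetical helper: the scores could be, e.g., topic weights computed
    from a document's OCR text by some text model (assumed, not shown).
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain a positive value")
    return [s / total for s in scores]


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over K image-model logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def text_guided_loss(image_logits: Sequence[float], text_target: Sequence[float]) -> float:
    """Soft-target cross-entropy H(t, p) = -sum_k t_k * log p_k.

    t comes from the text modality and p from the image encoder, so the
    image model learns to predict text-derived semantics without labels.
    """
    if len(image_logits) != len(text_target):
        raise ValueError("logits and target must have the same length K")
    log_p = log_softmax(image_logits)
    return -sum(t * lp for t, lp in zip(text_target, log_p))


if __name__ == "__main__":
    target = normalize([3.0, 1.0, 0.0, 0.0])  # hypothetical topic weights from text
    logits = [2.0, 0.5, -1.0, -1.0]           # hypothetical image-encoder output
    print(round(text_guided_loss(logits, target), 4))
```

The loss reaches its minimum, the entropy of the text-derived target, exactly when the image model's softmax matches that target. Replacing the soft target with a one-hot label recovers ordinary supervised cross-entropy. The soft, text-derived target is what lets unlabeled documents provide the training signal.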


Cite

@article{arxiv.2004.10605,
  title  = {Self-Supervised Representation Learning on Document Images},
  author = {Adrian Cosma and Mihai Ghidoveanu and Michael Panaitescu-Liess and Marius Popescu},
  journal= {arXiv preprint arXiv:2004.10605},
  year   = {2020}
}

Comments

15 pages, 5 figures. Accepted at DAS 2020: IAPR International Workshop on Document Analysis Systems
