DocMAE: Document Image Rectification via Self-supervised Representation Learning

Computer Vision and Pattern Recognition 2023-04-21 v1

Abstract

Considerable effort has gone into document image rectification, but how to learn effective representations of such distorted images remains under-explored. In this paper, we present DocMAE, a novel self-supervised framework for document image rectification. Our motivation is to use a masked autoencoder to encode the structural cues in document images that benefit rectification, i.e., document boundaries and text lines. Specifically, we first mask random patches of the background-excluded document images and then reconstruct the missing pixels. With this self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents by restoring document boundaries and missing text lines. Extensive experiments on the downstream rectification task validate the effectiveness of our method.
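The masking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `mask_random_patches` and `masked_mse`, the patch size of 16, and the mask ratio of 0.75 are assumptions chosen for illustration, and the input is assumed to be a document image whose background has already been removed. The loss on masked pixels only follows common masked-autoencoder practice and is likewise an assumption here.

```python
import numpy as np


def mask_random_patches(image, patch_size=16, mask_ratio=0.75, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    patch_size and mask_ratio are illustrative defaults, not values
    taken from the paper. Returns the masked image and a boolean
    pixel-level mask (True = pixel belongs to a masked patch).
    """
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    grid_h, grid_w = h // patch_size, w // patch_size
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))

    # Pick which patches to hide, uniformly at random.
    rng = np.random.default_rng(seed)
    masked_idx = rng.permutation(num_patches)[:num_masked]
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[masked_idx] = True
    patch_mask = patch_mask.reshape(grid_h, grid_w)

    # Expand the patch-level mask to pixel resolution.
    pixel_mask = patch_mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    masked = image.copy()
    masked[pixel_mask] = 0  # hides every channel of the masked pixels
    return masked, pixel_mask


def masked_mse(prediction, target, pixel_mask):
    """Mean squared error computed only over the masked pixels."""
    diff = (prediction.astype(np.float64) - target.astype(np.float64)) ** 2
    return float(diff[pixel_mask].mean())
```

In a training loop, a network would receive `masked` as input, and `masked_mse` would compare its output against the original image on the hidden region, which is where document boundaries and text lines have to be restored.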

Cite

@article{arxiv.2304.10341,
  title  = {DocMAE: Document Image Rectification via Self-supervised Representation Learning},
  author = {Shaokai Liu and Hao Feng and Wengang Zhou and Houqiang Li and Cong Liu and Feng Wu},
  journal= {arXiv preprint arXiv:2304.10341},
  year   = {2023}
}

Comments

Accepted to ICME 2023
