Arbitrary Shape Text Detection using Transformers

Computer Vision and Pattern Recognition 2022-02-24 v1 Machine Learning

Abstract

Recent text detection frameworks require several handcrafted components, such as anchor generation and non-maximum suppression (NMS), or multiple processing stages (e.g., label generation), to detect arbitrarily shaped text in images. In contrast, we propose an end-to-end trainable architecture based on Detection using Transformers (DETR) that outperforms previous state-of-the-art methods in arbitrary-shaped text detection. At its core, the proposed method leverages a bounding box loss function that accurately measures changes in the scale and aspect ratio of the detected arbitrary-shaped text regions. This is possible due to a hybrid shape representation built from Bezier curves that are further split into piece-wise polygons. The proposed loss function combines a generalized-split-intersection-over-union loss defined over the piece-wise polygons with a Smooth-ln regression over the Bezier curves' control points, which acts as a regularizer. We evaluate the proposed model on the Total-Text and CTW-1500 datasets for curved text, and on the MSRA-TD500 and ICDAR15 datasets for multi-oriented text, and show that it outperforms previous state-of-the-art methods on arbitrary-shape text detection tasks.
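As a rough illustration of the hybrid shape representation, the sketch below samples two Bezier curves (the top and bottom boundaries of a text region) and splits the region between them into piece-wise quadrilaterals. It is not the authors' implementation: the cubic degree, the uniform sampling, the number of pieces, and the helper names `cubic_bezier`, `piecewise_quads`, and `polygon_area` are all assumptions made for illustration, and the loss terms themselves are not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(ctrl: List[Point], t: float) -> Point:
    """Evaluate a cubic Bezier curve at t in [0, 1] via Bernstein polynomials.

    Assumption: four control points per curve (cubic); the abstract does not
    state the degree.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    b0 = (1 - t) ** 3
    b1 = 3 * t * (1 - t) ** 2
    b2 = 3 * t ** 2 * (1 - t)
    b3 = t ** 3
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)


def piecewise_quads(top_ctrl: List[Point],
                    bottom_ctrl: List[Point],
                    n_pieces: int) -> List[List[Point]]:
    """Split the region between two Bezier curves into n_pieces quadrilaterals.

    Assumption: both curves are sampled at the same uniformly spaced
    parameter values, and consecutive samples are joined into quads.
    """
    ts = [i / n_pieces for i in range(n_pieces + 1)]
    top = [cubic_bezier(top_ctrl, t) for t in ts]
    bottom = [cubic_bezier(bottom_ctrl, t) for t in ts]
    return [[top[i], top[i + 1], bottom[i + 1], bottom[i]]
            for i in range(n_pieces)]


def polygon_area(poly: List[Point]) -> float:
    """Unsigned area of a simple polygon (shoelace formula)."""
    s = 0.0
    for (xa, ya), (xb, yb) in zip(poly, poly[1:] + poly[:1]):
        s += xa * yb - xb * ya
    return abs(s) / 2.0


# Hypothetical example: a straight, 3-by-1 horizontal text region.
top_ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
bottom_ctrl = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
quads = piecewise_quads(top_ctrl, bottom_ctrl, n_pieces=3)
total_area = sum(polygon_area(q) for q in quads)
```

Working with small polygons rather than the curved region directly is what would make an overlap-based measure such as an intersection-over-union tractable per piece, while the control points remain available for a separate regression term.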

Cite

@article{arxiv.2202.11221,
  title  = {Arbitrary Shape Text Detection using Transformers},
  author = {Zobeir Raisi and Georges Younes and John Zelek},
  journal= {arXiv preprint arXiv:2202.11221},
  year   = {2022}
}