Improving Structured Text Recognition with Regular Expression Biasing

Baoguang Shi; Wenfeng Cheng; Yijuan Lu; Cha Zhang; Dinei Florencio

Computer Vision and Pattern Recognition 2021-11-15 v1

Abstract

We study the problem of recognizing structured text, i.e., text that follows certain formats, and propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing. A biased recognizer recognizes text that matches the specified regexes with significantly improved accuracy, at the cost of a generally small accuracy degradation on other text. The biasing is realized by modeling the regexes as a Weighted Finite-State Transducer (WFST) and injecting it into the decoder via dynamic replacement. A single hyperparameter controls the biasing strength. The method is useful for recognizing text lines with known formats or containing words from a domain vocabulary; examples include driver license numbers and drug names in prescriptions. We demonstrate the efficacy of regex biasing on datasets of printed and handwritten structured text and measure its side effects.
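As a rough illustration of what decoder-level biasing with a single strength hyperparameter can look like, here is a minimal sketch under stated assumptions. It is not the authors' WFST and dynamic-replacement implementation. It swaps in a hand-built DFA produced by a hypothetical helper `linear_dfa`, which handles only fixed-length patterns of character classes (such as `[A-Z]\d\d`) rather than full regexes. It also assumes the recognizer emits one log-probability distribution per output character, with no CTC blanks. Each character that advances the DFA earns a bonus `bias`. That bonus is withdrawn if the hypothesis leaves the pattern or ends outside an accepting state, so non-matching text is scored as if unbiased.

```python
import math
import string
from typing import Dict, List, Optional, Set, Tuple

# (transitions[state][char] -> next_state, start state, accepting states)
Dfa = Tuple[Dict[int, Dict[str, int]], int, Set[int]]


def linear_dfa(char_classes: List[str]) -> Dfa:
    """Hypothetical helper: compile a fixed-length pattern, given as one string of
    allowed characters per position, into a DFA. A stand-in for a regex compiler."""
    transitions = {i: {c: i + 1 for c in allowed} for i, allowed in enumerate(char_classes)}
    return transitions, 0, {len(char_classes)}


def biased_beam_search(log_probs: List[Dict[str, float]], dfa: Dfa,
                       bias: float, beam: int = 5) -> List[Tuple[str, float]]:
    """Beam search over per-step character log-probabilities. Each character that
    advances the DFA earns `bias`; the bonus is withdrawn if the hypothesis leaves
    the pattern or ends outside an accepting state."""
    transitions, start, accepting = dfa
    # Hypothesis: (text, model log-prob, DFA state or None once off-pattern, bonus)
    hyps: List[Tuple[str, float, Optional[int], float]] = [("", 0.0, start, 0.0)]
    for step in log_probs:
        candidates = []
        for text, score, state, bonus in hyps:
            for ch, lp in step.items():
                nxt = None if state is None else transitions.get(state, {}).get(ch)
                if nxt is not None:
                    # Still on the pattern: accumulate the biasing bonus.
                    candidates.append((text + ch, score + lp, nxt, bonus + bias))
                else:
                    # Left the pattern: drop any bonus earned so far.
                    candidates.append((text + ch, score + lp, None, 0.0))
        # Prune by model score plus the provisional biasing bonus.
        candidates.sort(key=lambda h: h[1] + h[3], reverse=True)
        hyps = candidates[:beam]
    # Only hypotheses that end in an accepting state keep their bonus.
    finals = [(t, s + (b if st in accepting else 0.0)) for t, s, st, b in hyps]
    return sorted(finals, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Toy outputs: the model slightly prefers the letter "O" over the digit "0".
    steps = [
        {"B": math.log(0.9), "8": math.log(0.1)},
        {"O": math.log(0.6), "0": math.log(0.4)},
        {"7": math.log(0.9), "T": math.log(0.1)},
    ]
    dfa = linear_dfa([string.ascii_uppercase, string.digits, string.digits])  # like [A-Z]\d\d
    print(biased_beam_search(steps, dfa, bias=0.0)[0][0])  # BO7
    print(biased_beam_search(steps, dfa, bias=0.5)[0][0])  # B07
```

With `bias=0.0` the decoder returns the unbiased best reading "BO7". With `bias=0.5` the pattern-conforming "B07" wins. Raising `bias` trades accuracy on non-matching text for accuracy on matching text, which mirrors the role of the single biasing hyperparameter described above.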

Keywords

scene text detection and recognition; text classification; model transformation

Cite

@article{arxiv.2111.06738,
  title  = {Improving Structured Text Recognition with Regular Expression Biasing},
  author = {Baoguang Shi and Wenfeng Cheng and Yijuan Lu and Cha Zhang and Dinei Florencio},
  journal= {arXiv preprint arXiv:2111.06738},
  year   = {2021}
}
