Related papers: Geometric Perception based Efficient Text Recognit…
Scene Text Recognition (STR) models have achieved high performance in recent years on benchmark datasets where text images are presented with minimal noise. Traditional STR recognition pipelines take a cropped image as sole input and…
Existing Scene Text Recognition (STR) methods typically use a language model to optimize the joint probability of the 1D character sequence predicted by a visual recognition (VR) model, which ignore the 2D spatial context of visual…
State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated…
Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In re- cent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have…
Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes. The community has paid increasing attention to boost the performance by improving the pre-processing image module, like…
Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively proposed and incorporating linguistic knowledge into STR…
Multi-modal models have shown appealing performance in visual recognition tasks, as free-form text-guided training evokes the ability to understand fine-grained visual content. However, current models cannot be trivially applied to scene…
Scene text recognition (STR) enables computers to read text in natural scenes such as object labels, road signs and instructions. STR helps machines perform informed decisions such as what object to pick, which direction to go, and what is…
Scene text recognition (STR) is the task of recognizing character sequences in natural scenes. While there have been great advances in STR methods, current methods still fail to recognize texts in arbitrary shapes, such as heavily curved or…
Geometry-aware modules are widely applied in recent deep learning architectures for scene representation and rendering. However, these modules require intrinsic camera information that might not be obtained accurately. In this paper, we…
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective. We begin by revisiting the six commonly used benchmarks in STR and observe a trend of performance saturation, whereby only 2.91% of the benchmark…
Scene text recognition (STR) suffers from challenges of either less realistic synthetic training data or the difficulty of collecting sufficient high-quality real-world data, limiting the effectiveness of trained models. Meanwhile, despite…
The prevalent perspectives of scene text recognition are from sequence to sequence (seq2seq) and segmentation. Nevertheless, the former is composed of many components which makes implementation and deployment complicated, while the latter…
In an era where wearable technology is reshaping applications, Scene Text Detection and Recognition (STDR) becomes a straightforward choice through the lens of egocentric vision. Leveraging Meta's Project Aria smart glasses, this paper…
Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this…
The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models. However, training protocol (i.e., settings of the hyper-parameters involved in the training of STR…
Semantic segmentation is a crucial image understanding task, where each pixel of image is categorized into a corresponding label. Since the pixel-wise labeling for ground-truth is tedious and labor intensive, in practical applications, many…
This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild. While existing studies have viewed the scene text recognition…
Robustness certification, which aims to formally certify the predictions of neural networks against adversarial inputs, has become an integral part of important tool for safety-critical applications. Despite considerable progress, existing…
Deep learning based methods have achieved surprising progress in Scene Text Recognition (STR), one of classic problems in computer vision. In this paper, we propose a feasible framework for multi-lingual arbitrary-shaped STR, including…