Related papers: IPAD: Iterative, Parallel, and Diffusion-based Net…

PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition

Nowadays, scene text recognition has attracted more and more attention due to its various applications. Most state-of-the-art methods adopt an encoder-decoder framework with attention mechanism, which generates text autoregressively from…

Computer Vision and Pattern Recognition · Computer Science 2021-09-10 Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuan Zhang, Ning Jiang, Hongbin Wang, Weiping Wang

Parallel Scale-wise Attention Network for Effective Scene Text Recognition

The paper proposes a new text recognition network for scene-text images. Many state-of-the-art methods employ the attention mechanism either in the text encoder or decoder for the text alignment. Although the encoder-based attention yields…

Computer Vision and Pattern Recognition · Computer Science 2021-04-27 Usman Sajid, Michael Chow, Jin Zhang, Taejoon Kim, Guanghui Wang

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai

SCAN: Sliding Convolutional Attention Network for Scene Text Recognition

Scene text recognition has drawn great attention in the community of computer vision and artificial intelligence due to its challenges and wide applications. State-of-the-art recurrent neural networks (RNN) based models map an input…

Computer Vision and Pattern Recognition · Computer Science 2018-06-05 Yi-Chao Wu, Fei Yin, Xu-Yao Zhang, Li Liu, Cheng-Lin Liu

Primitive Representation Learning for Scene Text Recognition

Scene text recognition is a challenging task due to diverse variations of text instances in natural scene images. Conventional methods based on CNN-RNN-CTC or encoder-decoder with attention mechanism may not fully investigate stable and…

Computer Vision and Pattern Recognition · Computer Science 2021-05-11 Ruijie Yan, Liangrui Peng, Shanyu Xiao, Gang Yao

DiffusionSTR: Diffusion Model for Scene Text Recognition

This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild. While existing studies have viewed the scene text recognition…

Computer Vision and Pattern Recognition · Computer Science 2023-06-30 Masato Fujitake

Deep Direct Regression for Multi-Oriented Scene Text Detection

In this paper, we first provide a new perspective to divide existing high performance object detection methods into direct and indirect regressions. Direct regression performs boundary regression by predicting the offsets from a given…

Computer Vision and Pattern Recognition · Computer Science 2017-03-27 Wenhao He, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu

Context Perception Parallel Decoder for Scene Text Recognition

Scene text recognition (STR) methods have struggled to attain high accuracy and fast inference speed. Autoregressive (AR)-based models implement the recognition in a character-by-character manner, showing superiority in accuracy but with…

Computer Vision and Pattern Recognition · Computer Science 2023-10-10 Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, Yu-Gang Jiang

Latent Beam Diffusion Models for Generating Visual Sequences

While diffusion models excel at generating high-quality images from text prompts, they struggle with visual consistency when generating image sequences. Existing methods generate each image independently, leading to disjointed narratives -…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Guilherme Fernandes, Vasco Ramos, Regev Cohen, Idan Szpektor, João Magalhães

SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Scene text recognition is a hot research topic in computer vision. Recently, many recognition methods based on the encoder-decoder framework have been proposed, and they can handle scene texts of perspective distortion and curve shape.…

Computer Vision and Pattern Recognition · Computer Science 2020-05-25 Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, Weiping Wang

Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks

In this work, we jointly address the problem of text detection and recognition in natural scene images based on convolutional recurrent neural networks. We propose a unified network that simultaneously localizes and recognizes text with a…

Computer Vision and Pattern Recognition · Computer Science 2017-07-14 Hui Li, Peng Wang, Chunhua Shen

Scene Text Recognition with Temporal Convolutional Encoder

Texts from scene images typically consist of several characters and exhibit a characteristic sequence structure. Existing methods capture the structure with the sequence-to-sequence models by an encoder to have the visual representations…

Computer Vision and Pattern Recognition · Computer Science 2020-02-18 Xiangcheng Du, Tianlong Ma, Yingbin Zheng, Hao Ye, Xingjiao Wu, Liang He

DiffVC: A Non-autoregressive Framework Based on Diffusion Model for Video Captioning

Current video captioning methods usually use an encoder-decoder structure to generate text autoregressively. However, autoregressive methods have inherent limitations such as slow generation speed and large cumulative error. Furthermore,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Junbo Wang, Liangyu Fu, Yuke Li, Yining Zhu, Ya Jing, Xuecheng Wu, Jiangbin Zheng

PICD: Versatile Perceptual Image Compression with Diffusion Rendering

Recently, perceptual image compression has achieved significant advancements, delivering high visual quality at low bitrates for natural images. However, for screen content, existing methods often produce noticeable artifacts when…

Computer Vision and Pattern Recognition · Computer Science 2025-05-12 Tongda Xu, Jiahao Li, Bin Li, Yan Wang, Ya-Qin Zhang, Yan Lu

A Holistic Representation Guided Attention Network for Scene Text Recognition

Reading irregular scene text of arbitrary shape in natural images is still a challenging problem, despite the progress made recently. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Lu Yang, Fan Dang, Peng Wang, Hui Li, Zhen Li, Yanning Zhang

Text Image Generation for Low-Resource Languages with Dual Translation Learning

Scene text recognition in low-resource languages frequently faces challenges due to the limited availability of training datasets derived from real-world scenes. This study proposes a novel approach that generates text images in…

Computer Vision and Pattern Recognition · Computer Science 2024-09-27 Chihiro Noguchi, Shun Fukuda, Shoichiro Mihara, Masao Yamanaka

Efficient Scene Text Detection with Textual Attention Tower

Scene text detection has received attention for years and achieved an impressive performance across various benchmarks. In this work, we propose an efficient and accurate approach to detect multioriented text in scene images. The proposed…

Computer Vision and Pattern Recognition · Computer Science 2020-02-11 Liang Zhang, Yufei Liu, Hang Xiao, Lu Yang, Guangming Zhu, Syed Afaq Shah, Mohammed Bennamoun, Peiyi Shen

Visual Re-ranking with Natural Language Understanding for Text Spotting

Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from natural language processing perspective to fill the gap…

Computer Vision and Pattern Recognition · Computer Science 2018-10-31 Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró

A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition

Modern scene text recognition systems often depend on large end-to-end architectures that require extensive training and are prohibitively expensive for real-time scenarios. In such cases, the deployment of heavy models becomes impractical…

Computer Vision and Pattern Recognition · Computer Science 2025-03-21 Ritabrata Chakraborty, Shivakumara Palaiahnakote, Umapada Pal, Cheng-Lin Liu

TextScanner: Reading Characters in Order for Robust Scene Text Recognition

Driven by deep learning and the large volume of data, scene text recognition has evolved rapidly in recent years. Formerly, RNN-attention based methods have dominated this field, but suffer from the problem of attention drift in…

Computer Vision and Pattern Recognition · Computer Science 2020-01-03 Zhaoyi Wan, Minghang He, Haoran Chen, Xiang Bai, Cong Yao