Related papers: DiffusionSTR: Diffusion Model for Scene Text Recog…

On Manipulating Scene Text in the Wild with Diffusion Models

Diffusion models have gained attention for image editing, yielding impressive results in text-to-image tasks. On the downside, one might notice that generated images of stable diffusion models suffer from deteriorated details. This pitfall…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-09 · Joshua Santoso, Christian Simon, Williem

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai

DiffSTR: Controlled Diffusion Models for Scene Text Removal

To prevent unauthorized use of text in images, Scene Text Removal (STR) has become a crucial task. It focuses on automatically removing text and replacing it with a natural, text-less background while preserving significant details such as…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-30 · Sanhita Pathak, Vinay Kaushik, Brejesh Lall

Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution

Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images, consequently elevating recognition accuracy in Scene Text Recognition (STR). Previous methods predominantly…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-23 · Yuxuan Zhou, Liangcai Gao, Zhi Tang, Baole Wei

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Recently, diffusion-based image generation methods are credited for their remarkable text-to-image generation capabilities, while still facing challenges in accurately generating multilingual scene text images. To tackle this problem, we…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-20 · Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao

JSTR: Judgment Improves Scene Text Recognition

In this paper, we present a method for enhancing the accuracy of scene text recognition tasks by judging whether the image and text match each other. While previous studies focused on generating the recognition results from input images,…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-10 · Masato Fujitake

Improving Diffusion Models for Scene Text Editing with Dual Encoders

Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-13 · Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Scene text recognition (STR) suffers from challenges of either less realistic synthetic training data or the difficulty of collecting sufficient high-quality real-world data, limiting the effectiveness of trained models. Meanwhile, despite…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-11 · Xingsong Ye, Yongkun Du, Yunbo Tao, Zhineng Chen

Layout Agnostic Scene Text Image Synthesis with Diffusion Models

While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scene…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-17 · Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-26 · Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Text Detection and Recognition in the Wild: A Review

Detection and recognition of text in natural images are two main problems in the field of computer vision that have a wide variety of applications in analysis of sports videos, autonomous driving, industrial automation, to name a few. They…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-02 · Zobeir Raisi, Mohamed A. Naiel, Paul Fieguth, Steven Wardell, John Zelek

SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model

With the rapid development of diffusion models, style transfer has made remarkable progress. However, flexible and localized style editing for scene text remains an unsolved challenge. Although existing scene text editing methods have…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-14 · Honghui Yuan, Keiji Yanai

IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition

Nowadays, scene text recognition has attracted more and more attention due to its diverse applications. Most state-of-the-art methods adopt an encoder-decoder framework with the attention mechanism, autoregressively generating text from…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-01 · Xiaomeng Yang, Zhi Qiao, Yu Zhou

ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models

Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-03 · Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

Centred on content modification and style preservation, Scene Text Editing (STE) remains a challenging task despite considerable progress in text-to-image synthesis and text-driven image manipulation recently. GAN-based STE methods…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-15 · Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, Yu Zhou

Text Recognition in the Wild: A Survey

The history of text can be traced back over thousands of years. Rich and precise semantic information carried by text is important in a wide range of vision-based application scenarios. Therefore, text recognition in natural scenes has been…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-04 · Xiaoxue Chen, Lianwen Jin, Yuanzhi Zhu, Canjie Luo, Tianwei Wang

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in…

Computer Vision and Pattern Recognition · Computer Science · 2018-08-02 · Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai

TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution

Real-world text image super-resolution aims to restore overall visual quality and text legibility in images suffering from diverse degradations and text distortions. However, the scarcity of text image data in existing datasets results in…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-27 · Haodong He, Xin Zhan, Yancheng Bai, Rui Lan, Lei Sun, Xiangxiang Chu

TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance

While recent advancements in Image Super-Resolution (SR) using diffusion models have shown promise in improving overall image quality, their application to scene text images has revealed limitations. These models often struggle with…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Keren Ye, Ignacio Garcia Dorado, Michalis Raptis, Mauricio Delbracio, Irene Zhu, Peyman Milanfar, Hossein Talebi

Context Diffusion: In-Context Aware Image Generation

We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-24 · Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic