Related papers: DiffSTR: Controlled Diffusion Models for Scene Tex…

DiffusionSTR: Diffusion Model for Scene Text Recognition

This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild. While existing studies have viewed the scene text recognition…

Computer Vision and Pattern Recognition · Computer Science 2023-06-30 Masato Fujitake

On Manipulating Scene Text in the Wild with Diffusion Models

Diffusion models have gained attention for image editing yielding impressive results in text-to-image tasks. On the downside, one might notice that generated images of stable diffusion models suffer from deteriorated details. This pitfall…

Computer Vision and Pattern Recognition · Computer Science 2024-05-09 Joshua Santoso, Christian Simon, Williem

Improving Diffusion Models for Scene Text Editing with Dual Encoders

Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop…

Computer Vision and Pattern Recognition · Computer Science 2023-04-13 Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang

Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling

The scene text removal (STR) task suffers from insufficient training data due to the expensive pixel-level labeling. In this paper, we aim to address this issue by introducing a Text-aware Masked Image Modeling algorithm (TMIM), which…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Zixiao Wang, Hongtao Xie, YuXin Wang, Yadong Qu, Fengjun Guo, Pengwei Liu

PSSTRNet: Progressive Segmentation-guided Scene Text Removal Network

Scene text removal (STR) is a challenging task due to the complex text fonts, colors, sizes, and background textures in scene images. However, most previous methods learn both text location and background inpainting implicitly within a…

Computer Vision and Pattern Recognition · Computer Science 2023-06-14 Guangtao Lyu, Anna Zhu

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Recently, diffusion-based image generation methods are credited for their remarkable text-to-image generation capabilities, while still facing challenges in accurately generating multilingual scene text images. To tackle this problem, we…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao

Stroke-Based Scene Text Erasing Using Synthetic Data for Training

Scene text erasing, which replaces text regions with reasonable content in natural images, has drawn significant attention in the computer vision community in recent years. There are two potential subtasks in scene text erasing: text…

Computer Vision and Pattern Recognition · Computer Science 2021-12-06 Zhengmi Tang, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal

We present a diffusion-based portrait shadow removal approach that can robustly produce high-fidelity results. Unlike previous methods, we cast shadow removal as diffusion-based inpainting. To this end, we first train a shadow-independent…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Wanchang Yu, Qing Zhang, Rongjia Zheng, Wei-Shi Zheng

Inverse Scene Text Removal

Scene text removal (STR) aims to erase textual elements from images. It was originally intended for removing privacy-sensitive or undesired texts from natural scene images, but is now also applied to typographic images. STR typically detects…

Computer Vision and Pattern Recognition · Computer Science 2025-06-27 Takumi Yoshimatsu, Shumpei Takezaki, Seiichi Uchida

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

Centred on content modification and style preservation, Scene Text Editing (STE) remains a challenging task despite considerable progress in text-to-image synthesis and text-driven image manipulation recently. GAN-based STE methods…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, Yu Zhou

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai

TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution

The goal of scene text image super-resolution is to reconstruct high-resolution text-line images from unrecognizable low-resolution inputs. The existing methods relying on the optimization of pixel-level loss tend to yield text edges that…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Baolin Liu, Zongyuan Yang, Pengfei Wang, Junjie Zhou, Ziqi Liu, Ziyi Song, Yan Liu, Yongping Xiong

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

Denoising diffusion probabilistic models for image inpainting aim to add the noise to the texture of image during the forward process and recover masked regions with unmasked ones of the texture via the reverse denoising process. Despite…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Haipeng Liu, Yang Wang, Biao Qian, Meng Wang, Yong Rui

Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution

Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images, consequently elevating recognition accuracy in Scene Text Recognition (STR). Previous methods predominantly…

Computer Vision and Pattern Recognition · Computer Science 2023-11-23 Yuxuan Zhou, Liangcai Gao, Zhi Tang, Baole Wei

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Scene text recognition (STR) suffers from challenges of either less realistic synthetic training data or the difficulty of collecting sufficient high-quality real-world data, limiting the effectiveness of trained models. Meanwhile, despite…

Computer Vision and Pattern Recognition · Computer Science 2025-09-11 Xingsong Ye, Yongkun Du, Yunbo Tao, Zhineng Chen

MDiff4STR: Mask Diffusion Model for Scene Text Recognition

Mask Diffusion Models (MDMs) have recently emerged as a promising alternative to auto-regressive models (ARMs) for vision-language tasks, owing to their flexible balance of efficiency and accuracy. In this paper, for the first time, we…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Yongkun Du, Miaomiao Zhao, Songlin Fan, Zhineng Chen, Caiyan Jia, Yu-Gang Jiang

Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt

Text-to-image generation has witnessed great progress, especially with the recent advancements in diffusion models. Since texts cannot provide detailed conditions like object appearance, reference images are usually leveraged for the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Zhiqi Huang, Huixin Xiong, Haoyu Wang, Longguang Wang, Zhiheng Li

SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model

Generic image inpainting aims to complete a corrupted image by borrowing surrounding information, which barely generates novel content. By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content,…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Shaoan Xie, Zhifei Zhang, Zhe Lin, Tobias Hinz, Kun Zhang

DiffEdit: Diffusion-based semantic image editing with mask guidance

Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord

SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model

With the rapid development of diffusion models, style transfer has made remarkable progress. However, flexible and localized style editing for scene text remains an unsolved challenge. Although existing scene text editing methods have…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Honghui Yuan, Keiji Yanai