Related papers: DiffSTR: Controlled Diffusion Models for Scene Tex…

200 papers

This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild. While existing studies have viewed the scene text recognition…

Computer Vision and Pattern Recognition · Computer Science 2023-06-30 Masato Fujitake

Diffusion models have gained attention for image editing yielding impressive results in text-to-image tasks. On the downside, one might notice that generated images of stable diffusion models suffer from deteriorated details. This pitfall…

Computer Vision and Pattern Recognition · Computer Science 2024-05-09 Joshua Santoso , Christian Simon , Williem

Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop…

Computer Vision and Pattern Recognition · Computer Science 2023-04-13 Jiabao Ji , Guanhua Zhang , Zhaowen Wang , Bairu Hou , Zhifei Zhang , Brian Price , Shiyu Chang

The scene text removal (STR) task suffers from insufficient training data due to expensive pixel-level labeling. In this paper, we aim to address this issue by introducing a Text-aware Masked Image Modeling algorithm (TMIM), which…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Zixiao Wang , Hongtao Xie , YuXin Wang , Yadong Qu , Fengjun Guo , Pengwei Liu

Scene text removal (STR) is a challenging task due to the complex text fonts, colors, sizes, and background textures in scene images. However, most previous methods learn both text location and background inpainting implicitly within a…

Computer Vision and Pattern Recognition · Computer Science 2023-06-14 Guangtao Lyu , Anna Zhu

Recently, diffusion-based image generation methods are credited for their remarkable text-to-image generation capabilities, while still facing challenges in accurately generating multilingual scene text images. To tackle this problem, we…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Lingjun Zhang , Xinyuan Chen , Yaohui Wang , Yue Lu , Yu Qiao

Scene text erasing, which replaces text regions with reasonable content in natural images, has drawn significant attention in the computer vision community in recent years. There are two potential subtasks in scene text erasing: text…

Computer Vision and Pattern Recognition · Computer Science 2021-12-06 Zhengmi Tang , Tomo Miyazaki , Yoshihiro Sugaya , Shinichiro Omachi

We present a diffusion-based portrait shadow removal approach that can robustly produce high-fidelity results. Unlike previous methods, we cast shadow removal as diffusion-based inpainting. To this end, we first train a shadow-independent…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Wanchang Yu , Qing Zhang , Rongjia Zheng , Wei-Shi Zheng

Scene text removal (STR) aims to erase textual elements from images. It was originally intended for removing privacy-sensitive or undesired texts from natural scene images, but is now also applied to typographic images. STR typically detects…

Computer Vision and Pattern Recognition · Computer Science 2025-06-27 Takumi Yoshimatsu , Shumpei Takezaki , Seiichi Uchida

Centred on content modification and style preservation, Scene Text Editing (STE) remains a challenging task despite considerable progress in text-to-image synthesis and text-driven image manipulation recently. GAN-based STE methods…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Weichao Zeng , Yan Shu , Zhenhang Li , Dongbao Yang , Yu Zhou

Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Ling Fu , Zijie Wu , Yingying Zhu , Yuliang Liu , Xiang Bai

The goal of scene text image super-resolution is to reconstruct high-resolution text-line images from unrecognizable low-resolution inputs. The existing methods relying on the optimization of pixel-level loss tend to yield text edges that…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Baolin Liu , Zongyuan Yang , Pengfei Wang , Junjie Zhou , Ziqi Liu , Ziyi Song , Yan Liu , Yongping Xiong

Denoising diffusion probabilistic models for image inpainting aim to add the noise to the texture of image during the forward process and recover masked regions with unmasked ones of the texture via the reverse denoising process. Despite…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Haipeng Liu , Yang Wang , Biao Qian , Meng Wang , Yong Rui

Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images, consequently elevating recognition accuracy in Scene Text Recognition (STR). Previous methods predominantly…

Computer Vision and Pattern Recognition · Computer Science 2023-11-23 Yuxuan Zhou , Liangcai Gao , Zhi Tang , Baole Wei

Scene text recognition (STR) suffers from challenges of either less realistic synthetic training data or the difficulty of collecting sufficient high-quality real-world data, limiting the effectiveness of trained models. Meanwhile, despite…

Computer Vision and Pattern Recognition · Computer Science 2025-09-11 Xingsong Ye , Yongkun Du , Yunbo Tao , Zhineng Chen

Mask Diffusion Models (MDMs) have recently emerged as a promising alternative to auto-regressive models (ARMs) for vision-language tasks, owing to their flexible balance of efficiency and accuracy. In this paper, for the first time, we…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Yongkun Du , Miaomiao Zhao , Songlin Fan , Zhineng Chen , Caiyan Jia , Yu-Gang Jiang

Text-to-image generation has witnessed great progress, especially with the recent advancements in diffusion models. Since texts cannot provide detailed conditions like object appearance, reference images are usually leveraged for the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Zhiqi Huang , Huixin Xiong , Haoyu Wang , Longguang Wang , Zhiheng Li

Generic image inpainting aims to complete a corrupted image by borrowing surrounding information, which barely generates novel content. By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content,…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Shaoan Xie , Zhifei Zhang , Zhe Lin , Tobias Hinz , Kun Zhang

Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Guillaume Couairon , Jakob Verbeek , Holger Schwenk , Matthieu Cord

With the rapid development of diffusion models, style transfer has made remarkable progress. However, flexible and localized style editing for scene text remains an unsolved challenge. Although existing scene text editing methods have…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Honghui Yuan , Keiji Yanai