Related papers: DECDM: Document Enhancement using Cycle-Consistent…

CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation

We introduce a diffusion-based cross-domain image translator in the absence of paired training data. Unlike GAN-based methods, our approach integrates diffusion models to learn the image translation process, allowing for more coverable…

Computer Vision and Pattern Recognition · Computer Science 2026-01-30 Shilong Zou, Yuhang Huang, Renjiao Yi, Chenyang Zhu, Kai Xu

DODO: Discrete OCR Diffusion Models

Optical Character Recognition (OCR) is a fundamental task for digitizing information, serving as a critical bridge between visual data and textual understanding. While modern Vision-Language Models (VLM) have achieved high accuracy in this…

Computer Vision and Pattern Recognition · Computer Science 2026-05-28 Sean Man, Gilad Deutch, Roy Ganz, Roi Ronen, Shahar Tsiper, Shai Mazor, Niv Nayman

Text Change Detection in Multilingual Documents Using Image Comparison

Document comparison typically relies on optical character recognition (OCR) as its core technology. However, OCR requires the selection of appropriate language models for each document and the performance of multilingual or hybrid models…

Computer Vision and Pattern Recognition · Computer Science 2024-12-06 Doyoung Park, Naresh Reddy Yarram, Sunjin Kim, Minkyu Kim, Seongho Cho, Taehee Lee

OCR accuracy improvement on document images through a novel pre-processing approach

Digital camera and mobile document image acquisition are new trends arising in the world of Optical Character Recognition and text detection. In some cases, such a process integrates many distortions and produces poorly scanned text or…

Computer Vision and Pattern Recognition · Computer Science 2015-09-14 Abdeslam El Harraj, Naoufal Raissouni

NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement

Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems. Therefore, a crucial preprocessing step is essential to eliminate noise while preserving text…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Giordano Cicchetti, Danilo Comminiello

DocDiff: Document Enhancement via Residual Diffusion Models

Removing degradation from document images not only improves their visual quality and readability, but also enhances the performance of numerous automated document analysis and recognition tasks. However, existing regression-based methods…

Computer Vision and Pattern Recognition · Computer Science 2023-08-10 Zongyuan Yang, Baolin Liu, Yongping Xiong, Lan Yi, Guibin Wu, Xiaojun Tang, Ziqi Liu, Junjie Zhou, Xing Zhang

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

In recent years, text-image joint pre-training techniques have shown promising results in various tasks. However, in Optical Character Recognition (OCR) tasks, aligning text instances with their corresponding text regions in images poses a…

Computer Vision and Pattern Recognition · Computer Science 2024-04-18 Chen Duan, Pei Fu, Shan Guo, Qianyi Jiang, Xiaoming Wei

Enhancing Diffusion-Based Quantitatively Controllable Image Generation via Matrix-Form EDM and Adaptive Vicinal Training

Continuous Conditional Diffusion Model (CCDM) is a diffusion-based framework designed to generate high-quality images conditioned on continuous regression labels. Although CCDM has demonstrated clear advantages over prior approaches across…

Computer Vision and Pattern Recognition · Computer Science 2026-02-03 Xin Ding, Yun Chen, Sen Zhang, Kao Zhang, Nenglun Chen, Peibei Cao, Yongwei Wang, Fei Wu

Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model

Image restoration and enhancement are pivotal for numerous computer vision applications, yet unifying these tasks efficiently remains a significant challenge. Inspired by the iterative refinement capabilities of diffusion models, we propose…

Computer Vision and Pattern Recognition · Computer Science 2024-12-20 Minglong Xue, Jinhong He, Shivakumara Palaiahnakote, Mingliang Zhou

Multi-Sensor Diffusion-Driven Optical Image Translation for Large-Scale Applications

Comparing images captured by disparate sensors is a common challenge in remote sensing. This requires image translation -- converting imagery from one sensor domain to another while preserving the original content. Denoising Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 João Gabriel Vinholi, Marco Chini, Anis Amziane, Renato Machado, Danilo Silva, Patrick Matgen

Unknown-box Approximation to Improve Optical Character Recognition Performance

Optical character recognition (OCR) is a widely used pattern recognition application in numerous domains. There are several feature-rich, general-purpose OCR solutions available for consumers, which can provide moderate to excellent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Ayantha Randika, Nilanjan Ray, Xiao Xiao, Allegra Latimer

Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions

The challenge in fine-grained visual categorization lies in how to explore the subtle differences between different subclasses and achieve accurate discrimination. Previous research has relied on large-scale annotated data and pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2024-05-16 Tianxu Wu, Shuo Ye, Shuhuang Chen, Qinmu Peng, Xinge You

TransDocs: Optical Character Recognition with word to word translation

While OCR has been used in various applications, its output is not always accurate, leading to misfit words. This research work focuses on improving the optical character recognition (OCR) with ML techniques with integration of OCR with…

Computer Vision and Pattern Recognition · Computer Science 2023-04-18 Abhishek Bamotra, Phani Krishna Uppala

Advancing Image Classification with Discrete Diffusion Classification Modeling

Image classification is a well-studied task in computer vision, and yet it remains challenging under high-uncertainty conditions, such as when input images are corrupted or training data are limited. Conventional classification approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Omer Belhasin, Shelly Golan, Ran El-Yaniv, Michael Elad

DiffUCD: Unsupervised Hyperspectral Image Change Detection with Semantic Correlation Diffusion Model

Hyperspectral image change detection (HSI-CD) has emerged as a crucial research area in remote sensing due to its ability to detect subtle changes on the earth's surface. Recently, diffusional denoising probabilistic models (DDPM) have…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Xiangrong Zhang, Shunli Tian, Guanchun Wang, Huiyu Zhou, Licheng Jiao

One-Step Diffusion Model for Image Motion-Deblurring

Currently, methods for single-image deblurring based on CNNs and transformers have demonstrated promising performance. However, these methods often suffer from perceptual limitations, poor generalization ability, and struggle with heavy or…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Xiaoyang Liu, Yuquan Wang, Zheng Chen, Jiezhang Cao, He Zhang, Yulun Zhang, Xiaokang Yang

Conditional Consistency Guided Image Translation and Enhancement

Consistency models have emerged as a promising alternative to diffusion models, offering high-quality generative capabilities through single-step sample generation. However, their application to multi-domain image translation tasks, such as…

Computer Vision and Pattern Recognition · Computer Science 2025-01-06 Amil Bhagat, Milind Jain, A. V. Subramanyam

Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model

Denoising diffusion models have emerged as a dominant paradigm in image generation. Discretizing image data into tokens is a critical step for effectively integrating images with Transformer and other architectures. Although the Denoising…

Computer Vision and Pattern Recognition · Computer Science 2025-12-15 Fei Kong

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Optical character recognition (OCR) has evolved from line-level transcription to structured document parsing, requiring models to recover long-form sequences containing layout, tables, and formulas. Despite recent advances in…

Computer Vision and Pattern Recognition · Computer Science 2026-03-25 Hejun Dong, Junbo Niu, Bin Wang, Weijun Zeng, Wentao Zhang, Conghui He

DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion

Diffusion models, known for their powerful generative capabilities, play a crucial role in addressing real-world super-resolution challenges. However, these models often focus on improving local textures while neglecting the impacts of…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Chunyang Bi, Xin Luo, Sheng Shen, Mengxi Zhang, Huanjing Yue, Jingyu Yang