Related papers: MinerU-Diffusion: Rethinking Document OCR as Inver…

DODO: Discrete OCR Diffusion Models

Optical Character Recognition (OCR) is a fundamental task for digitizing information, serving as a critical bridge between visual data and textual understanding. While modern Vision-Language Models (VLMs) have achieved high accuracy in this…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-28 · Sean Man, Gilad Deutch, Roy Ganz, Roi Ronen, Shahar Tsiper, Shai Mazor, Niv Nayman

DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

The performance of optical character recognition (OCR) heavily relies on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-17 · Jiaxin Zhang, Joy Rimchala, Lalla Mouatadid, Kamalika Das, Sricharan Kumar

DiffuRank: Effective Document Reranking with Diffusion Language Models

Recent advances in large language models (LLMs) have inspired new paradigms for document reranking. While this paradigm better exploits the reasoning and contextual understanding capabilities of LLMs, most existing LLM-based rerankers rely…

Information Retrieval · Computer Science · 2026-02-16 · Qi Liu, Kun Ai, Jiaxin Mao, Yanzhao Zhang, Mingxin Li, Dingkun Long, Pengjun Xie, Fengbin Zhu, Ji-Rong Wen

TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance

While recent advancements in Image Super-Resolution (SR) using diffusion models have shown promise in improving overall image quality, their application to scene text images has revealed limitations. These models often struggle with…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Keren Ye, Ignacio Garcia Dorado, Michalis Raptis, Mauricio Delbracio, Irene Zhu, Peyman Milanfar, Hossein Talebi

MinerU: An Open-Source Solution for Precise Document Content Extraction

Document content analysis has been a crucial research area in computer vision. Despite significant advancements in methods such as OCR, layout detection, and formula recognition, existing open-source solutions struggle to consistently…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-30 · Bin Wang, Chao Xu, Xiaomeng Zhao, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Rui Xu, Kaiwen Liu, Yuan Qu, Fukai Shang, Bo Zhang, Liqun Wei, Zhihao Sui, Wei Li, Botian Shi, Yu Qiao, Dahua Lin, Conghui He

Diff-Oracle: Deciphering Oracle Bone Scripts with Controllable Diffusion Model

Deciphering oracle bone scripts plays an important role in Chinese archaeology and philology. However, a significant challenge remains due to the scarcity of oracle character images. To overcome this issue, we propose Diff-Oracle, a novel…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-09 · Jing Li, Qiu-Feng Wang, Siyuan Wang, Rui Zhang, Kaizhu Huang, Erik Cambria

Efficient OCR for Building a Diverse Digital History

Thousands of users consult digital archives daily, but the information they can access is unrepresentative of the diversity of documentary history. The sequence-to-sequence architecture typically used for optical character recognition (OCR)…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-29 · Jacob Carlson, Tom Bryan, Melissa Dell

Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion

Rendering and inverse rendering are pivotal tasks in both computer vision and graphics. The rendering equation is the core of the two tasks, as an ideal conditional distribution transfer function from intrinsic properties to RGB images.…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-06 · Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng, Shunsi Zhang, Yingcong Chen

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has been recently expanded to text generation via generating all tokens within a sequence concurrently.…

Computation and Language · Computer Science · 2023-12-14 · Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen

MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing

VLM-based OCR models have become the de facto choice for document parsing, as they can accurately extract page-level elements (e.g., paragraphs within individual pages) together with their bounding boxes and textual content. However,…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-26 · Bangrui Xu, Ziyang Miao, Xuanhe Zhou, Yiming Lin, Zirui Tang, Xiaomeng Zhao, Fan Wu, Cheng Tan, Fan Wu, Bin Wang, Conghui He

DeepOtsu: Document Enhancement and Binarization using Iterative Deep Learning

This paper presents a novel iterative deep learning framework and applies it to document enhancement and binarization. Unlike traditional methods, which predict the binary label of each pixel in the input image, we train the neural…

Computer Vision and Pattern Recognition · Computer Science · 2019-01-21 · Sheng He, Lambert Schomaker

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

We introduce Orthrus, a simple and efficient dual-architecture framework that unifies the exact generation fidelity of autoregressive Large Language Models (LLMs) with the high-speed parallel token generation of diffusion models. The…

Machine Learning · Computer Science · 2026-05-19 · Chien Van Nguyen, Chaitra Hegde, Van Cuong Pham, Ryan A. Rossi, Franck Dernoncourt, Thien Huu Nguyen

Reversible Diffusion Decoding for Diffusion Language Models

Diffusion language models enable parallel token generation through block-wise decoding, but their irreversible commitments can lead to stagnation, where the reverse diffusion process fails to make further progress under a suboptimal…

Computation and Language · Computer Science · 2026-02-03 · Xinyun Wang, Min Zhang, Sen Cui, Zhikang Chen, Bo Jiang, Kun Kuang, Mingbao Lin

DiffuGR: Generative Document Retrieval with Diffusion Language Models

Generative retrieval (GR) reframes document retrieval as an end-to-end task of generating sequential document identifiers (DocIDs). Existing GR methods predominantly rely on left-to-right auto-regressive decoding, which suffers from two…

Information Retrieval · Computer Science · 2026-02-04 · Xinpeng Zhao, Zhaochun Ren, Yukun Zhao, Zhenyang Li, Mengqi Zhang, Jun Feng, Ran Chen, Ying Zhou, Zhumin Chen, Shuaiqiang Wang, Dawei Yin, Xin Xin

Text Change Detection in Multilingual Documents Using Image Comparison

Document comparison typically relies on optical character recognition (OCR) as its core technology. However, OCR requires the selection of appropriate language models for each document and the performance of multilingual or hybrid models…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-06 · Doyoung Park, Naresh Reddy Yarram, Sunjin Kim, Minkyu Kim, Seongho Cho, Taehee Lee

DocRevive: A Unified Pipeline for Document Text Restoration

In Document Understanding, the challenge of reconstructing damaged, occluded, or incomplete text remains a critical yet unexplored problem. Subsequent document understanding tasks can benefit from a document reconstruction process. In…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-25 · Kunal Purkayastha, Ayan Banerjee, Josep Llados, Umapada Pal

TransDocs: Optical Character Recognition with word to word translation

While OCR has been used in various applications, its output is not always accurate, leading to misfit words. This research work focuses on improving the optical character recognition (OCR) with ML techniques with integration of OCR with…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-18 · Abhishek Bamotra, Phani Krishna Uppala

NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement

Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems. Therefore, a crucial preprocessing step is essential to eliminate noise while preserving text…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-09 · Giordano Cicchetti, Danilo Comminiello

Fast Diffusion EM: a diffusion model for blind inverse problems with application to deconvolution

Using diffusion models to solve inverse problems is a growing field of research. Current methods assume the degradation to be known and provide impressive results in terms of restoration quality and diversity. In this work, we leverage the…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-02 · Charles Laroche, Andrés Almansa, Eva Coupete

OCR accuracy improvement on document images through a novel pre-processing approach

Digital camera and mobile document image acquisition are new trends arising in the world of Optical Character Recognition and text detection. In some cases, such process integrates many distortions and produces poorly scanned text or…

Computer Vision and Pattern Recognition · Computer Science · 2015-09-14 · Abdeslam El Harraj, Naoufal Raissouni