Related papers: Confidence-Aware Document OCR Error Detection

Enhancing OCR Performance through Post-OCR Models: Adopting Glyph Embedding for Improved Correction

The study investigates the potential of post-OCR models to overcome limitations in OCR models and explores the impact of incorporating glyph embedding on post-OCR correction performance. In this study, we have developed our own post-OCR…

Computer Vision and Pattern Recognition · Computer Science 2023-08-30 Yung-Hsin Chen, Yuli Zhou

OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set

Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and being widely published at that time; and hence, there was a…

Computation and Language · Computer Science 2012-04-03 Youssef Bassil, Mohammad Alwani

Seeing Straight: Document Orientation Detection for Efficient OCR

Despite significant advances in document understanding, determining the correct orientation of scanned or photographed documents remains a critical pre-processing step in real-world settings. Accurate rotation correction is essential…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Suranjan Goswami, Abhinav Ravi, Raja Kolla, Ali Faraz, Shaharukh Khan, Akash, Chandra Khatri, Shubham Agarwal

A Cost Efficient Approach to Correct OCR Errors in Large Document Collections

The word error rate of an OCR is often higher than its character error rate. This is especially true when OCRs are designed by recognizing characters. High word accuracies are critical to tasks like the creation of content in digital libraries…

Computer Vision and Pattern Recognition · Computer Science 2019-05-29 Deepayan Das, Jerin Philip, Minesh Mathew, C. V. Jawahar

Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces

Despite advances in Automatic Speech Recognition (ASR), transcription errors persist and require manual correction. Confidence scores, which indicate the certainty of ASR results, could assist users in identifying and correcting errors.…

Human-Computer Interaction · Computer Science 2025-03-20 Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann

Statistical Learning for OCR Text Correction

The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications used in text analyzing pipeline. Recent models of OCR post-processing significantly improve the quality of OCR-generated text, but are…

Computer Vision and Pattern Recognition · Computer Science 2016-11-22 Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E. Milios

Confidence Prediction for Lexicon-Free OCR

Having a reliable accuracy score is crucial for real world applications of OCR, since such systems are judged by the number of false readings. Lexicon-based OCR systems, which deal with what is essentially a multi-class classification…

Computer Vision and Pattern Recognition · Computer Science 2018-07-17 Noam Mor, Lior Wolf

3D Rendering Framework for Data Augmentation in Optical Character Recognition

In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR dataset.…

Computer Vision and Pattern Recognition · Computer Science 2022-09-30 Andreas Spruck, Maximiliane Hawesch, Anatol Maier, Christian Riess, Jürgen Seiler, André Kaup

OCR Error Correction Using Character Correction and Feature-Based Word Classification

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast…

Information Retrieval · Computer Science 2020-06-11 Ido Kissos, Nachum Dershowitz

Design of an Optical Character Recognition System for Camera-based Handheld Devices

This paper presents a complete Optical Character Recognition (OCR) system for camera captured image/graphics embedded textual documents for handheld devices. At first, text regions are extracted and skew corrected. Then, these regions are…

Computer Vision and Pattern Recognition · Computer Science 2011-09-16 Ayatullah Faruk Mollah, Nabamita Majumder, Subhadip Basu, Mita Nasipuri

Unknown-box Approximation to Improve Optical Character Recognition Performance

Optical character recognition (OCR) is a widely used pattern recognition application in numerous domains. There are several feature-rich, general-purpose OCR solutions available for consumers, which can provide moderate to excellent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Ayantha Randika, Nilanjan Ray, Xiao Xiao, Allegra Latimer

Neural OCR Post-Hoc Correction of Historical Corpora

Optical character recognition (OCR) is crucial for a deeper access to historical collections. OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new letters, word spellings), as the main source of…

Computation and Language · Computer Science 2021-02-02 Lijun Lyu, Maria Koutraki, Martin Krickl, Besnik Fetahu

Bounding the Probability of Error for High Precision Recognition

We consider models for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low rates of recall. If some variables can be identified with near certainty, then they can be…

Computer Vision and Pattern Recognition · Computer Science 2009-07-03 Andrew Kae, Gary B. Huang, Erik Learned-Miller

Detection Masking for Improved OCR on Noisy Documents

Optical Character Recognition (OCR), the task of extracting textual information from scanned documents, is a vital and broadly used technology for digitizing and indexing physical documents. Existing technologies perform well for clean…

Computer Vision and Pattern Recognition · Computer Science 2022-05-18 Daniel Rotman, Ophir Azulai, Inbar Shapira, Yevgeny Burshtein, Udi Barzelay

Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR

Conventional optical character recognition (OCR) techniques segmented each character and then recognized it. This made them prone to errors in character segmentation and left them without the context needed to exploit language models. Advances in sequence to…

Computer Vision and Pattern Recognition · Computer Science 2025-09-01 Shashank Vempati, Nishit Anand, Gaurav Talebailkar, Arpan Garai, Chetan Arora

When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation

Industrial Retrieval-Augmented Generation (RAG) systems depend on optical character recognition (OCR) to transform visual documents into text. Existing OCR benchmarks rely on character-level metrics, which inadequately measure downstream…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Lin Sun, Wang Dexian, Jingang Huang, Linglin Zhang, Change Jia, Zhengwei Cheng, Xiangzheng Zhang

Quality of OCR for Degraded Text Images

Commercial OCR packages work best with high-quality scanned images. They often produce poor results when the image is degraded, either because the original itself was poor quality, or because of excessive photocopying. The ability to…

Digital Libraries · Computer Science 2007-05-23 Roger T. Hartley, Kathleen Crumpton

OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion

With the advent of digital optical scanners, a lot of paper-based books, textbooks, magazines, articles, and documents are being transformed into an electronic version that can be manipulated by a computer. For this purpose, OCR, short for…

Computation and Language · Computer Science 2012-04-03 Youssef Bassil, Mohammad Alwani

Text Detection Forgot About Document OCR

Detection and recognition of text from scans and other images, commonly denoted as Optical Character Recognition (OCR), is a widely used form of automated document processing with a number of methods available. Yet OCR systems still do not…

Computer Vision and Pattern Recognition · Computer Science 2023-01-24 Krzysztof Olejniczak, Milan Šulc

Text Change Detection in Multilingual Documents Using Image Comparison

Document comparison typically relies on optical character recognition (OCR) as its core technology. However, OCR requires the selection of appropriate language models for each document and the performance of multilingual or hybrid models…

Computer Vision and Pattern Recognition · Computer Science 2024-12-06 Doyoung Park, Naresh Reddy Yarram, Sunjin Kim, Minkyu Kim, Seongho Cho, Taehee Lee