Related papers: OCR Post-Processing Error Correction Algorithm usi…

OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set

Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely being published at that time; and hence, there was a…

Computation and Language · Computer Science 2012-04-03 Youssef Bassil, Mohammad Alwani

Statistical Learning for OCR Text Correction

The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications used in text analyzing pipeline. Recent models of OCR post-processing significantly improve the quality of OCR-generated text, but are…

Computer Vision and Pattern Recognition · Computer Science 2016-11-22 Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E. Milios

Post-OCR Document Correction with large Ensembles of Character Sequence-to-Sequence Models

In this paper, we propose a novel method based on character sequence-to-sequence models to correct documents already processed with Optical Character Recognition (OCR) systems. The main contribution of this paper is a set of strategies to…

Computation and Language · Computer Science 2022-01-26 Juan Ramirez-Orta, Eduardo Xamena, Ana Maguitman, Evangelos Milios, Axel J. Soto

A Cost Efficient Approach to Correct OCR Errors in Large Document Collections

Word error rate of an ocr is often higher than its character error rate. This is especially true when ocrs are designed by recognizing characters. High word accuracies are critical to tasks like the creation of content in digital libraries…

Computer Vision and Pattern Recognition · Computer Science 2019-05-29 Deepayan Das, Jerin Philip, Minesh Mathew, C. V. Jawahar

Neural OCR Post-Hoc Correction of Historical Corpora

Optical character recognition (OCR) is crucial for a deeper access to historical collections. OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new letters, word spellings), as the main source of…

Computation and Language · Computer Science 2021-02-02 Lijun Lyu, Maria Koutraki, Martin Krickl, Besnik Fetahu

A Simple and Practical Approach to Improve Misspellings in OCR Text

The focus of our paper is the identification and correction of non-word errors in OCR text. Such errors may be the result of incorrect insertion, deletion, or substitution of a character, or the transposition of two adjacent characters…

Computation and Language · Computer Science 2021-06-24 Junxia Lin, Johannes Ledolter

Post-Editing Error Correction Algorithm for Speech Recognition using Bing Spelling Suggestion

ASR short for Automatic Speech Recognition is the process of converting a spoken speech into text that can be manipulated by a computer. Although ASR has several applications, it is still erroneous and imprecise especially if used in a…

Computation and Language · Computer Science 2012-03-26 Youssef Bassil, Mohammad Alwani

A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing

Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, security systems, etc. The…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Aishik Rakshit, Samyak Mehta, Anirban Dasgupta

OCR accuracy improvement on document images through a novel pre-processing approach

Digital camera and mobile document image acquisition are new trends arising in the world of Optical Character Recognition and text detection. In some cases, such process integrates many distortions and produces poorly scanned text or…

Computer Vision and Pattern Recognition · Computer Science 2015-09-14 Abdeslam El Harraj, Naoufal Raissouni

Context-sensitive Spelling Correction Using Google Web 1T 5-Gram Information

In computing, spell checking is the process of detecting and sometimes providing spelling suggestions for incorrectly spelled words in a text. Basically, a spell checker is a computer program that uses a dictionary of words to perform spell…

Computation and Language · Computer Science 2012-04-27 Youssef Bassil, Mohammad Alwani

OCR Post Correction for Endangered Language Texts

There is little to no data available to build natural language processing models for most endangered languages. However, textual data in these languages often exists in formats that are not machine-readable, such as paper books and scanned…

Computation and Language · Computer Science 2020-11-12 Shruti Rijhwani, Antonios Anastasopoulos, Graham Neubig

OCR Error Correction Using Character Correction and Feature-Based Word Classification

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast…

Information Retrieval · Computer Science 2020-06-11 Ido Kissos, Nachum Dershowitz

Optimizing the Neural Network Training for OCR Error Correction of Historical Hebrew Texts

Over the past few decades, large archives of paper-based documents such as books and newspapers have been digitized using Optical Character Recognition. This technology is error-prone, especially for historical documents. To correct OCR…

Computation and Language · Computer Science 2023-08-01 Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet

A Novel Approach to Skew-Detection and Correction of English Alphabets for OCR

Optical Character Recognition has been a challenging field in the advent of digital computers. It is needed where information is to be readable both to humans and machines. The process of OCR is composed of a set of pre and post processing…

Computer Vision and Pattern Recognition · Computer Science 2018-01-04 Chinmay Chinara, Nishant Nath, Subhajeet Mishra, Sangram Keshari Sahoo, Farida Ashraf Ali

Text Detection Forgot About Document OCR

Detection and recognition of text from scans and other images, commonly denoted as Optical Character Recognition (OCR), is a widely used form of automated document processing with a number of methods available. Yet OCR systems still do not…

Computer Vision and Pattern Recognition · Computer Science 2023-01-24 Krzysztof Olejniczak, Milan Šulc

Unknown-box Approximation to Improve Optical Character Recognition Performance

Optical character recognition (OCR) is a widely used pattern recognition application in numerous domains. There are several feature-rich, general-purpose OCR solutions available for consumers, which can provide moderate to excellent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Ayantha Randika, Nilanjan Ray, Xiao Xiao, Allegra Latimer

Parallel Spell-Checking Algorithm Based on Yahoo! N-Grams Dataset

Spell-checking is the process of detecting and sometimes providing suggestions for incorrectly spelled words in a text. Basically, the larger the dictionary of a spell-checker is, the higher is the error detection rate; otherwise,…

Computation and Language · Computer Science 2012-04-03 Youssef Bassil

A Tool for Facilitating OCR Postediting in Historical Documents

Optical character recognition (OCR) for historical documents is a complex procedure subject to a unique set of material issues, including inconsistencies in typefaces and low quality scanning. Consequently, even the most sophisticated OCR…

Computation and Language · Computer Science 2020-04-27 Alberto Poncelas, Mohammad Aboomar, Jan Buts, James Hadley, Andy Way

Recognition of Text Image Using Multilayer Perceptron

The biggest challenge in the field of image processing is to recognize documents both in printed and handwritten format. Optical Character Recognition OCR is a type of document image analysis where scanned digital image that contains either…

Computer Vision and Pattern Recognition · Computer Science 2016-12-05 Singh Vijendra, Nisha Vasudeva, Hem Jyotsana Parashar

Handwritten Text Recognition Using Convolutional Neural Network

OCR (Optical Character Recognition) is a technology that offers comprehensive alphanumeric recognition of handwritten and printed characters at electronic speed by merely scanning the document. Recently, the understanding of visual data has…

Computer Vision and Pattern Recognition · Computer Science 2023-07-12 Atman Mishra, A. Sharath Ram, Kavyashree C