
Related papers: OCR Post-Processing Error Correction Algorithm usi…

200 papers

Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely published at that time; hence, there was a…

Computation and Language · Computer Science 2012-04-03 Youssef Bassil, Mohammad Alwani

The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications in a text analysis pipeline. Recent models of OCR post-processing significantly improve the quality of OCR-generated text, but are…

Computer Vision and Pattern Recognition · Computer Science 2016-11-22 Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E. Milios

In this paper, we propose a novel method based on character sequence-to-sequence models to correct documents already processed with Optical Character Recognition (OCR) systems. The main contribution of this paper is a set of strategies to…

Computation and Language · Computer Science 2022-01-26 Juan Ramirez-Orta, Eduardo Xamena, Ana Maguitman, Evangelos Milios, Axel J. Soto

The word error rate of an OCR system is often higher than its character error rate. This is especially true when OCR systems are designed by recognizing characters. High word accuracies are critical to tasks like the creation of content in digital libraries…

Computer Vision and Pattern Recognition · Computer Science 2019-05-29 Deepayan Das, Jerin Philip, Minesh Mathew, C. V. Jawahar

Optical character recognition (OCR) is crucial for a deeper access to historical collections. OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new letters, word spellings), as the main source of…

Computation and Language · Computer Science 2021-02-02 Lijun Lyu, Maria Koutraki, Martin Krickl, Besnik Fetahu

The focus of our paper is the identification and correction of non-word errors in OCR text. Such errors may be the result of incorrect insertion, deletion, or substitution of a character, or the transposition of two adjacent characters…

Computation and Language · Computer Science 2021-06-24 Junxia Lin, Johannes Ledolter

ASR, short for Automatic Speech Recognition, is the process of converting spoken speech into text that can be manipulated by a computer. Although ASR has several applications, it is still erroneous and imprecise, especially if used in a…

Computation and Language · Computer Science 2012-03-26 Youssef Bassil, Mohammad Alwani

Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, security systems, etc. The…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Aishik Rakshit, Samyak Mehta, Anirban Dasgupta

Digital camera and mobile document image acquisition are new trends arising in the world of Optical Character Recognition and text detection. In some cases, such a process introduces many distortions and produces poorly scanned text or…

Computer Vision and Pattern Recognition · Computer Science 2015-09-14 Abdeslam El Harraj, Naoufal Raissouni

In computing, spell checking is the process of detecting and sometimes providing spelling suggestions for incorrectly spelled words in a text. Basically, a spell checker is a computer program that uses a dictionary of words to perform spell…

Computation and Language · Computer Science 2012-04-27 Youssef Bassil, Mohammad Alwani

There is little to no data available to build natural language processing models for most endangered languages. However, textual data in these languages often exists in formats that are not machine-readable, such as paper books and scanned…

Computation and Language · Computer Science 2020-11-12 Shruti Rijhwani, Antonios Anastasopoulos, Graham Neubig

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast…

Information Retrieval · Computer Science 2020-06-11 Ido Kissos, Nachum Dershowitz

Over the past few decades, large archives of paper-based documents such as books and newspapers have been digitized using Optical Character Recognition. This technology is error-prone, especially for historical documents. To correct OCR…

Computation and Language · Computer Science 2023-08-01 Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet

Optical Character Recognition has been a challenging field since the advent of digital computers. It is needed where information is to be readable both to humans and machines. The process of OCR is composed of a set of pre- and post-processing…

Computer Vision and Pattern Recognition · Computer Science 2018-01-04 Chinmay Chinara, Nishant Nath, Subhajeet Mishra, Sangram Keshari Sahoo, Farida Ashraf Ali

Detection and recognition of text from scans and other images, commonly denoted as Optical Character Recognition (OCR), is a widely used form of automated document processing with a number of methods available. Yet OCR systems still do not…

Computer Vision and Pattern Recognition · Computer Science 2023-01-24 Krzysztof Olejniczak, Milan Šulc

Optical character recognition (OCR) is a widely used pattern recognition application in numerous domains. There are several feature-rich, general-purpose OCR solutions available for consumers, which can provide moderate to excellent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Ayantha Randika, Nilanjan Ray, Xiao Xiao, Allegra Latimer

Spell-checking is the process of detecting and sometimes providing suggestions for incorrectly spelled words in a text. Basically, the larger the dictionary of a spell-checker, the higher the error detection rate; otherwise,…

Computation and Language · Computer Science 2012-04-03 Youssef Bassil

Optical character recognition (OCR) for historical documents is a complex procedure subject to a unique set of material issues, including inconsistencies in typefaces and low quality scanning. Consequently, even the most sophisticated OCR…

Computation and Language · Computer Science 2020-04-27 Alberto Poncelas, Mohammad Aboomar, Jan Buts, James Hadley, Andy Way

The biggest challenge in the field of image processing is to recognize documents both in printed and handwritten format. Optical Character Recognition (OCR) is a type of document image analysis where a scanned digital image that contains either…

Computer Vision and Pattern Recognition · Computer Science 2016-12-05 Singh Vijendra, Nisha Vasudeva, Hem Jyotsana Parashar

OCR (Optical Character Recognition) is a technology that offers comprehensive alphanumeric recognition of handwritten and printed characters at electronic speed by merely scanning the document. Recently, the understanding of visual data has…

Computer Vision and Pattern Recognition · Computer Science 2023-07-12 Atman Mishra, A. Sharath Ram, Kavyashree C