Related papers: Lexically Aware Semi-Supervised Learning for OCR P…

200 papers

We propose a post-OCR text correction approach for digitising texts in Romanised Sanskrit. Owing to the lack of resources our approach uses OCR models trained for other languages written in Roman. Currently, there exists no dataset…

Computation and Language · Computer Science 2018-09-10 Amrith Krishna, Bodhisattwa Prasad Majumder, Rajesh Shreedhar Bhat, Pawan Goyal

There is little to no data available to build natural language processing models for most endangered languages. However, textual data in these languages often exists in formats that are not machine-readable, such as paper books and scanned…

Computation and Language · Computer Science 2020-11-12 Shruti Rijhwani, Antonios Anastasopoulos, Graham Neubig

Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, security systems, etc. The…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Aishik Rakshit, Samyak Mehta, Anirban Dasgupta

Historical corpora are known to contain errors introduced by OCR (optical character recognition) methods used in the digitization process, often said to be degrading the performance of NLP systems. Correcting these errors manually is a…

Computation and Language · Computer Science 2020-11-20 Quan Duong, Mika Hämäläinen, Simon Hengchen

A common approach for improving OCR quality is a post-processing step based on models correcting misdetected characters and tokens. These models are typically trained on aligned pairs of OCR read text and their manually corrected…

Computation and Language · Computer Science 2019-06-27 Kai Hakala, Aleksi Vesanto, Niko Miekka, Tapio Salakoski, Filip Ginter

Optical character recognition (OCR) is a widely used pattern recognition application in numerous domains. There are several feature-rich, general-purpose OCR solutions available for consumers, which can provide moderate to excellent…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Ayantha Randika, Nilanjan Ray, Xiao Xiao, Allegra Latimer

Digital camera and mobile document image acquisition are new trends arising in the world of Optical Character Recognition and text detection. In some cases, such process integrates many distortions and produces poorly scanned text or…

Computer Vision and Pattern Recognition · Computer Science 2015-09-14 Abdeslam El Harraj, Naoufal Raissouni

Optical character recognition (OCR) is crucial for a deeper access to historical collections. OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new letters, word spellings), as the main source of…

Computation and Language · Computer Science 2021-02-02 Lijun Lyu, Maria Koutraki, Martin Krickl, Besnik Fetahu

Over the past few decades, large archives of paper-based documents such as books and newspapers have been digitized using Optical Character Recognition. This technology is error-prone, especially for historical documents. To correct OCR…

Computation and Language · Computer Science 2023-08-01 Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet

Optical character recognition (OCR) and multilingual text understanding remain major failure modes of multimodal large language models (MLLMs), particularly in real-world images containing cluttered layouts, small fonts, blur, occlusion,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Qinwu Xu, Yifan Jiang, Haoyu Ren

Contrary to popular belief, Optical Character Recognition (OCR) remains a challenging problem when text occurs in unconstrained environments, like natural scenes, due to geometrical distortions, complex backgrounds, and diverse fonts. In…

Computer Vision and Pattern Recognition · Computer Science 2019-06-06 Marcin Namysl, Iuliu Konya

While OCR has been used in various applications, its output is not always accurate, leading to misfit words. This research work focuses on improving the optical character recognition (OCR) with ML techniques with integration of OCR with…

Computer Vision and Pattern Recognition · Computer Science 2023-04-18 Abhishek Bamotra, Phani Krishna Uppala

Supervised Dictionary Learning has gained much interest in the recent decade and has shown significant performance improvements in image classification. However, in general, supervised learning needs a large number of labelled samples per…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Khanh-Hung Tran, Fred-Maurice Ngole-Mboula, Jean-Luc Starck, Vincent Prost

This paper explores the application of synthetic data in the post-OCR domain on multiple fronts by conducting experiments to assess the impact of data volume, augmentation, and synthetic data generation methods on model performance.…

Computation and Language · Computer Science 2024-08-14 Shuhao Guan, Derek Greene

With the advent of digital optical scanners, a lot of paper-based books, textbooks, magazines, articles, and documents are being transformed into an electronic version that can be manipulated by a computer. For this purpose, OCR, short for…

Computation and Language · Computer Science 2012-04-03 Youssef Bassil, Mohammad Alwani

Many real-world applications involve the use of Optical Character Recognition (OCR) engines to transform handwritten images into transcripts on which downstream Natural Language Processing (NLP) models are applied. In this process, OCR…

Computation and Language · Computer Science 2021-07-16 Guowei Xu, Wenbiao Ding, Weiping Fu, Zhongqin Wu, Zitao Liu

The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity is increased, so is the need for large amounts of labeled data to…

Computer Vision and Pattern Recognition · Computer Science 2017-08-22 Gustav Larsson

Recent advances in unsupervised representation learning have demonstrated the impact of pretraining on large amounts of read speech. We adapt these techniques for domain adaptation in low-resource -- both in terms of data and compute --…

Computation and Language · Computer Science 2022-02-14 Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover

An ongoing challenge in current natural language processing is how its major advancements tend to disproportionately favor resource-rich languages, leaving a significant number of under-resourced languages behind. Due to the lack of…

Computation and Language · Computer Science 2023-02-13 Ruoyu Xie, Antonios Anastasopoulos

Linked Data is used in various fields as a new way of structuring and connecting data. Cultural heritage institutions have been using linked data to improve archival descriptions and facilitate the discovery of information. Most archival…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Mariana Dias, Carla Teixeira Lopes