Related papers: Page image classification for content-specific dat…
The analysis of historical documents is still a topical issue given the importance of information that can be extracted and also the importance given by the institutions to preserve their heritage. The main idea in order to characterize the…
Web images come in hand with valuable contextual information. Although this information has long been mined for various uses such as image annotation, clustering of images, inference of image semantic content, etc., insufficient attention…
Digitized archives contain and preserve the knowledge of generations of scholars in millions of documents. The size of these archives calls for automatic analysis since a manual analysis by specialists is often too expensive. In this paper,…
This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for historical document…
Historical Document Processing is the process of digitizing written material from the past for future use by historians and other scholars. It incorporates algorithms and software tools from various subfields of computer science, including…
Over the past decade, machine learning methods have given us driverless cars, voice recognition, effective web search, and a much better understanding of the human genome. Machine learning is so common today that it is used dozens of times…
As the information contained within the web is increasing day by day, organizing this information could be a necessary requirement.The data mining process is to extract information from a data set and transform it into an understandable…
Information on different fields which are collected by users requires appropriate management and organization to be structured in a standard way and retrieved fast and more easily. Document classification is a conventional method to…
Retrieving accurate details from documents is a crucial task, especially when handling a combination of scanned images and native digital formats. This document presents a combined framework for text extraction that merges Optical Character…
Document image has been the area of research for a couple of decades because of its potential application in the area of text recognition, line recognition or any other shape recognition from the image. For most of these purposes…
This paper presents a Convolutional Neural Network (CNN) based page segmentation method for handwritten historical document images. We consider page segmentation as a pixel labeling problem, i.e., each pixel is classified as one of the…
In recent years, (retro-)digitizing paper-based files became a major undertaking for private and public archives as well as an important task in electronic mailroom applications. As a first step, the workflow involves scanning and Optical…
By taking into account the properties and limitations of the human visual system, images can be more efficiently compressed, colors more accurately reproduced, prints better rendered. To show all these advantages in this paper new adapted…
In recent years, with the rapid development of information on the Internet, the number of complex texts and documents has increased exponentially, which requires a deeper understanding of deep learning methods in order to accurately…
Digital libraries often face the challenge of processing a large volume of diverse document types. The manual collection and tagging of metadata can be a time-consuming and error-prone task. To address this, we aim to develop an automatic…
Web page categorization is one of the challenging tasks in the world of ever increasing web technologies. There are many ways of categorization of web pages based on different approach and features. This paper proposes a new dimension in…
The contextual information of Web images is investigated to address the issue of characterizing their content with semantic descriptors and therefore bridge the semantic gap, i.e. the gap between their automated low-level representation in…
This review presents various image segmentation methods using complex networks. Image segmentation is one of the important steps in image analysis as it helps analyze and understand complex images. At first, it has been tried to classify…
Figures visually represent an essential piece of information and provide an effective means to communicate scientific facts. Recently there have been many efforts toward extracting data directly from figures, specifically from tables,…
The biggest challenge in the field of image processing is to recognize documents both in printed and handwritten format. Optical Character Recognition OCR is a type of document image analysis where scanned digital image that contains either…