Related papers: Page image classification for content-specific dat…

Categorizing ancient documents

The analysis of historical documents is still a topical issue given the importance of information that can be extracted and also the importance given by the institutions to preserve their heritage. The main idea in order to characterize the…

Computer Vision and Pattern Recognition · Computer Science 2013-08-30 Nizar Zaghden, Remy Mullot, Mohamed Adel Alimi

Webpage Segmentation for Extracting Images and Their Surrounding Contextual Information

Web images come in hand with valuable contextual information. Although this information has long been mined for various uses such as image annotation, clustering of images, inference of image semantic content, etc., insufficient attention…

Multimedia · Computer Science 2020-05-21 F. Fauzi, H. J. Long, M. Belkhatir

Handwriting Classification for the Analysis of Art-Historical Documents

Digitized archives contain and preserve the knowledge of generations of scholars in millions of documents. The size of these archives calls for automatic analysis since a manual analysis by specialists is often too expensive. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2020-11-05 Christian Bartz, Hendrik Rätz, Christoph Meinel

A Survey of Historical Document Image Datasets

This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for historical document…

Computer Vision and Pattern Recognition · Computer Science 2022-11-01 Konstantina Nikolaidou, Mathias Seuret, Hamam Mokayed, Marcus Liwicki

Historical Document Processing: A Survey of Techniques, Tools, and Trends

Historical Document Processing is the process of digitizing written material from the past for future use by historians and other scholars. It incorporates algorithms and software tools from various subfields of computer science, including…

Computer Vision and Pattern Recognition · Computer Science 2020-09-14 James P. Philips, Nasseh Tabrizi

Classification of Documents Extracted from Images with Optical Character Recognition Methods

Over the past decade, machine learning methods have given us driverless cars, voice recognition, effective web search, and a much better understanding of the human genome. Machine learning is so common today that it is used dozens of times…

Computer Vision and Pattern Recognition · Computer Science 2021-06-22 Omer Aydin

Web Content Classification: A Survey

As the information contained within the web is increasing day by day, organizing this information could be a necessary requirement. The data mining process is to extract information from a data set and transform it into an understandable…

Information Retrieval · Computer Science 2014-05-22 Prabhjot Kaur

Document classification methods

Information on different fields which are collected by users requires appropriate management and organization to be structured in a standard way and retrieved fast and more easily. Document classification is a conventional method to…

Information Retrieval · Computer Science 2019-09-18 Madjid Khalilian, Shiva Hassanzadeh

Digitization of Document and Information Extraction using OCR

Retrieving accurate details from documents is a crucial task, especially when handling a combination of scanned images and native digital formats. This document presents a combined framework for text extraction that merges Optical Character…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Rasha Sinha, Rekha B S

An Analytical Study of different Document Image Binarization Methods

Document image has been the area of research for a couple of decades because of its potential application in the area of text recognition, line recognition or any other shape recognition from the image. For most of these purposes…

Computer Vision and Pattern Recognition · Computer Science 2015-02-02 Mahua Nandy, Satadal Saha

Convolutional Neural Networks for Page Segmentation of Historical Document Images

This paper presents a Convolutional Neural Network (CNN) based page segmentation method for handwritten historical document images. We consider page segmentation as a pixel labeling problem, i.e., each pixel is classified as one of the…

Computer Vision and Pattern Recognition · Computer Science 2017-04-10 Kai Chen, Mathias Seuret

Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features

In recent years, (retro-)digitizing paper-based files became a major undertaking for private and public archives as well as an important task in electronic mailroom applications. As a first step, the workflow involves scanning and Optical…

Computation and Language · Computer Science 2019-03-26 Gregor Wiedemann, Gerhard Heyer

Image Classification and Optimized Image Reproduction

By taking into account the properties and limitations of the human visual system, images can be more efficiently compressed, colors more accurately reproduced, prints better rendered. To show all these advantages in this paper new adapted…

Computer Vision and Pattern Recognition · Computer Science 2015-03-13 Jaswinder Singh Dilawari, Ravinder Khanna

Text Classification: A Perspective of Deep Learning Methods

In recent years, with the rapid development of information on the Internet, the number of complex texts and documents has increased exponentially, which requires a deeper understanding of deep learning methods in order to accurately…

Computation and Language · Computer Science 2023-09-26 Zhongwei Wan

Automatic Recognition of Learning Resource Category in a Digital Library

Digital libraries often face the challenge of processing a large volume of diverse document types. The manual collection and tagging of metadata can be a time-consuming and error-prone task. To address this, we aim to develop an automatic…

Digital Libraries · Computer Science 2024-01-24 Soumya Banerjee, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, Partha Pratim Das

Web Page Categorization Using Artificial Neural Networks

Web page categorization is one of the challenging tasks in the world of ever increasing web technologies. There are many ways of categorization of web pages based on different approach and features. This paper proposes a new dimension in…

Neural and Evolutionary Computing · Computer Science 2010-09-28 S. M. Kamruzzaman

Image understanding and the web

The contextual information of Web images is investigated to address the issue of characterizing their content with semantic descriptors and therefore bridge the semantic gap, i.e. the gap between their automated low-level representation in…

Information Retrieval · Computer Science 2020-05-06 Fariza Fauzi, Mohammed Belkhatir

Systematic review of image segmentation using complex networks

This review presents various image segmentation methods using complex networks. Image segmentation is one of the important steps in image analysis as it helps analyze and understand complex images. At first, it has been tried to classify…

Computer Vision and Pattern Recognition · Computer Science 2024-01-08 Amin Rezaei, Fatemeh Asadi

A Survey on Figure Classification Techniques in Scientific Documents

Figures visually represent an essential piece of information and provide an effective means to communicate scientific facts. Recently there have been many efforts toward extracting data directly from figures, specifically from tables,…

Information Retrieval · Computer Science 2023-07-13 Anurag Dhote, Mohammed Javed, David S Doermann

Recognition of Text Image Using Multilayer Perceptron

The biggest challenge in the field of image processing is to recognize documents both in printed and handwritten format. Optical Character Recognition OCR is a type of document image analysis where scanned digital image that contains either…

Computer Vision and Pattern Recognition · Computer Science 2016-12-05 Singh Vijendra, Nisha Vasudeva, Hem Jyotsana Parashar