Related papers: Document Image Coding and Clustering for Script Di…

An Approach to the Analysis of the South Slavic Medieval Labels Using Image Texture

The paper presents a new script classification method for discriminating South Slavic medieval labels. It consists of a textural analysis of the script types. In the first step, each letter is coded by the equivalent script…

Computer Vision and Pattern Recognition · Computer Science 2015-09-08 Darko Brodic, Alessia Amelio, Zoran N. Milivojevic

Genetic Programming for Document Segmentation and Region Classification Using Discipulus

Document segmentation is a method of dividing a document into distinct regions. A document is an assortment of information and a standard mode of conveying information to others. Extracting data from documents involves a ton of human…

Computer Vision and Pattern Recognition · Computer Science 2013-03-05 N. Priyadharshini, M. S. Vijaya

Document clustering using graph based document representation with constraints

Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and…

Information Retrieval · Computer Science 2014-12-08 Muhammad Rafi, Farnaz Amin, Mohammad Shahid Shaikh

Document classification methods

Information that users collect across different fields requires appropriate management and organization so that it can be structured in a standard way and retrieved quickly and easily. Document classification is a conventional method to…

Information Retrieval · Computer Science 2019-09-18 Madjid Khalilian, Shiva Hassanzadeh

Document Clustering based on Topic Maps

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

Experimental Estimation of Number of Clusters Based on Cluster Quality

Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering…

Information Retrieval · Computer Science 2015-03-12 G. Hannah Grace, Kalyani Desikan

Clustering Document Parts: Detecting and Characterizing Influence Campaigns from Documents

We propose a novel clustering pipeline to detect and characterize influence campaigns from documents. This approach clusters parts of documents, detects clusters that likely reflect an influence campaign, and then identifies documents linked…

Computation and Language · Computer Science 2024-04-30 Zhengxiang Wang, Owen Rambow

Analysis of the South Slavic Scripts by Run-Length Features of the Image Texture

The paper proposes an algorithm for script recognition based on texture characteristics. The image texture is obtained by coding each letter with the equivalent script type (number code) according to its position in the text line.…

Computer Vision and Pattern Recognition · Computer Science 2015-09-01 Darko Brodic, Zoran N. Milivojevic, Alessia Amelio

Categorizing ancient documents

The analysis of historical documents is still a topical issue given the importance of information that can be extracted and also the importance given by the institutions to preserve their heritage. The main idea in order to characterize the…

Computer Vision and Pattern Recognition · Computer Science 2013-08-30 Nizar Zaghden, Remy Mullot, Mohamed Adel Alimi

Source Printer Classification using Printer Specific Local Texture Descriptor

Knowledge of the source printer can help with printed text document authentication and copyright ownership, and can provide important clues about the author of a fraudulent document along with his/her potential means and motives. Development of…

Multimedia · Computer Science 2019-06-18 Sharad Joshi, Nitin Khanna

Combining Morphological and Histogram based Text Line Segmentation in the OCR Context

Text line segmentation is one of the pre-stages of modern optical character recognition systems. The algorithmic approach proposed by this paper has been designed for this exact purpose. Its main characteristic is the combination of two…

Computer Vision and Pattern Recognition · Computer Science 2023-06-22 Pit Schneider

Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language

In this paper, we show how selecting and combining encodings of natural and mathematical language affect classification and clustering of documents with mathematical content. We demonstrate this by using sets of documents, sections, and…

Digital Libraries · Computer Science 2020-05-25 Philipp Scharpf, Moritz Schubotz, Abdou Youssef, Felix Hamborg, Norman Meuschke, Bela Gipp

A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction

Typography and layout lead to the hierarchical organisation of text into words, text lines, and paragraphs. This inherent structure is a key property of text in any script and language, which has nonetheless been minimally leveraged by existing…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Lluis Gomez, Dimosthenis Karatzas

Content-based Text Categorization using Wikitology

A major computational burden in document clustering is the calculation of a similarity measure between a pair of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents,…

Information Retrieval · Computer Science 2012-08-20 Muhammad Rafi, Sundus Hassan, Mohammad Shahid Shaikh

Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering

The forensic attribution of the handwriting in a digitized document to multiple scribes is a challenging problem of high dimensionality. Unique handwriting styles may be dissimilar in a blend of several factors including character size,…

Computer Vision and Pattern Recognition · Computer Science 2023-06-30 Sriparna Majumdar, Aaron Brick

Discrimination of English to other Indian languages (Kannada and Hindi) for OCR system

India is a multilingual, multi-script country. Every state of India uses two languages: the local state language and English. For example, in Andhra Pradesh, a state in India, the document may contain text words in…

Computer Vision and Pattern Recognition · Computer Science 2012-05-11 Ankit Kumar, Tushar Patnaik, Vivek Kr Verma

Text Classification and Distributional features techniques in Datamining and Warehousing

Text categorization is traditionally done using term frequency and inverse document frequency. This type of method is not very good because some words which are not so important may appear in the document. The term frequency of…

Information Retrieval · Computer Science 2016-11-25 Srikanth Bethu, G Charless Babu, J Vinoda, E Priyadarshini, M Raghavendra rao

A Survey on optimization approaches to text document clustering

Text document clustering is one of the fastest growing research areas because of the availability of a huge amount of information in electronic form. A number of techniques have been introduced for clustering documents in such a way that…

Information Retrieval · Computer Science 2014-01-13 R. Jensi, Dr. G. Wiselin Jiji

Document clustering with evolved multiword search queries

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches which rely heavily on a computed similarity measure between documents are often limited…

Information Retrieval · Computer Science 2025-04-09 Laurence Hirsch, Robin Hirsch, Bayode Ogunleye

Interpretable Text-Guided Image Clustering via Iterative Search

Traditional clustering methods aim to group unlabeled data points based on their similarity to each other. However, clustering, in the absence of additional information, is an ill-posed problem as there may be many different, yet equally…

Computer Vision and Pattern Recognition · Computer Science 2025-09-10 Bingchen Zhao, Oisin Mac Aodha