Related papers: Identifier Namespaces in Mathematical Notation

Semantic Document Clustering on Named Entity Features

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…

Information Retrieval · Computer Science 2018-07-23 Tru H. Cao , Vuong M. Ngo , Dung T. Hong , Tho T. Quan

Incremental Entity Resolution from Linked Documents

In many government applications we often find that information about entities, such as persons, is available in disparate data sources such as passports, driving licences, bank accounts, and income tax records. Similar scenarios are…

Databases · Computer Science 2014-02-19 Pankaj Malhotra , Puneet Agarwal , Gautam Shroff

Document Clustering based on Topic Maps

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi , M. Shahid Shaikh , Amir Farooq

On the Effect of Semantically Enriched Context Models on Software Modularization

Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies…

Software Engineering · Computer Science 2017-08-08 Amir Saeidi , Jurriaan Hage , Ravi Khadka , Slinger Jansen

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He , Xiaofei Xu , Shengchun Deng

Document clustering using graph based document representation with constraints

Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and…

Information Retrieval · Computer Science 2014-12-08 Muhammad Rafi , Farnaz Amin , Mohammad Shahid Shaikh

Document clustering with evolved multiword search queries

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches, which rely heavily on a computed similarity measure between documents, are often limited…

Information Retrieval · Computer Science 2025-04-09 Laurence Hirsch , Robin Hirsch , Bayode Ogunleye

Name Disambiguation in Anonymized Graphs using Network Embedding

In the real world, our DNA is unique but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes deteriorate the performance of document…

Social and Information Networks · Computer Science 2017-09-12 Baichuan Zhang , Mohammad Al Hasan

Mathematical Language Processing Project

In natural language, words and phrases themselves imply the semantics. In contrast, the meaning of identifiers in mathematical formulae is undefined. Thus scientists must study the context to decode the meaning. The Mathematical Language…

Digital Libraries · Computer Science 2019-07-02 Robert Pagel , Moritz Schubotz

Automated Single-Label Patent Classification using Ensemble Classifiers

Many thousands of patent applications arrive at patent offices around the world every day. One important subtask when a patent application is submitted is to assign one or more classification codes from the complex and hierarchical patent…

Information Retrieval · Computer Science 2022-03-08 Eleni Kamateri , Vasileios Stamatis , Konstantinos Diamantaras , Michail Salampasis

Using Genetic Algorithms for Texts Classification Problems

The avalanche of information produced by mankind has led to the concept of automating knowledge extraction: Data Mining ([1]). This direction is connected with a wide spectrum of problems, from recognition of the fuzzy set to…

Machine Learning · Computer Science 2009-06-05 A. A. Shumeyko , S. L. Sotnik

A framework for benchmarking clustering algorithms

The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate…

Machine Learning · Computer Science 2023-10-27 Marek Gagolewski

Document Image Coding and Clustering for Script Discrimination

The paper introduces a new method for discrimination of documents given in different scripts. The document is mapped into a uniformly coded text of numerical values. It is derived from the position of the letters in the text line, based on…

Computer Vision and Pattern Recognition · Computer Science 2016-09-22 Darko Brodic , Alessia Amelio , Zoran N. Milivojevic , Milena Jevtic

Cluster Explanation via Polyhedral Descriptions

Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features. Traditional clustering algorithms provide limited insight into the groups they find as their main focus is…

Machine Learning · Computer Science 2022-10-18 Connor Lawless , Oktay Gunluk

Automatic Parameter Selection for Non-Redundant Clustering

High-dimensional datasets often contain multiple meaningful clusterings in different subspaces. For example, objects can be clustered either by color, weight, or size, revealing different interpretations of the given dataset. A variety of…

Machine Learning · Computer Science 2025-04-08 Collin Leiber , Dominik Mautz , Claudia Plant , Christian Böhm

Automated Document Indexing via Intelligent Hierarchical Clustering: A Novel Approach

With the rising quantity of textual data available in electronic format, organizing it has become a highly challenging task. In the present paper, we explore a document organization framework that exploits an intelligent hierarchical…

Information Retrieval · Computer Science 2015-04-02 Rajendra Kumar Roul , Shubham Rohan Asthana , Sanjay Kumar Sahay

Publication venue recommendation using profiles based on clustering

In this paper we study the venue recommendation problem in order to help researchers identify a journal or conference to which they can submit a given paper. A common approach to tackle this problem is to build profiles defining the scope of each…

Information Retrieval · Computer Science 2024-01-22 Luis M. de Campos , Juan M. Fernández-Luna , Juan F. Huete

Data Structure Lower Bounds for Document Indexing Problems

We study data structure problems related to document indexing and pattern matching queries and our main contribution is to show that the pointer machine model of computation can be extremely useful in proving high and unconditional lower…

Data Structures and Algorithms · Computer Science 2016-04-22 Peyman Afshani , Jesper Sindahl Nielsen

Scientific Dataset Discovery via Topic-level Recommendation

Data intensive research requires the support of appropriate datasets. However, it is often time-consuming to discover usable datasets matching a specific research topic. We formulate the dataset discovery problem on an attributed…

Information Retrieval · Computer Science 2021-06-08 Basmah Altaf , Shichao Pei , Xiangliang Zhang

Clustering Document Parts: Detecting and Characterizing Influence Campaigns from Documents

We propose a novel clustering pipeline to detect and characterize influence campaigns from documents. This approach clusters parts of documents, detects clusters that likely reflect an influence campaign, and then identifies documents linked…

Computation and Language · Computer Science 2024-04-30 Zhengxiang Wang , Owen Rambow