Related papers: Document Clustering based on Topic Maps

200 papers

Document clustering is an unsupervised approach extensively used to navigate, filter, summarize and manage large collections of documents, such as the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional…

Information Retrieval · Computer Science · 2012-01-11 · Muhammad Rafi, M. Maujood, M. M. Fazal, S. M. Ali

A major computational burden in document clustering is the calculation of the similarity measure between a pair of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents,…

Information Retrieval · Computer Science · 2013-03-19 · Muhammad Rafi, Mohammad Shahid Shaikh

Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and…

Information Retrieval · Computer Science · 2014-12-08 · Muhammad Rafi, Farnaz Amin, Mohammad Shahid Shaikh

A major computational burden in document clustering is the calculation of the similarity measure between a pair of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents,…

Information Retrieval · Computer Science · 2012-08-20 · Muhammad Rafi, Sundus Hassan, Mohammad Shahid Shaikh

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…

Information Retrieval · Computer Science · 2018-07-23 · Tru H. Cao, Vuong M. Ngo, Dung T. Hong, Tho T. Quan

Text document clustering is one of the fastest growing research areas because of the availability of a huge amount of information in electronic form. A number of techniques have been proposed for clustering documents in such a way that…

Information Retrieval · Computer Science · 2014-01-13 · R. Jensi, Dr. G. Wiselin Jiji

There are many scenarios where we may want to find pairs of textually similar documents in a large corpus (e.g. a researcher doing literature review, or an R&D project manager analyzing project proposals). To programmatically discover those…

Computation and Language · Computer Science · 2020-12-16 · Carlos Badenes-Olmedo, Jose-Luis Redondo García, Oscar Corcho

Document clustering and topic modeling are two closely related tasks which can mutually benefit each other. Topic modeling can project documents into a topic space which facilitates effective document clustering. Cluster labels discovered…

Machine Learning · Computer Science · 2013-09-27 · Pengtao Xie, Eric P. Xing

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment.…

Information Retrieval · Computer Science · 2014-01-22 · Sajib Dasgupta, Vincent Ng

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches which rely heavily on a computed similarity measure between documents are often limited…

Information Retrieval · Computer Science · 2025-04-09 · Laurence Hirsch, Robin Hirsch, Bayode Ogunleye

Document similarity is the problem of estimating the degree to which a given pair of documents has similar semantic content. An accurate document similarity measure can improve several enterprise-relevant tasks such as document clustering,…

Computation and Language · Computer Science · 2017-11-15 · Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, Sourish Dasgupta, Jens Lehmann

Fast, high-quality document clustering is an important task in organizing information and search engine results obtained from user queries, and in enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science · 2010-03-11 · Alok Ranjan, Harish Verma, Eatesh Kandpal, Joydip Dhar

We propose a novel clustering pipeline to detect and characterize influence campaigns from documents. This approach clusters parts of documents, detects clusters that likely reflect an influence campaign, and then identifies documents linked…

Computation and Language · Computer Science · 2024-04-30 · Zhengxiang Wang, Owen Rambow

The multi-document summarization task requires the designed summarizer to generate a short text that covers the important information of original documents and satisfies content diversity. This paper proposes a multi-document summarization…

Computation and Language · Computer Science · 2023-03-07 · Bing Ma

In this paper, we propose an alternative to deep neural networks for semantic information retrieval for the case of long documents. This new approach exploits clustering techniques to take into account the meaning of words in Information…

Information Retrieval · Computer Science · 2025-07-29 · Paul Mbathe Mekontchou, Armel Fotsoh, Bernabe Batchakui, Eddy Ella

With the huge upsurge of information in day-to-day life, it has become difficult to assemble relevant information in the nick of time. People are always short of time and need everything quickly. Hence clustering was introduced to…

Information Retrieval · Computer Science · 2015-03-02 · Rakesh Chandra Balabantaray, Chandrali Sarma, Monica Jha

In this paper we propose a graph-community detection approach to identify cross-document relationships at the topic segment level. Given a set of related documents, we automatically find these relationships by clustering segments with…

Computation and Language · Computer Science · 2016-06-14 · Pedro Mota, Maxine Eskenazi, Luisa Coheur

Nowadays, text documents are proliferating over the internet, e-mail and web pages, and they are stored in electronic database format. It becomes difficult to arrange and browse these documents. To overcome this problem the…

Computation and Language · Computer Science · 2013-03-05 · Leena H. Patil, Mohammed Atique

Document clustering is a text mining technique used to provide better document search and browsing in digital libraries or online corpora. A lot of research has been done on biomedical document clustering that is based on using existing…

Computation and Language · Computer Science · 2018-10-24 · Setu Shah, Xiao Luo

We present generative clustering (GC) for clustering a set of documents, X, by using texts Y generated by large language models (LLMs) instead of by clustering the original documents X. Because LLMs…

Machine Learning · Computer Science · 2024-12-19 · Xin Du, Kumiko Tanaka-Ishii