Related papers: Resampling methods for document clustering

200 papers

In this paper, we present a semi-supervised learning algorithm for classification of text documents. A method of labeling unlabeled text documents is presented. The presented method is based on the principle of divide and conquer strategy.…

Machine Learning · Computer Science 2017-06-27 Harsha S. Gowda , Mahamad Suhil , D. S. Guru , Lavanya Narayana Raju

Importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collection of documents like World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi , M. Shahid Shaikh , Amir Farooq

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches which rely heavily on a computed similarity measure between documents are often limited…

Information Retrieval · Computer Science 2025-04-09 Laurence Hirsch , Robin Hirsch , Bayode Ogunleye

Document clustering as an unsupervised approach extensively used to navigate, filter, summarize and manage large collection of document repositories like the World Wide Web (WWW). Recently, focuses in this domain shifted from traditional…

Information Retrieval · Computer Science 2012-01-11 Muhammad Rafi , M. Maujood , M. M. Fazal , S. M. Ali

Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering…

Information Retrieval · Computer Science 2015-03-12 G. Hannah Grace , Kalyani Desikan

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition. Particularly, clusters were leveraged to indicate information saliency as well…

Computation and Language · Computer Science 2022-05-23 Ori Ernst , Avi Caciularu , Ori Shapira , Ramakanth Pasunuru , Mohit Bansal , Jacob Goldberger , Ido Dagan

We present generative clustering (GC) for clustering a set of documents, $\mathrm{X}$, by using texts $\mathrm{Y}$ generated by large language models (LLMs) instead of by clustering the original documents $\mathrm{X}$. Because LLMs…

Machine Learning · Computer Science 2024-12-19 Xin Du , Kumiko Tanaka-Ishii

Translated texts are distinctively different from original ones, to the extent that supervised text classification methods can distinguish between them with high accuracy. These differences were proven useful for statistical machine…

Computation and Language · Computer Science 2016-09-13 Ella Rabinovich , Shuly Wintner

The abundance of text data being produced in the modern age makes it increasingly important to intuitively group, categorize, or classify text data by theme for efficient retrieval and search. Yet, the high dimensionality and imprecision of…

Computation and Language · Computer Science 2018-11-07 Robert Frank Martorano

Text Document Clustering is one of the fastest growing research areas because of availability of huge amount of information in an electronic form. There are several number of techniques launched for clustering documents in such a way that…

Information Retrieval · Computer Science 2014-01-13 R. Jensi , Dr. G. Wiselin Jiji

Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful but there is no…

Machine Learning · Computer Science 2021-04-27 Vincent Lemaire , Oumaima Alaoui Ismaili , Antoine Cornuéjols , Dominique Gay

In this paper, we propose an alternative to deep neural networks for semantic information retrieval for the case of long documents. This new approach exploiting clustering techniques to take into account the meaning of words in Information…

Information Retrieval · Computer Science 2025-07-29 Paul Mbathe Mekontchou , Armel Fotsoh , Bernabe Batchakui , Eddy Ella

Short text clustering is a challenging task due to the lack of signal contained in such short texts. In this work, we propose iterative classification as a method to boost the clustering quality (e.g., accuracy) of short texts. Given a…

Information Retrieval · Computer Science 2020-02-03 Md Rashadul Hasan Rakib , Norbert Zeh , Magdalena Jankowska , Evangelos Milios

Considering that words with different characteristic in the text have different importance for classification, grouping them together separately can strengthen the semantic expression of each part. Thus we propose a new text representation…

Computation and Language · Computer Science 2019-06-19 Xiaoye Tan , Rui Yan , Chongyang Tao , Mingrui Wu

Now a days, the text document is spontaneously increasing over the internet, e-mail and web pages and they are stored in the electronic database format. To arrange and browse the document it becomes difficult. To overcome such problem the…

Computation and Language · Computer Science 2013-03-05 Leena H. Patil , Mohammed Atique

A new fast algorithm for clustering and classification of large collections of text documents is introduced. The new algorithm employs the bipartite graph that realizes the word-document matrix of the collection. Namely, the modularity of…

Information Retrieval · Computer Science 2011-05-31 Grigory Pivovarov , Sergei Trunov

The deployment of language models brings challenges in generating reliable information, especially when these models are fine-tuned using human preferences. To extract encoded knowledge without (potentially) biased human labels,…

Artificial Intelligence · Computer Science 2024-10-07 Walter Laurito , Sharan Maiya , Grégoire Dhimoïla , Owen Yeung , Kaarel Hänni

Clustering is a fundamental tool that has garnered significant interest across a wide range of applications including text analysis. To improve clustering accuracy, many researchers have incorporated background knowledge, typically in the…

Machine Learning · Computer Science 2026-01-19 Chaoqi Jia , Weihong Wu , Longkun Guo , Zhigang Lu , Chao Chen , Kok-Leong Ong

Several methods have been explored for automating parts of Systematic Mapping (SM) and Systematic Review (SR) methodologies. Challenges typically evolve around the gaps in semantic understanding of text, as well as lack of domain and…

Computation and Language · Computer Science 2021-02-10 Xiajing Li , Marios Daoutis

Fast and high quality document clustering is an important task in organizing information, search engine results obtaining from user query, enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science 2010-03-11 Alok Ranjan , Harish Verma , Eatesh Kandpal , Joydip Dhar