Related papers: Clustering Introductory Computer Science Exercises…

Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms

Topic modeling is a powerful technique to discover hidden topics and patterns within a collection of documents without prior knowledge. Traditional topic modeling and clustering-based techniques encounter challenges in capturing contextual…

Computation and Language · Computer Science 2024-10-04 Melkamu Abay Mersha, Mesay Gemeda Yigezu, Jugal Kalita

Mimicking Human Process: Text Representation via Latent Semantic Clustering for Classification

Considering that words with different characteristics in the text have different importance for classification, grouping them separately can strengthen the semantic expression of each part. Thus we propose a new text representation…

Computation and Language · Computer Science 2019-06-19 Xiaoye Tan, Rui Yan, Chongyang Tao, Mingrui Wu

Analysis of Computational Science Papers from ICCS 2001-2016 using Topic Modeling and Graph Theory

This paper presents results of topic modeling and network models of topics using the International Conference on Computational Science corpus, which contains domain-specific (computational science) papers over sixteen years (a total of 5695…

Digital Libraries · Computer Science 2017-05-08 Tesfamariam M. Abuhay, Sergey V. Kovalchuk, Klavdiya O. Bochenina, George Kampis, Valeria V. Krzhizhanovskaya, Michael H. Lees

Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!

Topic models are a useful analysis tool to uncover the underlying themes within document collections. The dominant approach is to use probabilistic topic models that posit a generative story, but in this paper we propose an alternative way…

Computation and Language · Computer Science 2020-10-08 Suzanna Sia, Ayush Dalmia, Sabrina J. Mielke

Topic Modeling in the Voynich Manuscript

This article presents the results of investigations using topic modeling of the Voynich Manuscript (Beinecke MS408). Topic modeling is a set of computational methods which are used to identify clusters of subjects within text. We use latent…

Computation and Language · Computer Science 2021-07-08 Rachel Sterneck, Annie Polish, Claire Bowern

Document Clustering based on Topic Maps

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together with the aim of constructing well-established clusters whose elements are classified according to their similarity. The…

Machine Learning · Statistics 2023-10-20 Dimitrios Saligkaras, Vasileios E. Papageorgiou

Neural Text Classification by Jointly Learning to Cluster and Align

Distributional text clustering delivers semantically informative representations and captures the relevance between each word and semantic clustering centroids. We extend the neural text clustering approach to text classification tasks by…

Computation and Language · Computer Science 2020-11-25 Yekun Chai, Haidong Zhang, Shuo Jin

Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling

We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. Topic Grouper creates a disjunctive partitioning of the training vocabulary in a stepwise manner such that resulting partitions represent…

Information Retrieval · Computer Science 2019-04-16 Daniel Pfeifer, Jochen L. Leidner

Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems enabling the sense-making of highly dynamic and diverse streams of…

Information Retrieval · Computer Science 2019-10-07 Chris Gropp, Alexander Herzog, Ilya Safro, Paul W. Wilson, Amy W. Apon

Topics in Contextualised Attention Embeddings

Contextualised word vectors obtained via pre-trained language models encode a variety of knowledge that has already been exploited in applications. Complementary to these language models are probabilistic topic models that learn thematic…

Computation and Language · Computer Science 2023-01-12 Mozhgan Talebpour, Alba Garcia Seco de Herrera, Shoaib Jameel

Semantic Document Clustering on Named Entity Features

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…

Information Retrieval · Computer Science 2018-07-23 Tru H. Cao, Vuong M. Ngo, Dung T. Hong, Tho T. Quan

Recommendation System based on Semantic Scholar Mining and Topic modeling: A behavioral analysis of researchers from six conferences

Recommendation systems play an important role in helping online users in the internet society. Recommendation systems in computer science are of very practical use these days in various aspects of Internet portals, such as social…

Information Retrieval · Computer Science 2018-12-21 Hamed Jelodar, Yongli Wang, Mahdi Rabbani, Ru-xin Zhao, Seyedvalyallah Ayobi, Peng Hu, Isma Masood

Issues, Challenges and Tools of Clustering Algorithms

Clustering is an unsupervised data mining technique. It groups similar objects together and separates dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure.…

Information Retrieval · Computer Science 2011-10-13 Parul Agarwal, M. Afshar Alam, Ranjit Biswas

Scalable Inference for Latent Dirichlet Allocation

We investigate the problem of learning a topic model, the well-known Latent Dirichlet Allocation, in a distributed manner, using a cluster of $C$ processors and dividing the corpus to be learned equally among them. We propose a simple…

Machine Learning · Computer Science 2009-09-28 James Petterson, Tiberio Caetano

Clustering Students and Inferring Skill Set Profiles with Skill Hierarchies

Cognitive diagnosis models (CDMs) are a popular tool for assessing students' mastery of sets of skills. Given a set of $K$ skills tested on an assessment, students are classified into one of $2^K$ latent skill set profiles that represent…

Applications · Statistics 2021-04-07 Alan Mishler, Rebecca Nugent

Text Clustering as Classification with LLMs

Text clustering serves as a fundamental technique for organizing and interpreting unstructured textual data, particularly in contexts where manual annotation is prohibitively costly. With the rapid advancement of Large Language Models…

Computation and Language · Computer Science 2025-10-08 Chen Huang, Guoxiu He

Text-Guided Image Clustering

Image clustering divides a collection of images into meaningful groups, typically interpreted post-hoc via human-given annotations. Those are usually in the form of text, raising the question of using text as an abstraction for image…

Machine Learning · Computer Science 2024-02-20 Andreas Stephan, Lukas Miklautz, Kevin Sidak, Jan Philip Wahle, Bela Gipp, Claudia Plant, Benjamin Roth

Interpretable Deep Clustering for Tabular Data

Clustering is a fundamental learning task widely used as a first step in data analysis. For example, biologists use cluster assignments to analyze genome sequences, medical records, or images. Since downstream analysis is typically…

Machine Learning · Computer Science 2024-06-11 Jonathan Svirsky, Ofir Lindenbaum

Software Module Clustering: An In-Depth Literature Analysis

Software module clustering is an unsupervised learning method used to cluster software entities (e.g., classes, modules, or files) with similar features. The obtained clusters may be used to study, analyze, and understand the software…

Software Engineering · Computer Science 2020-12-03 Qusay I. Sarhan, Bestoun S. Ahmed, Miroslav Bures, Kamal Z. Zamli