Related papers: Clustering Text Using Attention

Optimized Algorithms for Text Clustering with LLM-Generated Constraints

Clustering is a fundamental tool that has garnered significant interest across a wide range of applications including text analysis. To improve clustering accuracy, many researchers have incorporated background knowledge, typically in the…

Machine Learning · Computer Science 2026-01-19 Chaoqi Jia , Weihong Wu , Longkun Guo , Zhigang Lu , Chao Chen , Kok-Leong Ong

Text Clustering as Classification with LLMs

Text clustering serves as a fundamental technique for organizing and interpreting unstructured textual data, particularly in contexts where manual annotation is prohibitively costly. With the rapid advancement of Large Language Models…

Computation and Language · Computer Science 2025-10-08 Chen Huang , Guoxiu He

An Introductory Survey on Attention Mechanisms in NLP Problems

First derived from human intuition, later adapted to machine translation for automatic token alignment, attention mechanism, a simple method that can be used for encoding sequence data based on the importance score each element is assigned,…

Computation and Language · Computer Science 2018-11-15 Dichao Hu

Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences

Sentence embedding methods offer a powerful approach for working with short textual constructs or sequences of words. By representing sentences as dense numerical vectors, many natural language processing (NLP) applications have improved…

Computation and Language · Computer Science 2021-10-05 Yuan An , Alexander Kalinowski , Jane Greenberg

A Survey on optimization approaches to text document clustering

Text Document Clustering is one of the fastest growing research areas because of availability of huge amount of information in an electronic form. There are several number of techniques launched for clustering documents in such a way that…

Information Retrieval · Computer Science 2014-01-13 R. Jensi , Dr. G. Wiselin Jiji

Computing Word Classes Using Spectral Clustering

Clustering a lexicon of words is a well-studied problem in natural language processing (NLP). Word clusters are used to deal with sparse data in statistical language processing, as well as features for solving various NLP tasks (text…

Computation and Language · Computer Science 2018-08-17 Effi Levi , Saggy Herman , Ari Rappoport

Short Text Clustering with Transformers

Recent techniques for the task of short text clustering often rely on word embeddings as a transfer learning component. This paper shows that sentence vector representations from Transformers in conjunction with different clustering methods…

Computation and Language · Computer Science 2021-02-02 Leonid Pugachev , Mikhail Burtsev

Experimental Estimation of Number of Clusters Based on Cluster Quality

Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering…

Information Retrieval · Computer Science 2015-03-12 G. Hannah Grace , Kalyani Desikan

Text Clustering with Large Language Model Embeddings

Text clustering is an important method for organising the increasing volume of digital content, aiding in the structuring and discovery of hidden patterns in uncategorised data. The effectiveness of text clustering largely depends on the…

Computation and Language · Computer Science 2024-12-06 Alina Petukhova , João P. Matos-Carvalho , Nuno Fachada

Neural Text Classification by Jointly Learning to Cluster and Align

Distributional text clustering delivers semantically informative representations and captures the relevance between each word and semantic clustering centroids. We extend the neural text clustering approach to text classification tasks by…

Computation and Language · Computer Science 2020-11-25 Yekun Chai , Haidong Zhang , Shuo Jin

Attention in Natural Language Processing

Attention is an increasingly popular mechanism used in a wide range of neural architectures. The mechanism itself has been realized in a variety of formats. However, because of the fast-paced advances in this domain, a systematic overview…

Computation and Language · Computer Science 2021-10-12 Andrea Galassi , Marco Lippi , Paolo Torroni

Context-Aware Clustering using Large Language Models

Despite the remarkable success of Large Language Models (LLMs) in text understanding and generation, their potential for text clustering tasks remains underexplored. We observed that powerful closed-source LLMs provide good quality…

Computation and Language · Computer Science 2024-05-03 Sindhu Tipirneni , Ravinarayana Adkathimar , Nurendra Choudhary , Gaurush Hiranandani , Rana Ali Amjad , Vassilis N. Ioannidis , Changhe Yuan , Chandan K. Reddy

Attention-Based Clustering: Learning a Kernel from Context

In machine learning, no data point stands alone. We believe that context is an underappreciated concept in many machine learning methods. We propose Attention-Based Clustering (ABC), a neural architecture based on the attention mechanism,…

Machine Learning · Computer Science 2020-10-05 Samuel Coward , Erik Visse-Martindale , Chithrupa Ramesh

An end-to-end Neural Network Framework for Text Clustering

The unsupervised text clustering is one of the major tasks in natural language processing (NLP) and remains a difficult and complex problem. Conventional \mbox{methods} generally treat this task using separated steps, including text…

Computation and Language · Computer Science 2019-03-25 Jie Zhou , Xingyi Cheng , Jinchao Zhang

ClusterLLM: Large Language Models as a Guide for Text Clustering

We introduce ClusterLLM, a novel text clustering framework that leverages feedback from an instruction-tuned large language model, such as ChatGPT. Compared with traditional unsupervised methods that builds upon "small" embedders,…

Computation and Language · Computer Science 2023-11-07 Yuwei Zhang , Zihan Wang , Jingbo Shang

Mimicking Human Process: Text Representation via Latent Semantic Clustering for Classification

Considering that words with different characteristic in the text have different importance for classification, grouping them together separately can strengthen the semantic expression of each part. Thus we propose a new text representation…

Computation and Language · Computer Science 2019-06-19 Xiaoye Tan , Rui Yan , Chongyang Tao , Mingrui Wu

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the authors mood, gender, age, or sentiment.…

Information Retrieval · Computer Science 2014-01-22 Sajib Dasgupta , Vincent Ng

Query Clustering using Segment Specific Context Embeddings

This paper presents a novel query clustering approach to capture the broad interest areas of users querying search engines. We make use of recent advances in NLP - word2vec and extend it to get query2vec, vector representations of queries,…

Information Retrieval · Computer Science 2016-11-08 S. K Kolluru , Prasenjit Mukherjee

In-Context Clustering with Large Language Models

We propose In-Context Clustering (ICC), a flexible LLM-based procedure for clustering data from diverse distributions. Unlike traditional clustering algorithms constrained by predefined similarity measures, ICC flexibly captures complex…

Machine Learning · Computer Science 2025-10-10 Ying Wang , Mengye Ren , Andrew Gordon Wilson

Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning

Clustering is a core task in machine learning with wide-ranging applications in data mining and pattern recognition. However, its unsupervised nature makes it inherently challenging. Many existing clustering algorithms suffer from critical…

Machine Learning · Computer Science 2025-07-29 Ahmed Shokry , Ayman Khalafallah