Related papers: Context-Aware Clustering using Large Language Mode…

200 papers

Text clustering serves as a fundamental technique for organizing and interpreting unstructured textual data, particularly in contexts where manual annotation is prohibitively costly. With the rapid advancement of Large Language Models…

Computation and Language · Computer Science 2025-10-08 Chen Huang, Guoxiu He

Entity Resolution (ER) is a fundamental data quality improvement task that identifies and links records referring to the same real-world entity. Traditional ER approaches often rely on pairwise comparisons, which can be costly in terms of…

Databases · Computer Science 2025-06-04 Jiajie Fu, Haitong Tang, Arijit Khan, Sharad Mehrotra, Xiangyu Ke, Yunjun Gao

Text clustering is an important method for organising the increasing volume of digital content, aiding in the structuring and discovery of hidden patterns in uncategorised data. The effectiveness of text clustering largely depends on the…

Computation and Language · Computer Science 2024-12-06 Alina Petukhova, João P. Matos-Carvalho, Nuno Fachada

Text clustering is a fundamental task in natural language processing, yet traditional clustering algorithms with pre-trained embeddings often struggle in domain-specific contexts without costly fine-tuning. Large language models (LLMs)…

Computation and Language · Computer Science 2025-12-05 Yiming Xu, Yuan Yuan, Vijay Viswanathan, Graham Neubig

Large Language Models (LLMs) are reshaping unsupervised learning by offering an unprecedented ability to perform text clustering based on their deep semantic understanding. However, their direct application is fundamentally limited by a…

Computation and Language · Computer Science 2026-04-08 Yuanjie Zhu, Liangwei Yang, Ke Xu, Weizhi Zhang, Zihe Song, Jindong Wang, Philip S. Yu

We propose In-Context Clustering (ICC), a flexible LLM-based procedure for clustering data from diverse distributions. Unlike traditional clustering algorithms constrained by predefined similarity measures, ICC flexibly captures complex…

Machine Learning · Computer Science 2025-10-10 Ying Wang, Mengye Ren, Andrew Gordon Wilson

Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm to match the user's intent. Existing approaches to semi-supervised…

Computation and Language · Computer Science 2023-07-04 Vijay Viswanathan, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, Graham Neubig

Large language models (LLMs) often rely on user-specific memories distilled from past interactions to enable personalized generation. A common practice is to concatenate these memories with the input prompt, but this approach quickly…

Computation and Language · Computer Science 2026-01-27 Ondrej Bohdal, Pramit Saha, Umberto Michieli, Mete Ozay, Taha Ceritli

We introduce ClusterLLM, a novel text clustering framework that leverages feedback from an instruction-tuned large language model, such as ChatGPT. Compared with traditional unsupervised methods that build upon "small" embedders,…

Computation and Language · Computer Science 2023-11-07 Yuwei Zhang, Zihan Wang, Jingbo Shang

Clustering is a fundamental tool that has garnered significant interest across a wide range of applications including text analysis. To improve clustering accuracy, many researchers have incorporated background knowledge, typically in the…

Machine Learning · Computer Science 2026-01-19 Chaoqi Jia, Weihong Wu, Longkun Guo, Zhigang Lu, Chao Chen, Kok-Leong Ong

The advancements in large language models (LLMs) have brought significant progress in NLP tasks. However, if a task cannot be fully described in prompts, the models could fail to carry out the task. In this paper, we propose a simple yet…

Computation and Language · Computer Science 2025-06-10 Hwiyeol Jo, Hyunwoo Lee, Kang Min Yoo, Taiwoo Park

We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity…

Computation and Language · Computer Science 2021-01-28 Kailash Karthik Saravanakumar, Miguel Ballesteros, Muthu Kumar Chandrasekaran, Kathleen McKeown

Large language models have demonstrated exceptional performance across multiple crosslingual NLP tasks, including machine translation (MT). However, persistent challenges remain in addressing context-sensitive units (CSUs), such as…

Computation and Language · Computer Science 2025-05-30 Qiuyu Ding, Zhiqiang Cao, Hailong Cao, Tiejun Zhao

Clustering text has been an important problem in the domain of natural language processing. While there are techniques that cluster text by applying conventional clustering techniques on top of contextual or non-contextual vector space…

Computation and Language · Computer Science 2022-01-11 Lovedeep Singh

The advent of large language models (LLMs) has significantly advanced natural language processing tasks like text summarization. However, their large size and computational demands, coupled with privacy concerns in data transmission, limit…

Computation and Language · Computer Science 2024-03-18 Pengcheng Jiang, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, Jiawei Han

Scaling test-time computation--generating and analyzing multiple or sequential outputs for a single input--has become a promising strategy for improving the reliability and quality of large language models (LLMs), as evidenced by advances…

Computation and Language · Computer Science 2025-06-03 Sungjae Lee, Hoyoung Kim, Jeongyeon Hwang, Eunhyeok Park, Jungseul Ok

Large Language Models (LLMs) have demonstrated impressive performance on a wide range of natural language processing (NLP) tasks, primarily through in-context learning (ICL). In ICL, the LLM is provided with examples that represent a given…

Computation and Language · Computer Science 2025-02-19 Abdellah El Mekki, Muhammad Abdul-Mageed

This paper proposes a Clustering, Labeling, then Augmenting framework that significantly enhances performance in Semi-Supervised Text Classification (SSTC) tasks, effectively addressing the challenge of vast datasets with limited labeled…

Computation and Language · Computer Science 2024-12-30 Shan Zhong, Jiahao Zeng, Yongxin Yu, Bohong Lin

Text clustering aims to automatically partition a collection of documents into coherent groups based on their linguistic features. In the literature, this task is formulated either as metric clustering over pre-trained text embeddings or as…

Computation and Language · Computer Science 2025-08-22 Hongtao Wang, Taiyan Zhang, Renchi Yang, Jianliang Xu

Vulnerability detection is a critical aspect of software security. Accurate detection is essential to prevent potential security breaches and protect software systems from malicious attacks. Recently, vulnerability detection methods…

Software Engineering · Computer Science 2025-04-24 Yixin Yang, Bowen Xu, Xiang Gao, Hailong Sun