Related papers

Related papers: Text Mining Through Label Induction Grouping Algor…

200 papers

In this paper, we propose an intuitive, training-free and label-free method for intent clustering in conversational search. Current approaches to short text clustering use LLM-generated pseudo-labels to enrich text representations or to…

Computation and Language · Computer Science 2026-02-26 I-Fan Lin , Faegheh Hasibi , Suzan Verberne

Text clustering serves as a fundamental technique for organizing and interpreting unstructured textual data, particularly in contexts where manual annotation is prohibitively costly. With the rapid advancement of Large Language Models…

Computation and Language · Computer Science 2025-10-08 Chen Huang , Guoxiu He

Token pruning has emerged as a mainstream approach for developing efficient Video Large Language Models (Video LLMs). This work revisits and advances the two predominant token-pruning paradigms: attention-based selection and…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Shukang Yin , Sirui Zhao , Hanchao Wang , Baozhi Jia , Xianquan Wang , Chaoyou Fu , Enhong Chen

Clustering is a fundamental tool that has garnered significant interest across a wide range of applications including text analysis. To improve clustering accuracy, many researchers have incorporated background knowledge, typically in the…

Machine Learning · Computer Science 2026-01-19 Chaoqi Jia , Weihong Wu , Longkun Guo , Zhigang Lu , Chao Chen , Kok-Leong Ong

In-context learning enables language models (LMs) to adapt to downstream data or tasks by incorporating a few samples as demonstrations within the prompts. It offers strong performance without the expense of fine-tuning. However, the…

Computation and Language · Computer Science 2024-10-15 Jian Gu , Aldeida Aleti , Chunyang Chen , Hongyu Zhang

The contribution of this paper is two-fold. First, we present Indexing by Latent Dirichlet Allocation (LDI), an automatic document indexing method. The probability distributions in LDI utilize those in Latent Dirichlet Allocation (LDA), a…

Information Retrieval · Computer Science 2014-12-12 Yanshan Wang , Jae-Sung Lee , In-Chan Choi

Text clustering is arguably one of the most important topics in modern data mining. Nevertheless, text data require tokenization, which usually yields a very large and highly sparse term-document matrix that is difficult to process…

Machine Learning · Computer Science 2020-02-25 Ali Hassani , Amir Iranmanesh , Najme Mansouri

Text clustering is an important method for organising the increasing volume of digital content, aiding in the structuring and discovery of hidden patterns in uncategorised data. The effectiveness of text clustering largely depends on the…

Computation and Language · Computer Science 2024-12-06 Alina Petukhova , João P. Matos-Carvalho , Nuno Fachada

Diffusion LLMs have emerged as a promising alternative to conventional autoregressive LLMs, offering significant potential for improved runtime efficiency. However, existing diffusion models lack the ability to provably enforce…

Machine Learning · Computer Science 2025-05-30 Tarun Suresh , Debangshu Banerjee , Shubham Ugare , Sasa Misailovic , Gagandeep Singh

This paper proposes a Clustering, Labeling, then Augmenting framework that significantly enhances performance in Semi-Supervised Text Classification (SSTC) tasks, effectively addressing the challenge of vast datasets with limited labeled…

Computation and Language · Computer Science 2024-12-30 Shan Zhong , Jiahao Zeng , Yongxin Yu , Bohong Lin

Long document retrieval (DR) has always been a tremendous challenge for reading comprehension and information retrieval. In recent years, pre-trained models have achieved good results in the retrieval and ranking stages for long documents.…

Information Theory · Computer Science 2022-03-15 Chunyu Li , Jiajia Ding , Xing hu , Fan Wang

Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering…

Information Retrieval · Computer Science 2015-03-12 G. Hannah Grace , Kalyani Desikan

There are two major approaches for sequence labeling. One comprises probabilistic gradient-based methods such as conditional random fields (CRF) and neural networks (e.g., RNN), which have high accuracy but drawbacks: slow training, and no…

Machine Learning · Computer Science 2018-11-20 Xu Sun , Shuming Ma , Yi Zhang , Xuancheng Ren

Clustering short text is a difficult problem, due to the low word co-occurrence between short text documents. This work shows that large language models (LLMs) can overcome the limitations of traditional clustering approaches by generating…

Computation and Language · Computer Science 2025-04-08 Justin K. Miller , Tristram J. Alexander

Serving Large Language Models (LLMs) at scale requires meeting strict Service Level Objectives (SLOs) under severe computational and memory constraints. Nevertheless, traditional caching strategies fall short: exact-matching and prefix…

Databases · Computer Science 2025-08-27 Jungwoo Kim , Minsang Kim , Jaeheon Lee , Chanwoo Moon , Heejin Kim , Taeho Hwang , Woosuk Chung , Yeseong Kim , Sungjin Lee

Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index, which replaces or complements traditional index structures with machine learning models, has been actively…

Databases · Computer Science 2022-08-01 Yao Tian , Tingyun Yan , Xi Zhao , Kai Huang , Xiaofang Zhou

Scaling test-time computation (generating and analyzing multiple or sequential outputs for a single input) has become a promising strategy for improving the reliability and quality of large language models (LLMs), as evidenced by advances…

Computation and Language · Computer Science 2025-06-03 Sungjae Lee , Hoyoung Kim , Jeongyeon Hwang , Eunhyeok Park , Jungseul Ok

We present generative clustering (GC) for clustering a set of documents, $\mathrm{X}$, by using texts $\mathrm{Y}$ generated by large language models (LLMs) instead of by clustering the original documents $\mathrm{X}$. Because LLMs…

Machine Learning · Computer Science 2024-12-19 Xin Du , Kumiko Tanaka-Ishii

Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging…

Computation and Language · Computer Science 2026-03-20 Yibin Lei , Tao Shen , Yu Cao , Andrew Yates

We introduce a method for efficient multi-label text classification with large language models (LLMs), built on reformulating classification tasks as sequences of dichotomic (yes/no) decisions. Instead of generating all labels in a single…

Computation and Language · Computer Science 2025-11-07 Mikołaj Langner , Jan Eliasz , Ewa Rudnicka , Jan Kocoń