Related papers: ESMC: MLLM-Based Embedding Selection for Explainab…

Text Clustering with Large Language Model Embeddings

Text clustering is an important method for organising the increasing volume of digital content, aiding in the structuring and discovery of hidden patterns in uncategorised data. The effectiveness of text clustering largely depends on the…

Computation and Language · Computer Science 2024-12-06 Alina Petukhova, João P. Matos-Carvalho, Nuno Fachada

Human-interpretable clustering of short-text using large language models

Clustering short text is a difficult problem, due to the low word co-occurrence between short text documents. This work shows that large language models (LLMs) can overcome the limitations of traditional clustering approaches by generating…

Computation and Language · Computer Science 2025-04-08 Justin K. Miller, Tristram J. Alexander

Text Clustering as Classification with LLMs

Text clustering serves as a fundamental technique for organizing and interpreting unstructured textual data, particularly in contexts where manual annotation is prohibitively costly. With the rapid advancement of Large Language Models…

Computation and Language · Computer Science 2025-10-08 Chen Huang, Guoxiu He

Large Language Models Enable Few-Shot Clustering

Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm to match the user's intent. Existing approaches to semi-supervised…

Computation and Language · Computer Science 2023-07-04 Vijay Viswanathan, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, Graham Neubig

ClusterFusion: Hybrid Clustering with Embedding Guidance and LLM Adaptation

Text clustering is a fundamental task in natural language processing, yet traditional clustering algorithms with pre-trained embeddings often struggle in domain-specific contexts without costly fine-tuning. Large language models (LLMs)…

Computation and Language · Computer Science 2025-12-05 Yiming Xu, Yuan Yuan, Vijay Viswanathan, Graham Neubig

Resource-Efficient Adaptation of Large Language Models for Text Embeddings via Prompt Engineering and Contrastive Fine-tuning

Large Language Models (LLMs) have become a cornerstone in Natural Language Processing (NLP), achieving impressive performance in text generation. Their token-level representations capture rich, human-aligned semantics. However, pooling…

Computation and Language · Computer Science 2025-09-25 Benedikt Roth, Stephan Rappensperger, Tianming Qiu, Hamza Imamović, Julian Wörmann, Hao Shen

LLM-MemCluster: Empowering Large Language Models with Dynamic Memory for Text Clustering

Large Language Models (LLMs) are reshaping unsupervised learning by offering an unprecedented ability to perform text clustering based on their deep semantic understanding. However, their direct application is fundamentally limited by a…

Computation and Language · Computer Science 2026-04-08 Yuanjie Zhu, Liangwei Yang, Ke Xu, Weizhi Zhang, Zihe Song, Jindong Wang, Philip S. Yu

User-LLM: Efficient LLM Contextualization with User Embeddings

Large language models (LLMs) have achieved remarkable success across various domains, but effectively incorporating complex and potentially noisy user timeline data into LLMs remains a challenge. Current approaches often involve translating…

Computation and Language · Computer Science 2024-09-11 Lin Ning, Luyang Liu, Jiaxing Wu, Neo Wu, Devora Berlowitz, Sushant Prakash, Bradley Green, Shawn O'Banion, Jun Xie

Clustering-driven Memory Compression for On-device Large Language Models

Large language models (LLMs) often rely on user-specific memories distilled from past interactions to enable personalized generation. A common practice is to concatenate these memories with the input prompt, but this approach quickly…

Computation and Language · Computer Science 2026-01-27 Ondrej Bohdal, Pramit Saha, Umberto Michieli, Mete Ozay, Taha Ceritli

Explainable Search and Discovery of Visual Cultural Heritage Collections with Multimodal Large Language Models

Many cultural institutions have made large digitized visual collections available online, often under permissible re-use licences. Creating interfaces for exploring and searching these collections is difficult, particularly in the absence…

Computer Vision and Pattern Recognition · Computer Science 2024-11-08 Taylor Arnold, Lauren Tilton

Do LLMs Benefit from User and Item Embeddings in Recommendation Tasks?

Large Language Models (LLMs) have emerged as promising recommendation systems, offering novel ways to model user preferences through generative approaches. However, many existing methods often rely solely on text semantics or incorporate…

Machine Learning · Computer Science 2026-01-09 Mir Rayat Imtiaz Hossain, Leo Feng, Leonid Sigal, Mohamed Osama Ahmed

Explaining Multi-modal Large Language Models by Analyzing their Vision Perception

Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in understanding and generating content across various modalities, such as images and text. However, their interpretability remains a challenge, hindering…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Loris Giulivi, Giacomo Boracchi

Multi-Sense Embeddings for Language Models and Knowledge Distillation

Transformer-based large language models (LLMs) rely on contextual embeddings which generate different (continuous) representations for the same token depending on its surrounding context. Nonetheless, words and tokens typically have a…

Computation and Language · Computer Science 2025-07-10 Qitong Wang, Mohammed J. Zaki, Georgios Kollias, Vasileios Kalantzis

Demystifying Embedding Spaces using Large Language Models

Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream…

Computation and Language · Computer Science 2024-03-14 Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier

Enhancing Lexicon-Based Text Embeddings with Large Language Models

Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveraging…

Computation and Language · Computer Science 2026-03-20 Yibin Lei, Tao Shen, Yu Cao, Andrew Yates

Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences.…

Computation and Language · Computer Science 2017-09-07 Miriam Cha, Youngjune Gwon, H. T. Kung

Learning for Multi-Type Subspace Clustering

Subspace clustering has been extensively studied from the hypothesis-and-test, algebraic, and spectral clustering based perspectives. Most assume that only a single type/class of subspace is present. Generalizations to multiple types are…

Computer Vision and Pattern Recognition · Computer Science 2019-04-04 Xun Xu, Loong-Fah Cheong, Zhuwen Li

Personalized Multimodal Large Language Models: A Survey

Multimodal Large Language Models (MLLMs) have become increasingly important due to their state-of-the-art performance and ability to integrate multiple data modalities, such as text, images, and audio, to perform complex tasks with high…

Computer Vision and Pattern Recognition · Computer Science 2024-12-04 Junda Wu, Hanjia Lyu, Yu Xia, Zhehao Zhang, Joe Barrow, Ishita Kumar, Mehrnoosh Mirtaheri, Hongjie Chen, Ryan A. Rossi, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang, Xiang Chen, Hanieh Deilamsalehy, Namyong Park, Sungchul Kim, Huanrui Yang, Subrata Mitra, Zhengmian Hu, Nedim Lipka, Dang Nguyen, Yue Zhao, Jiebo Luo, Julian McAuley

LLMs Explain't: A Post-Mortem on Semantic Interpretability in Transformer Models

Large Language Models (LLMs) are becoming increasingly popular in pervasive computing due to their versatility and strong performance. However, despite their ubiquitous use, the exact mechanisms underlying their outstanding performance…

Computation and Language · Computer Science 2026-02-02 Alhassan Abdelhalim, Janick Edinger, Sören Laue, Michaela Regneri

ClusterLLM: Large Language Models as a Guide for Text Clustering

We introduce ClusterLLM, a novel text clustering framework that leverages feedback from an instruction-tuned large language model, such as ChatGPT. Compared with traditional unsupervised methods that build upon "small" embedders,…

Computation and Language · Computer Science 2023-11-07 Yuwei Zhang, Zihan Wang, Jingbo Shang