Related papers: Vec2GC -- A Graph Based Clustering Method for Text…

Hybrid Topic-Semantic Labeling and Graph Embeddings for Unsupervised Legal Document Clustering

Legal documents pose unique challenges for text classification due to their domain-specific language and often limited labeled data. This paper proposes a hybrid approach for classifying legal texts by combining unsupervised topic and graph…

Machine Learning · Statistics · 2025-09-03 · Deepak Bastola, Woohyeok Choi

Graph-based Topic Extraction from Vector Embeddings of Text Documents: Application to a Corpus of News Articles

Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas, and stratify…

Computation and Language · Computer Science · 2020-10-29 · M. Tarik Altuncu, Sophia N. Yaliraki, Mauricio Barahona

Graph-Convolutional Networks: Named Entity Recognition and Large Language Model Embedding in Document Clustering

Recent advances in machine learning, particularly Large Language Models (LLMs) such as BERT and GPT, provide rich contextual embeddings that improve text representation. However, current document clustering approaches often ignore the…

Computation and Language · Computer Science · 2024-12-20 · Imed Keraghel, Mohamed Nadif

Document clustering using graph based document representation with constraints

Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and…

Information Retrieval · Computer Science · 2014-12-08 · Muhammad Rafi, Farnaz Amin, Mohammad Shahid Shaikh

From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings

In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision. After each word in a collection of documents is…

Computation and Language · Computer Science · 2017-07-26 · Andrei M. Butnaru, Radu Tudor Ionescu

Graph2topic: an opensource topic modeling framework based on sentence embedding and community detection

It has been reported that clustering-based topic models, which cluster high-quality sentence embeddings with an appropriate word selection method, can generate better topics than generative probabilistic topic models. However, these…

Computation and Language · Computer Science · 2023-06-07 · Leihang Zhang, Jiapeng Liu, Qiang Yan

Content-driven, unsupervised clustering of news articles through multiscale graph partitioning

The explosion in the amount of news and journalistic content being generated across the globe, coupled with extended and instantaneous access to information through online media, makes it difficult and time-consuming to monitor news…

Computation and Language · Computer Science · 2018-08-06 · M. Tarik Altuncu, Sophia N. Yaliraki, Mauricio Barahona

An end-to-end Neural Network Framework for Text Clustering

The unsupervised text clustering is one of the major tasks in natural language processing (NLP) and remains a difficult and complex problem. Conventional methods generally treat this task using separated steps, including text…

Computation and Language · Computer Science · 2019-03-25 · Jie Zhou, Xingyi Cheng, Jinchao Zhang

Graph Clustering with Dynamic Embedding

Graph clustering (or community detection) has long drawn enormous attention from the research on web mining and information networks. Recent literature on this topic has reached a consensus that node contents and link structures should be…

Social and Information Networks · Computer Science · 2017-12-25 · Carl Yang, Mengxiong Liu, Zongyi Wang, Liyuan Liu, Jiawei Han

Clustering and Classification in Text Collections Using Graph Modularity

A new fast algorithm for clustering and classification of large collections of text documents is introduced. The new algorithm employs the bipartite graph that realizes the word-document matrix of the collection. Namely, the modularity of…

Information Retrieval · Computer Science · 2011-05-31 · Grigory Pivovarov, Sergei Trunov

Graph-based Clustering for Detecting Semantic Change Across Time and Languages

Despite the predominance of contextualized embeddings in NLP, approaches to detect semantic change relying on these embeddings and clustering methods underperform simpler counterparts based on static word embeddings. This stems from the…

Computation and Language · Computer Science · 2024-02-05 · Xianghe Ma, Michael Strube, Wei Zhao

A Novel Graph-Sequence Learning Model for Inductive Text Classification

Text classification plays an important role in various downstream text-related tasks, such as sentiment analysis, fake news detection, and public opinion analysis. Recently, text classification based on Graph Neural Networks (GNNs) has made…

Computation and Language · Computer Science · 2025-12-24 · Zuo Wang, Ye Yuan

Interpretable Text-Guided Image Clustering via Iterative Search

Traditional clustering methods aim to group unlabeled data points based on their similarity to each other. However, clustering, in the absence of additional information, is an ill-posed problem as there may be many different, yet equally…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-10 · Bingchen Zhao, Oisin Mac Aodha

Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification

Graph contrastive learning (GCL) has been widely applied to text classification tasks due to its ability to generate self-supervised signals from unlabeled data, thus facilitating model training. However, existing GCL-based text…

Machine Learning · Computer Science · 2024-10-25 · Wei Ai, Jianbin Li, Ze Wang, Jiayi Du, Tao Meng, Yuntao Shou, Keqin Li

Graph-Community Detection for Cross-Document Topic Segment Relationship Identification

In this paper we propose a graph-community detection approach to identify cross-document relationships at the topic segment level. Given a set of related documents, we automatically find these relationships by clustering segments with…

Computation and Language · Computer Science · 2016-06-14 · Pedro Mota, Maxine Eskenazi, Luisa Coheur

Information-Theoretic Generative Clustering of Documents

We present generative clustering (GC) for clustering a set of documents, X, by using texts Y generated by large language models (LLMs) instead of by clustering the original documents X. Because LLMs…

Machine Learning · Computer Science · 2024-12-19 · Xin Du, Kumiko Tanaka-Ishii

CGC: Contrastive Graph Clustering for Community Detection and Tracking

Given entities and their interactions in the web data, which may have occurred at different time, how can we find communities of entities and track their evolution? In this paper, we approach this important task from graph clustering…

Social and Information Networks · Computer Science · 2023-03-29 · Namyong Park, Ryan Rossi, Eunyee Koh, Iftikhar Ahamath Burhanuddin, Sungchul Kim, Fan Du, Nesreen Ahmed, Christos Faloutsos

Graph Convolutional Networks for Text Classification

Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification.…

Computation and Language · Computer Science · 2018-11-14 · Liang Yao, Chengsheng Mao, Yuan Luo

Query Clustering using Segment Specific Context Embeddings

This paper presents a novel query clustering approach to capture the broad interest areas of users querying search engines. We make use of recent advances in NLP - word2vec and extend it to get query2vec, vector representations of queries,…

Information Retrieval · Computer Science · 2016-11-08 · S. K Kolluru, Prasenjit Mukherjee

Towards Real-Time Temporal Graph Learning

In recent years, graph representation learning has gained significant popularity, which aims to generate node embeddings that capture features of graphs. One of the methods to achieve this is employing a technique called random walks that…

Machine Learning · Computer Science · 2022-10-13 · Deniz Gurevin, Mohsin Shan, Tong Geng, Weiwen Jiang, Caiwen Ding, Omer Khan