Related papers: Correlation Clustering in Data Streams

Dynamic Correlation Clustering in Sublinear Update Time

We study the classic problem of correlation clustering in dynamic node streams. In this setting, nodes are either added or randomly deleted over time, and each node pair is connected by a positive or negative edge. The objective is to…

Data Structures and Algorithms · Computer Science 2024-06-14 Vincent Cohen-Addad , Silvio Lattanzi , Andreas Maggiori , Nikos Parotsidis

Solving the Correlation Cluster LP in Sublinear Time

Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of…

Data Structures and Algorithms · Computer Science 2025-11-05 Nairen Cao , Vincent Cohen-Addad , Shi Li , Euiwoong Lee , David Rasmussen Lolck , Alantha Newman , Mikkel Thorup , Lukas Vogl , Shuyi Yan , Hanwen Zhang

Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models

Given a graph with positive and negative edge labels, the correlation clustering problem aims to cluster the nodes so to minimize the total number of between-cluster positive and within-cluster negative edges. This problem has many…

Data Structures and Algorithms · Computer Science 2024-06-17 Mina Dalirrooyfard , Konstantin Makarychev , Slobodan Mitrović

Data Stream Clustering: A Review

Number of connected devices is steadily increasing and these devices continuously generate data streams. Real-time processing of data streams is arousing interest despite many challenges. Clustering is one of the most suitable methods for…

Machine Learning · Computer Science 2020-07-22 Alaettin Zubaroğlu , Volkan Atalay

Learning-Augmented Streaming Algorithms for Correlation Clustering

We study streaming algorithms for Correlation Clustering. Given a graph as an arbitrary-order stream of edges, with each edge labeled as positive or negative, the goal is to partition the vertices into disjoint clusters, such that the…

Data Structures and Algorithms · Computer Science 2025-10-14 Yinhao Dong , Shan Jiang , Shi Li , Pan Peng

Estimating Correlation Clustering Cost in Node-Arrival Stream

We study the correlation clustering problem in the node-arrival data stream model. Unlike previous work, where the stream consists of the graph's edges, we focus on the setting in which the stream contains only the nodes. This model better…

Data Structures and Algorithms · Computer Science 2026-05-11 Kaiwen Liu , Seba Daniela Villalobos , Qin Zhang

Sublinear Algorithms for MAXCUT and Correlation Clustering

We study sublinear algorithms for two fundamental graph problems, MAXCUT and correlation clustering. Our focus is on constructing core-sets as well as developing streaming algorithms for these problems. Constant space algorithms are known…

Data Structures and Algorithms · Computer Science 2018-02-21 Aditya Bhaskara , Samira Daruki , Suresh Venkatasubramanian

Clustering High Dimensional Dynamic Data Streams

We present data streaming algorithms for the $k$-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space $\{1, 2, \ldots \Delta\}^d$.…

Data Structures and Algorithms · Computer Science 2017-06-14 Vladimir Braverman , Gereon Frahling , Harry Lang , Christian Sohler , Lin F. Yang

Correlation Clustering in Constant Many Parallel Rounds

Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of…

Data Structures and Algorithms · Computer Science 2021-06-17 Vincent Cohen-Addad , Silvio Lattanzi , Slobodan Mitrović , Ashkan Norouzi-Fard , Nikos Parotsidis , Jakub Tarnawski

Correlation Clustering and Biclustering with Locally Bounded Errors

We consider a generalized version of the correlation clustering problem, defined as follows. Given a complete graph $G$ whose edges are labeled with $+$ or $-$, we wish to partition the graph into clusters while trying to avoid errors: $+$…

Data Structures and Algorithms · Computer Science 2016-05-25 Gregory J. Puleo , Olgica Milenkovic

Correlation Clustering via Strong Triadic Closure Labeling: Fast Approximation Algorithms and Practical Lower Bounds

Correlation clustering is a widely studied framework for clustering based on pairwise similarity and dissimilarity scores, but its best approximation algorithms rely on impractical linear programming relaxations. We present faster…

Data Structures and Algorithms · Computer Science 2022-06-27 Nate Veldt

Achieving Approximate Soft Clustering in Data Streams

In recent years, data streaming has gained prominence due to advances in technologies that enable many applications to generate continuous flows of data. This increases the need to develop algorithms that are able to efficiently process…

Data Structures and Algorithms · Computer Science 2015-03-20 Vaneet Aggarwal , Shankar Krishnan

Clustering Time Series Data Stream - A Literature Survey

Mining Time Series data has a tremendous growth of interest in today's world. To provide an indication various implementations are studied and summarized to identify the different problems in existing applications. Clustering time series is…

Information Retrieval · Computer Science 2010-05-25 V. Kavitha , M. Punithavalli

Online Correlation Clustering for Dynamic Complete Signed Graphs

In the correlation clustering problem for complete signed graphs, the input is a complete signed graph with edges weighted as $+1$ (denote recommendation to put this pair in the same cluster) or $-1$ (recommending to put this pair of…

Data Structures and Algorithms · Computer Science 2022-11-15 Ali Shakiba

Large Scale Correlation Clustering Optimization

Clustering is a fundamental task in unsupervised learning. The focus of this paper is the Correlation Clustering functional which combines positive and negative affinities between the data points. The contribution of this paper is two fold:…

Computer Vision and Pattern Recognition · Computer Science 2011-12-14 Shai Bagon , Meirav Galun

Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions

We present a new approach for solving (minimum disagreement) correlation clustering that results in sublinear algorithms with highly efficient time and space complexity for this problem. In particular, we obtain the following algorithms for…

Data Structures and Algorithms · Computer Science 2021-09-30 Sepehr Assadi , Chen Wang

A Clustering-based Framework for Classifying Data Streams

The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques for handling data streams, these approaches…

Machine Learning · Computer Science 2021-06-23 Xuyang Yan , Abdollah Homaifar , Mrinmoy Sarkar , Abenezer Girma , Edward Tunstel

Clust-Splitter - an Efficient Nonsmooth Optimization-Based Algorithm for Clustering Large Datasets

Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the…

Machine Learning · Computer Science 2026-03-19 Jenni Lampainen , Kaisa Joki , Napsu Karmitsa , Marko M. Mäkelä

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Nearly Optimal Dynamic $k$-Means Clustering for High-Dimensional Data

We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted to or deleted from the dataset. For this problem, we…

Data Structures and Algorithms · Computer Science 2019-02-08 Wei Hu , Zhao Song , Lin F. Yang , Peilin Zhong