Related papers: Learning-Augmented Streaming Algorithms for Correl…
Grouping together similar elements in datasets is a common task in data mining and machine learning. In this paper, we study streaming algorithms for correlation clustering, where each pair of elements is labeled either similar or…
We introduce a novel algorithm to perform graph clustering in the edge streaming setting. In this model, the graph is presented as a sequence of edges that can be processed strictly once. Our streaming algorithm has an extremely low memory…
Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as $k$-center, $k$-median, and $k$-means. Such algorithms…
We study sublinear algorithms for two fundamental graph problems, MAXCUT and correlation clustering. Our focus is on constructing core-sets as well as developing streaming algorithms for these problems. Constant space algorithms are known…
We present an efficient algorithm for the min-max correlation clustering problem. The input is a complete graph where edges are labeled as either positive $(+)$ or negative $(-)$, and the objective is to find a clustering that minimizes the…
Correlation clustering provides a method for separating the vertices of a signed graph into the optimum number of clusters without specifying that number in advance. The main goal in this type of clustering is to minimize the number of…
We study learning-augmented streaming algorithms for estimating the value of MAX-CUT in a graph. In the classical streaming model, while a $1/2$-approximation for estimating the value of MAX-CUT can be trivially achieved with $O(1)$ words…
In this paper, we study the problem of finding a maximum matching in the semi-streaming model when edges arrive in a random order. In the semi-streaming model, an algorithm receives a stream of edges and it is allowed to have a memory of…
Many real-world applications pose challenges in incorporating fairness constraints into the $k$-center clustering problem, where the dataset consists of $m$ demographic groups, each with a specified upper bound on the number of centers to…
Traditional graph-based semi-supervised learning (SSL) approaches, even though widely applied, are not suited for massive data and large label scenarios since they scale linearly with the number of edges $|E|$ and distinct labels $m$. To…
We explore the use of local algorithms in the design of streaming algorithms for the Maximum Directed Cut problem. Specifically, building on the local algorithm of Buchbinder et al. (FOCS'12) and Censor-Hillel et al. (ALGOSENSORS'17), we…
In machine learning, correlation clustering is an important problem whose goal is to partition the individuals into groups that correlate with their pairwise similarities as much as possible. In this work, we revisit the correlation…
We study the correlation clustering problem in the node-arrival data stream model. Unlike previous work, where the stream consists of the graph's edges, we focus on the setting in which the stream contains only the nodes. This model better…
We introduce a new computational model for data streams: asymptotically exact streaming algorithms. These algorithms have an approximation ratio that tends to one as the length of the stream goes to infinity while the memory used by the…
There has been a recent explosion in the size of stored data, partially due to advances in storage technology, and partially due to the growing popularity of cloud-computing and the vast quantities of data generated. This motivates the need…
We consider the problem of estimating the value of max cut in a graph in the streaming model of computation. At one extreme, there is a trivial $2$-approximation for this problem that uses only $O(\log n)$ space, namely, count the number of…
Graph partitioning plays a vital role in distributedlarge-scale web graph analytics, such as pagerank and labelpropagation. The quality and scalability of partitioning strategyhave a strong impact on such communication- and…
In the correlation clustering problem for complete signed graphs, the input is a complete signed graph with edges weighted as $+1$ (denote recommendation to put this pair in the same cluster) or $-1$ (recommending to put this pair of…
We present a new approach for solving (minimum disagreement) correlation clustering that results in sublinear algorithms with highly efficient time and space complexity for this problem. In particular, we obtain the following algorithms for…
Triangle counting and sampling are two fundamental problems for streaming algorithms. Arguably, designing sampling algorithms is more challenging than their counting variants. It may be noted that triangle counting has received far greater…