Related papers: A Streaming Algorithm for Graph Clustering
Graph partitioning plays a vital role in distributed large-scale web graph analytics, such as PageRank and label propagation. The quality and scalability of the partitioning strategy have a strong impact on such communication- and…
We present CluStRE, a novel streaming graph clustering algorithm that balances computational efficiency with high-quality clustering using multi-stage refinement. Unlike traditional in-memory clustering approaches, CluStRE processes graphs…
Partitioning graphs into blocks of roughly equal size is widely used when processing large graphs. Currently there is a gap in the space of available partitioning algorithms. On the one hand, there are streaming algorithms that have been…
In this paper, we introduce a novel community detection algorithm in graphs, called SCoDA (Streaming Community Detection Algorithm), based on an edge streaming setting. This algorithm has an extremely low memory footprint and a…
Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions…
We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…
The data stream model has been defined for new classes of applications involving massive data being generated at a fast pace. Web click stream analysis and detection of network intrusions are two examples. Cluster analysis on data streams…
One of the most useful measures of cluster quality is the modularity of a partition, which measures the difference between the number of the edges joining vertices from the same cluster and the expected number of such edges in a random…
Many well-known, real-world problems involve dynamic data which describe the relationship among the entities. Hypergraphs are powerful combinatorial structures that are frequently used to model such data. For many of today's data-centric…
In recent years, the scale of graph datasets has increased to such a degree that a single machine is not capable of efficiently processing large graphs. Therefore, efficient graph partitioning is necessary for those large graph…
There has been a recent explosion in the size of stored data, partially due to advances in storage technology, and partially due to the growing popularity of cloud-computing and the vast quantities of data generated. This motivates the need…
We study streaming algorithms for Correlation Clustering. Given a graph as an arbitrary-order stream of edges, with each edge labeled as positive or negative, the goal is to partition the vertices into disjoint clusters, such that the…
With the dawn of the Big Data era, data sets are growing rapidly. Data is streaming from everywhere - from cameras, mobile phones, cars, and other electronic devices. Clustering streaming data is a very challenging problem. Unlike the…
The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and…
Current modularity-based community detection algorithms attempt to find cluster memberships that maximize modularity within a fixed graph topology. Diverging from this conventional approach, our work introduces a novel strategy that employs…
Given a stream of heterogeneous graphs containing different types of nodes and edges, how can we spot anomalous ones in real-time while consuming bounded memory? This problem is motivated by and generalizes from its application in security…
We study large-scale, distributed graph clustering. Given an undirected graph, our objective is to partition the nodes into disjoint sets called clusters. A cluster should contain many internal edges while being sparsely connected to other…
Processing large-scale graphs, containing billions of entities, is critical across fields like bioinformatics, high-performance computing, navigation and route planning, among others. Efficient graph partitioning, which divides a graph into…
Real-world graphs often manifest as a massive temporal stream of edges. The need for real-time analysis of such large graph streams has led to progress on low memory, one-pass streaming graph algorithms. These algorithms were designed for…
In this paper, we consider sparse networks consisting of a finite number of non-overlapping communities, i.e. disjoint clusters, so that there is higher density within clusters than across clusters. Both the intra- and inter-cluster edge…