Related papers: Delayed Asynchronous Iterative Graph Algorithms
We developed a flexible parallel algorithm for graph summarization based on vertex-centric programming and parameterized message passing. The base algorithm supports infinitely many structural graph summary models defined in a formal…
In this paper, we consider the convergence of a very general asynchronous-parallel algorithm called ARock, which takes many well-known asynchronous algorithms as special cases (gradient descent, proximal gradient, Douglas-Rachford, ADMM,…
Decentralized and asynchronous communications are two popular techniques to improve the communication complexity of distributed machine learning, by respectively removing the dependency on a central orchestrator and the need for…
On an evolving graph that is continuously updated by a high-velocity stream of edges, how can one efficiently maintain whether two vertices are connected? This is the connectivity problem, a fundamental and widely studied problem on graphs. We…
We study dynamic graph algorithms in the Massively Parallel Computation model, which was inspired by practical data processing systems. Our goal is to provide algorithms that can efficiently handle large batches of edge insertions and…
The Bulk Synchronous Parallel (BSP) computational model has emerged as the dominant distributed framework to build large-scale iterative graph processing systems. While its implementations (e.g., Pregel, Giraph, and Hama) achieve high…
A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment in order to scale to massive data sets. To…
With the rapidly growing demand for graph processing in real-world settings, graph processing systems have to efficiently handle massive numbers of concurrent jobs. Although existing work can efficiently handle a single graph processing job, there are plenty of memory…
More and more large data collections are gathered worldwide in various IT systems. Many of them are networked in nature and need to be processed and analysed as graph structures. Due to their size, they very often require the use of…
We present a shared memory implementation of a parallel algorithm, called delta-stepping, for solving the single source shortest path problem for directed and undirected graphs. In order to reduce synchronization costs we make some…
Two characteristics that make convex decomposition algorithms attractive are simplicity of operations and generation of parallelizable structures. In principle, these schemes require that all coordinates update at the same time, i.e., they…
Deep neural networks usually have to be compressed and accelerated for use in low-power devices, e.g. mobile devices. Recently, massively-parallel hardware accelerators were developed that offer high throughput and low latency at low power…
Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…
Graph pattern matching algorithms that handle million-scale dynamic graphs are widely used in many applications such as social network analytics and suspicious transaction detection in financial networks. On the other hand, the computation…
Besides offering large host memory to support large-scale graph processing, GPU-accelerated heterogeneous architectures can also provide great potential for high-performance computing. However, few existing heterogeneous systems can…
Many machine learning algorithms currently involve a large number of iterations. In existing large-scale distributed systems, some slave nodes may break down or run less efficiently. Therefore, traditional machine learning algorithm…
We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing the updates to a shared state or pulling the updates to a private state. We investigate the applicability of…
Graph Neural Networks (GNNs) have been widely used in various domains, and GNNs with sophisticated computational graphs lead to higher latency and larger memory consumption. Optimizing the GNN computational graph suffers from: (1) Redundant…
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation in processing graphs. Recently, the size, variety, and structural complexity of these networks have grown dramatically.…
Graphs are a ubiquitous data structure in diverse domains such as machine learning, social networks, and data mining. As real-world graphs continue to grow beyond the memory capacity of single machines, out-of-core graph processing systems…