Related papers: GraphCage: Cache Aware Graph Processing on GPUs
Modern hardware systems are heavily underutilized when running large-scale graph applications. While many in-memory graph frameworks have made substantial progress in optimizing these applications, we show that it is still possible to…
Not only with the large host memory for supporting large scale graph processing, GPU-accelerated heterogeneous architecture can also provide a great potential for high-performance computing. However, few existing heterogeneous systems can…
Graph algorithms and techniques are increasingly being used in scientific and commercial applications to express relations and explore large data sets. Although conventional or commodity computer architectures, like CPU or GPU, can compute…
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine…
High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel…
The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all…
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…
The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However,…
Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses…
Designing flexible graph kernels that can run well on various platforms is a crucial research problem due to the frequent usage of graphs for modeling data and recent architectural advances and variety. In this work, we propose a novel…
Past decade has seen the development of many shared-memory graph processing frameworks, intended to reduce the effort of developing high performance parallel applications. However many of these frameworks, based on Vertex-centric or…
For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library.…
In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…
Extensive prior research has focused on alleviating the characteristic poor cache locality of graph analytics workloads. However, graph pre-processing tasks remain relatively unexplored. In many important scenarios, graph pre-processing…
Mining large graphs for information is becoming an increasingly important workload due to the plethora of graph structured data becoming available. An aspect of graph algorithms that has hitherto not received much interest is the effect of…
Processing large graphs with memory-limited GPU needs to resolve issues of host-GPU data transfer, which is a key performance bottleneck. Existing GPU-accelerated graph processing frameworks reduce the data transfers by managing the active…
Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly…
Structural clustering is one of the most popular graph clustering methods, which has achieved great performance improvement by utilizing GPUs. Even though, the state-of-the-art GPU-based structural clustering algorithm, GPUSCAN, still…
Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused by extensive data movement between the…
Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we…