Related papers: Graph Transformation and Specialized Code Generati…

A Graph Transformation Strategy for Optimizing SpTRSV

Sparse triangular solve (SpTRSV) is an extensively studied computational kernel. An important obstacle in parallel SpTRSV implementations is that in some parts of a sparse matrix the computation is serial. By transforming the dependency…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Buse Yılmaz, Abdülkadir Furkan Yıldız

Sparsity-Specific Code Optimization using Expression Trees

We introduce a code generator that converts unoptimized C++ code operating on sparse data into vectorized and parallel CPU or GPU kernels. Our approach unrolls the computation into a massive expression graph, performs redundant expression…

Programming Languages · Computer Science 2022-03-15 Philipp Herholz, Xuan Tang, Teseo Schneider, Shoaib Kamil, Daniele Panozzo, Olga Sorkine-Hornung

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures

Designing efficient and scalable sparse linear algebra kernels on modern multi-GPU based HPC systems is a daunting task due to significant irregular memory references and workload imbalance across the GPUs. This is particularly the case for…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-15 Chenhao Xie, Jieyang Chen, Jesun S Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li

Composing Loop-carried Dependence with Other Loops

Sparse fusion is a compile-time loop transformation and runtime scheduling implemented as a domain-specific code generator. Sparse fusion generates efficient parallel code for the combination of two sparse matrix kernels where at least one…

Programming Languages · Computer Science 2021-11-25 Kazem Cheshmi, Michelle Mills Strout, Maryam Mehri Dehnavi

Sparse Tensor Algebra as a Parallel Programming Model

Dense and sparse tensors allow the representation of most bulk data structures in computational science applications. We show that sparse tensor algebra can also be used to express many of the transformations on these datasets, especially…

Mathematical Software · Computer Science 2015-12-02 Edgar Solomonik, Torsten Hoefler

Fast Graphlet Transform of Sparse Graphs

We introduce the computational problem of graphlet transform of a sparse large graph. Graphlets are fundamental topology elements of all graphs/networks. They can be used as coding elements to encode graph-topological information at…

Social and Information Networks · Computer Science 2020-09-02 Dimitris Floros, Nikos Pitsianis, Xiaobai Sun

A Generic Graph Sparsification Framework using Deep Reinforcement Learning

The interconnectedness and interdependence of modern graphs are growing ever more complex, causing enormous resources for processing, storage, communication, and decision-making of these graphs. In this work, we focus on the task graph…

Machine Learning · Computer Science 2023-01-16 Ryan Wickman, Xiaofei Zhang, Weizi Li

Graph Conditioned Sparse-Attention for Improved Source Code Understanding

Transformer architectures have been successfully used in learning source code representations. The fusion between a graph representation like Abstract Syntax Tree (AST) and a source code sequence makes the use of current approaches…

Machine Learning · Computer Science 2021-12-06 Junyan Cheng, Iordanis Fostiropoulos, Barry Boehm

SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations

Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embedding can take a significantly long time, especially for larger datasets. Our analysis shows that the gradient…

Machine Learning · Computer Science 2025-05-01 Md Saidul Hoque Anik, Ariful Azad

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the…

Programming Languages · Computer Science 2026-04-23 Atharva Chougule, Alexander J Root, Rubens Lacouture, Bobby Yan, Rohan Yadav, Fredrik Kjolstad

Sparse Graph Learning from Spatiotemporal Time Series

Outstanding achievements of graph neural networks for spatiotemporal time series analysis show that relational constraints introduce an effective inductive bias into neural forecasting architectures. Often, however, the relational…

Machine Learning · Computer Science 2023-08-03 Andrea Cini, Daniele Zambon, Cesare Alippi

A work-efficient parallel sparse matrix-sparse vector multiplication algorithm

We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-26 Ariful Azad, Aydin Buluc

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie

MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems

Sparse linear algebra kernels play a critical role in numerous applications, ranging from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels on one GPU will no longer be viable in these…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-19 Jieyang Chen, Chenhao Xie, Jesun S Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li

RSH-SpMM: A Row-Structured Hybrid Kernel for Sparse Matrix-Matrix Multiplication on GPUs

Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in graph analytics, scientific simulation, and sparse deep learning workloads. However, the extreme irregularity of real-world sparse matrices prevents existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-11 Aiying Li, Jingwei Sun, Han Li, Wence Ji, Guangzhong Sun

Vectorizing Sparse Matrix Codes with Dependency Driven Trace Analysis

Sparse computations frequently appear in scientific simulations, and the performance of these simulations relies heavily on the optimization of the sparse codes. The compact data structures and irregular computation patterns in sparse matrix…

Programming Languages · Computer Science 2021-12-10 Zachary Cetinic, Kazem Cheshmi, Maryam Mehri Dehnavi

Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

Sparse compiler is a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Though GPU provides various reduction semantics that can…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-10 Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuan Wen, Pavlos Petoumenos, Yu Wang

Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable,…

Numerical Analysis · Computer Science 2012-12-24 Xintian Yang, Srinivasan Parthasarathy, Ponnuswamy Sadayappan

Efficient Parallel Scheduling for Sparse Triangular Solvers

We develop and analyze new scheduling algorithms for solving sparse triangular linear systems (SpTRSV) in parallel. Our approach produces highly efficient synchronous schedules for the forward- and backward-substitution algorithm. Compared…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-06 Toni Böhnlein, Pál András Papp, Raphael S. Steiner, Christos K. Matzoros, A. N. Yzelman

GRAPHOPT: constrained-optimization-based parallelization of irregular graphs

Sparse, irregular graphs show up in various applications like linear algebra, machine learning, engineering simulations, robotic control, etc. These graphs have a high degree of parallelism, but their execution on parallel threads of modern…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-17 Nimish Shah, Wannes Meert, Marian Verhelst