
Related papers: Distributed-Memory Sparse Kernels for Machine Lear…

200 papers

We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication and sparse-dense matrix multiplication under a single operation called FusedMM. By using user-defined functions, FusedMM can capture…

Machine Learning · Computer Science 2021-10-28 Md. Khaledur Rahman , Majedul Haque Sujon , Ariful Azad
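The FusedMM entry above describes fusing SDDMM and SpMM. The minimal single-threaded Python sketch below makes that fusion concrete. It assumes a CSR sparsity pattern and a scalar user-defined function `f` applied to each sampled dot product. The name `fused_sddmm_spmm` and its signature are hypothetical and are not FusedMM's actual interface. For every stored position (i, j), the sketch computes the SDDMM value f(X[i]·Y[j]) and immediately folds it into the SpMM accumulation for row i, so the sampled matrix is never materialized.

```python
from typing import Callable, List


def fused_sddmm_spmm(indptr: List[int], indices: List[int],
                     X: List[List[float]], Y: List[List[float]],
                     f: Callable[[float], float]) -> List[List[float]]:
    """Hypothetical fused kernel: Z[i] = sum over stored (i, j) of f(<X[i], Y[j]>) * Y[j].

    indptr/indices give the CSR sparsity pattern; no values are stored because
    the SDDMM step produces them on the fly.
    """
    n_rows = len(indptr) - 1
    d = len(Y[0])
    Z = [[0.0] * d for _ in range(n_rows)]
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            # SDDMM part: dot product sampled at the stored position (i, j)
            s = f(sum(a * b for a, b in zip(X[i], Y[j])))
            # SpMM part: accumulate directly into row i of the output
            for t in range(d):
                Z[i][t] += s * Y[j][t]
    return Z


if __name__ == "__main__":
    # Pattern with stored entries (0,0), (0,1), (1,1); identity edge function.
    print(fused_sddmm_spmm([0, 2, 3], [0, 1, 1],
                           [[1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 2.0], [3.0, 4.0]],
                           lambda x: x))
    # prints [[10.0, 14.0], [12.0, 16.0]]
```

Swapping `f` (for example, a sigmoid or an exponential) changes the edge operation without changing the loop structure. This is the kind of flexibility that user-defined functions provide in this setting.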

Existing 3D algorithms for distributed-memory sparse kernels suffer from limited scalability due to reliance on bulk sparsity-agnostic communication. While easier to use, sparsity-agnostic communication leads to unnecessary bandwidth and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-01 Nabil Abubaker , Torsten Hoefler
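A simple cost model shows why sparsity-agnostic communication wastes bandwidth. In a distributed SpMM, a process needs only those rows of the dense operand whose indices appear as column indices among its local nonzeros. The sketch below assumes a process owns a list of nonzeros (i, j) and would otherwise receive every dense row in a column range `[col_lo, col_hi)` of width `d`. Index metadata and message latency are ignored, and the function names are hypothetical. It is a simplified illustration, not the algorithm from the paper above.

```python
from typing import List, Set, Tuple


def rows_needed(local_nnz: List[Tuple[int, int]]) -> Set[int]:
    """Dense-operand rows a process actually touches: one per distinct column j."""
    return {j for _, j in local_nnz}


def comm_volume(local_nnz: List[Tuple[int, int]],
                col_lo: int, col_hi: int, d: int) -> Tuple[int, int]:
    """Return (agnostic, aware) word counts for fetching dense rows in [col_lo, col_hi)."""
    # Sparsity-agnostic: ship the whole block of dense rows regardless of use.
    agnostic = (col_hi - col_lo) * d
    # Sparsity-aware: ship only rows referenced by some local nonzero.
    aware = len({j for j in rows_needed(local_nnz) if col_lo <= j < col_hi}) * d
    return agnostic, aware


if __name__ == "__main__":
    # Three local nonzeros touching only columns 1 and 5 of an 8-column block.
    print(comm_volume([(0, 1), (2, 1), (3, 5)], 0, 8, 4))  # prints (32, 8)
```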

In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition…

Machine Learning · Computer Science 2023-11-02 Mohammad Zubair , Christoph Bauinger

We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, called TS-SpGEMM, has important applications in multi-source breadth-first search,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-23 Isuru Ranawaka , Md Taufique Hussain , Charles Block , Gerasimos Gerogiannis , Josep Torrellas , Ariful Azad
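The tall-and-skinny case in the TS-SpGEMM entry arises naturally in multi-source breadth-first search. With k sources, the frontier is an n×k sparse matrix with one column per source, and one expansion step multiplies the n×n adjacency matrix by that frontier. The sketch below is a plain row-wise (Gustavson-style) SpGEMM over a dictionary-of-rows representation. It assumes an undirected graph, so the adjacency matrix is symmetric. The representation and the name `spgemm_rowwise` are illustrative choices, not the TS-SpGEMM implementation. A real BFS step would also use a Boolean semiring and mask out already-visited vertices.

```python
from typing import Dict

SparseRows = Dict[int, Dict[int, float]]  # row index -> {column index: value}


def spgemm_rowwise(A: SparseRows, B: SparseRows) -> SparseRows:
    """Compute C = A * B one output row at a time (Gustavson's formulation)."""
    C: SparseRows = {}
    for i, a_row in A.items():
        acc: Dict[int, float] = {}  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            # Row i of C is a linear combination of the rows of B selected by A[i, :]
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C


if __name__ == "__main__":
    # Path graph 0-1-2-3 (undirected, so A is symmetric).
    A = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
    # Tall-skinny frontier: column 0 = BFS from vertex 0, column 1 = BFS from vertex 3.
    F = {0: {0: 1.0}, 3: {1: 1.0}}
    print(spgemm_rowwise(A, F))  # prints {1: {0: 1.0}, 2: {1: 1.0}}
```

Each column of the result is the next frontier of one source: vertex 1 is reached from source 0, and vertex 2 is reached from source 3.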

Distributed Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental operation in high-performance computing and deep learning applications. The major performance bottleneck in distributed SpMM lies in substantial communication overhead,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-14 Chen Zhuang , Lingqi Zhang , Benjamin Brock , Du Wu , Peng Chen , Toshio Endo , Satoshi Matsuoka , Mohamed Wahib

Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching. Distributed-memory parallel algorithms for…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-28 Yuxi Hong , Aydin Buluc

Sparse matrix-matrix multiplication (SpMM) and sampled dense-dense matrix multiplication (SDDMM) are important sparse operators in scientific computing and deep learning. Tensor Core Units (TCUs) enhance modern accelerators with superior…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Jinliang Shi , Shigang Li , Youxuan Xu , Rongtian Fu , Xueying Wang , Tong Wu

Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms. In this paper, we consider SpGEMMs performed on hundreds of thousands of processors generating…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-19 Md Taufique Hussain , Oguz Selvitopi , Aydin Buluç , Ariful Azad

Graph Neural Networks (GNNs) are a computationally efficient method to learn embeddings and classifications on graph data. However, GNN training has low computational intensity, making communication costs the bottleneck for scalability.…

Machine Learning · Computer Science 2025-04-08 Ujjaini Mukhodopadhyay , Alok Tripathy , Oguz Selvitopi , Katherine Yelick , Aydin Buluc

Sparse attention is a core building block in many leading neural network models, from graph-structured learning to sparse sequence modeling. It can be decomposed into a sequence of three sparse matrix operations (3S): sampled dense-dense…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-14 Zitong Li , Aparna Chandramowlishwaran
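The 3S decomposition can be sketched in a few lines. Because the abstract above is truncated, this sketch assumes the three operations are the usual sequence. First, an SDDMM computes the scores Q K^T only at the allowed positions. Next, a row-wise softmax normalizes those stored scores. Finally, an SpMM applies the resulting weights to V. The minimal Python sketch below also assumes a CSR attention mask and the common 1/sqrt(d) score scaling. `sparse_attention` is a hypothetical name, not the paper's kernel.

```python
import math
from typing import List


def sparse_attention(indptr: List[int], indices: List[int],
                     Q: List[List[float]], K: List[List[float]],
                     V: List[List[float]]) -> List[List[float]]:
    """Masked attention as SDDMM -> row-wise softmax -> SpMM over a CSR mask."""
    n = len(indptr) - 1
    scale = 1.0 / math.sqrt(len(Q[0]))
    dv = len(V[0])
    out = [[0.0] * dv for _ in range(n)]
    for i in range(n):
        lo, hi = indptr[i], indptr[i + 1]
        if lo == hi:
            continue  # no allowed keys for this query: output row stays zero
        # 1) SDDMM: scores only at the stored positions of row i
        scores = [scale * sum(q * k for q, k in zip(Q[i], K[indices[p]]))
                  for p in range(lo, hi)]
        # 2) Softmax over the stored entries (shift by the max for stability)
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        # 3) SpMM: weighted sum of the selected rows of V
        for wt, p in zip(w, range(lo, hi)):
            v_row = V[indices[p]]
            for t in range(dv):
                out[i][t] += (wt / z) * v_row[t]
    return out


if __name__ == "__main__":
    # Row 0 may attend only to key 0; row 1 may attend to keys 0 and 1.
    print(sparse_attention([0, 1, 3], [0, 0, 1],
                           [[1.0, 0.0], [0.0, 0.0]],
                           [[1.0, 0.0], [0.0, 1.0]],
                           [[2.0, 4.0], [6.0, 8.0]]))
    # prints [[2.0, 4.0], [4.0, 6.0]]
```

In this example, row 1's query is all zeros, so both allowed keys receive equal weight, and the output is the average of the two selected rows of V.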

Adaptive moment estimation (Adam), as a Stochastic Gradient Descent (SGD) variant, has gained widespread popularity in federated learning (FL) due to its fast convergence. However, federated Adam (FedAdam) algorithms suffer from a threefold…

Machine Learning · Computer Science 2025-09-22 Xiumei Deng , Jun Li , Kang Wei , Long Shi , Zehui Xiong , Ming Ding , Wen Chen , Shi Jin , H. Vincent Poor
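For reference, here is one common formulation of server-side FedAdam, as in the adaptive federated optimization framework of Reddi et al. It treats the averaged client model change as a pseudo-gradient and applies an Adam-style update on the server, with an adaptivity constant `tau` and no bias correction. The sketch assumes equal client weights and plain Python lists for parameters, and its default hyperparameters are arbitrary placeholders. It only illustrates the baseline that the entry above critiques, not the method proposed in that paper.

```python
import math
from typing import List, Tuple


def fedadam_server_step(x: List[float], client_models: List[List[float]],
                        m: List[float], v: List[float],
                        lr: float = 0.1, beta1: float = 0.9,
                        beta2: float = 0.99, tau: float = 1e-3
                        ) -> Tuple[List[float], List[float], List[float]]:
    """One server round: average client deltas, then apply an Adam-style update."""
    n = len(client_models)
    # Pseudo-gradient: mean of (client model - current global model)
    delta = [sum(cm[t] - x[t] for cm in client_models) / n for t in range(len(x))]
    # First and second moment estimates, kept on the server across rounds
    m = [beta1 * mt + (1 - beta1) * dt for mt, dt in zip(m, delta)]
    v = [beta2 * vt + (1 - beta2) * dt * dt for vt, dt in zip(v, delta)]
    # Move the global model along the pseudo-gradient direction
    x = [xt + lr * mt / (math.sqrt(vt) + tau) for xt, mt, vt in zip(x, m, v)]
    return x, m, v


if __name__ == "__main__":
    x, m, v = [0.0], [0.0], [0.0]
    x, m, v = fedadam_server_step(x, [[1.0], [1.0]], m, v)
    print(x)  # a small step from 0.0 toward the clients' value of 1.0
```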

Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Da Zheng , Disa Mhembere , Vince Lyzinski , Joshua Vogelstein , Carey E. Priebe , Randal Burns

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Guyue Huang , Guohao Dai , Yu Wang , Yufei Ding , Yuan Xie
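Workload balancing for CSR SpMV and SpMM usually starts from one observation: when row lengths are skewed, splitting rows evenly can leave one worker with most of the nonzeros. The minimal sketch below assumes a CSR row-pointer array `indptr` and row-granularity partitioning. It binary-searches `indptr` for the row boundaries that give each of `parts` workers roughly nnz/parts nonzeros. The function name is hypothetical, and this is a generic technique rather than the scheme from the paper above.

```python
from bisect import bisect_left
from typing import List


def nnz_balanced_row_splits(indptr: List[int], parts: int) -> List[int]:
    """Return boundaries b[0..parts]; worker p owns rows b[p] .. b[p+1]-1."""
    n_rows = len(indptr) - 1
    nnz = indptr[-1]
    bounds = [0]
    for p in range(1, parts):
        target = (nnz * p) // parts
        # First row (at or after the previous boundary) whose start offset reaches target
        bounds.append(bisect_left(indptr, target, lo=bounds[-1], hi=n_rows))
    bounds.append(n_rows)
    return bounds


if __name__ == "__main__":
    # Row 0 holds 8 nonzeros; rows 1..8 hold one each (16 nonzeros total).
    indptr = [0, 8, 9, 10, 11, 12, 13, 14, 15, 16]
    print(nnz_balanced_row_splits(indptr, 2))  # prints [0, 1, 9]
```

An even row split would give one worker rows 0 to 3 (11 nonzeros) and the other the remaining 5. The nonzero-balanced split gives each worker 8. Working at row granularity means a single very heavy row cannot be divided, so finer-grained schemes split work at the level of individual nonzeros.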

In recent years, novel AI accelerators have emerged as promising alternatives to GPUs for AI model training and inference tasks. One such accelerator, the Cerebras CS-3, achieves strong performance on large model training as well as…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-01 Milan Shah , Sheng Di , Michela Becchi

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-13 Carl Yang , Aydin Buluc , John D. Owens
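As a correctness reference for the CSR-based SpMM described above, here is a sequential Python version that consumes the CSR arrays (`indptr`, `indices`, `data`) directly, with no format conversion. Each output row depends on only one row of the sparse matrix, and GPU kernels exploit this independence when assigning rows to threads or warps. The sketch makes no attempt to reproduce the paper's two GPU algorithms, and the function name is illustrative.

```python
from typing import List


def csr_spmm(indptr: List[int], indices: List[int], data: List[float],
             B: List[List[float]]) -> List[List[float]]:
    """C = A * B, where A is given in CSR form and B is a dense row-major matrix."""
    n_rows = len(indptr) - 1
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):  # rows are independent: the natural unit of parallel work
        for p in range(indptr[i], indptr[i + 1]):
            a = data[p]
            b_row = B[indices[p]]  # row of B selected by the column index of A[i, k]
            for j in range(n_cols):
                C[i][j] += a * b_row[j]
    return C


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 3, 0]] in CSR form
    print(csr_spmm([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0],
                   [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
    # prints [[7.0, 7.0], [6.0, 6.0]]
```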

Sparse matrix-vector multiplication (SpMV) is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-27 Kobe Bergmans , Karl Meerbergen , Raf Vandebril

Sparse matrix multiplication is an important kernel for large-scale graph processing and other data-intensive applications. In this paper, we implement various asynchronous, RDMA-based sparse times dense (SpMM) and sparse times sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-06 Benjamin Brock , Aydın Buluç , Katherine Yelick

Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-14 Aditya Devarakonda , Ramakrishnan Kannan

Multiplication of a sparse matrix by a dense matrix (SpDM) is widely used in many areas, such as scientific computing and machine learning. However, existing works overlook the performance optimization of SpDM on modern many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-01 Shaohuai Shi , Qiang Wang , Xiaowen Chu

Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in scientific computing, graph analytics, and machine learning, whose performance is often constrained by memory bandwidth. In this work, we investigate the applicability…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-09 Matthew Qian , Yahia Ramadan , Suhita Anubha , Ariful Azad