Related papers: Sparse GPU Kernels for Deep Learning

Accelerating Sparse Deep Neural Networks

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…

Machine Learning · Computer Science · 2021-04-20 · Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius

Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable,…

Numerical Analysis · Computer Science · 2012-12-24 · Xintian Yang, Srinivasan Parthasarathy, Ponnuswamy Sadayappan

GPU Acceleration of Sparse Neural Networks

In this paper, we use graphics processing units (GPU) to accelerate sparse and arbitrary structured neural networks. Sparse networks have nodes in the network that are not fully connected with nodes in preceding and following layers, and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-05-12 · Aavaas Gajurel, Sushil J. Louis, Frederick C Harris

At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation

This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-09-04 · Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-Mei Hwu

A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs

Sparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs…

Programming Languages · Computer Science · 2025-06-19 · Hossein Albakri, Kazem Cheshmi

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-09-01 · Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu

Sparse Computations in Deep Learning Inference

The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands are also contributing in significant computational, energy and…

Computational Engineering, Finance, and Science · Computer Science · 2025-12-03 · Ioanna Tasou, Panagiotis Mpakos, Angelos Vlachos, Dionysios Adamopoulos, Georgios Giannakopoulos, Konstantinos Katsikopoulos, Ioannis Karaparisis, Maria Lazou, Spyridon Loukovitis, Areti Mei, Anastasia Poulopoulou, Angeliki Dimitriou, Giorgos Filandrianos, Dimitrios Galanopoulos, Vasileios Karampinis, Ilias Mitsouras, Nikolaos Spanos, Petros Anastasiadis, Ioannis Doudalis, Konstantinos Nikas, George Retsinas, Paraskevi Tzouveli, Christina Giannoula, Nectarios Koziris, Nikela Papadopoulou, Giorgos Stamou, Athanasios Voulodimos, Georgios Goumas

DBCSR: A Library for Dense Matrix Multiplications on Distributed GPU-Accelerated Systems

Most, if not all, modern scientific simulation packages utilize matrix algebra operations. Among linear algebra operations, one of the most important kernels is the multiplication of matrices, dense and sparse. Examples of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-10-14 · Ilia Sivkov, Alfio Lazzaro, Juerg Hutter

Speeding Up Mixed-Integer Programming Solvers with Sparse Learning for Branching

Machine learning is increasingly used to improve decisions within branch-and-bound algorithms for mixed-integer programming. Many existing approaches rely on deep learning, which often requires very large training datasets and substantial…

Machine Learning · Computer Science · 2026-04-02 · Selin Bayramoğlu, George L Nemhauser, Nikolaos V Sahinidis

When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). The Nvidia deep neural network (cuDNN) library provides the most effective implementation…

Machine Learning · Computer Science · 2022-01-03 · Marcin Pietroń, Dominik Żurek

Efficient Quantized Sparse Matrix Operations on Tensor Cores

The exponentially growing model size drives the continued success of deep learning, but it brings prohibitive computation and memory cost. From the algorithm perspective, model sparsification and quantization have been studied to alleviate…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-09 · Shigang Li, Kazuki Osawa, Torsten Hoefler

Training Sparse Neural Networks

Deep neural networks with lots of parameters are typically used for large-scale computer vision tasks such as image classification. This is a result of using dense matrix multiplications and convolutions. However, sparse computations are…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-22 · Suraj Srinivas, Akshayvarun Subramanya, R. Venkatesh Babu

Fast Training of Sparse Graph Neural Networks on Dense Hardware

Graph neural networks have become increasingly popular in recent years due to their ability to naturally encode relational input data and their ability to scale to large graphs by operating on a sparse representation of graph adjacency…

Machine Learning · Statistics · 2019-06-28 · Matej Balog, Bart van Merriënboer, Subhodeep Moitra, Yujia Li, Daniel Tarlow

SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations

Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embedding can take a significantly long time, especially for larger datasets. Our analysis shows that the gradient…

Machine Learning · Computer Science · 2025-05-01 · Md Saidul Hoque Anik, Ariful Azad

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition…

Machine Learning · Computer Science · 2023-11-02 · Mohammad Zubair, Christoph Bauinger

MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems

Sparse linear algebra kernels play a critical role in numerous applications, ranging from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels on one GPU will no longer be viable in these…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-09-19 · Jieyang Chen, Chenhao Xie, Jesun S Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li

Benchmarking GPU and TPU Performance with Graph Neural Networks

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…

Machine Learning · Computer Science · 2022-10-25 · Xiangyang Ju, Yunsong Wang, Daniel Murnane, Nicholas Choma, Steven Farrell, Paolo Calafiura

Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training

Parallel training of neural networks at scale is challenging due to significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e.…

Machine Learning · Computer Science · 2023-05-16 · Siddharth Singh, Abhinav Bhatele

Sparse Networks from Scratch: Faster Training without Losing Performance

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science · 2019-08-27 · Tim Dettmers, Luke Zettlemoyer

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-13 · Carl Yang, Aydin Buluc, John D. Owens