Related papers: Sparse GPU Kernels for Deep Learning
As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…
Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable,…
In this paper, we use graphics processing units(GPU) to accelerate sparse and arbitrary structured neural networks. Sparse networks have nodes in the network that are not fully connected with nodes in preceding and following layers, and…
This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory…
Sparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs…
Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse…
The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands are also contributing in significant computational, energy and…
Most, if not all the modern scientific simulation packages utilize matrix algebra operations. Among the operation of the linear algebra, one of the most important kernels is the multiplication of matrices, dense and sparse. Examples of…
Machine learning is increasingly used to improve decisions within branch-and-bound algorithms for mixed-integer programming. Many existing approaches rely on deep learning, which often requires very large training datasets and substantial…
This paper is focused on the improvement the efficiency of the sparse convolutional neural networks (CNNs) layers on graphic processing units (GPU). The Nvidia deep neural network (cuDnn) library provides the most effective implementation…
The exponentially growing model size drives the continued success of deep learning, but it brings prohibitive computation and memory cost. From the algorithm perspective, model sparsification and quantization have been studied to alleviate…
Deep neural networks with lots of parameters are typically used for large-scale computer vision tasks such as image classification. This is a result of using dense matrix multiplications and convolutions. However, sparse computations are…
Graph neural networks have become increasingly popular in recent years due to their ability to naturally encode relational input data and their ability to scale to large graphs by operating on a sparse representation of graph adjacency…
Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embedding can take a significantly long time, especially for larger datasets. Our analysis shows that the gradient…
In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition…
Sparse linear algebra kernels play a critical role in numerous applications, covering from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels on one GPU will no longer be viable in these…
Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…
Parallel training of neural networks at scale is challenging due to significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e.…
We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…