Related papers: cuDNN: Efficient Primitives for Deep Learning

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning. Specifically, cuDNN implements several equivalent convolution algorithms, whose performance and memory footprint may vary considerably,…

Machine Learning · Computer Science 2018-04-16 Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

Supervised Learning Based Algorithm Selection for Deep Neural Networks

Many recent deep learning platforms rely on third-party libraries (such as cuBLAS) to utilize the computing power of modern hardware accelerators (such as GPUs). However, we observe that they may achieve suboptimal performance because the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-20 Shaohuai Shi, Pengfei Xu, Xiaowen Chu

PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives

Deep Neural Networks (DNNs) have revolutionized many aspects of our lives. The use of DNNs is becoming ubiquitous, including in software for image recognition, speech recognition, speech synthesis, and language translation, to name a few. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-18 Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Enabling Massive Deep Neural Networks with the GraphBLAS

Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-06 Jeremy Kepner, Manoj Kumar, José Moreira, Pratap Pattnaik, Mauricio Serrano, Henry Tufo

Towards Explainable Deep Neural Networks (xDNN)

In this paper, we propose an elegant solution that is directly addressing the bottlenecks of the traditional deep learning approaches and offers a clearly explainable internal architecture that can outperform the existing methods, requires…

Machine Learning · Computer Science 2019-12-09 Plamen Angelov, Eduardo Soares

Espresso: Efficient Forward Propagation for BCNNs

There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-08 Fabrizio Pedersoli, George Tzanetakis, Andrea Tagliasacchi

Deep Graph Library Optimizations for Intel(R) x86 Architecture

The Deep Graph Library (DGL) was designed as a tool to enable structure learning from graphs, by supporting a core abstraction for graphs, including the popular Graph Neural Networks (GNN). DGL contains implementations of all core graph…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-14 Sasikanth Avancha, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty

Designing Interpretable Approximations to Deep Reinforcement Learning

In an ever expanding set of research and application areas, deep neural networks (DNNs) set the bar for algorithm performance. However, depending upon additional constraints such as processing power and execution time limits, or…

Machine Learning · Computer Science 2021-06-22 Nathan Dahlin, Krishna Chaitanya Kalagarla, Nikhil Naik, Rahul Jain, Pierluigi Nuzzo

dMath: A Scalable Linear Algebra and Math Library for Heterogeneous GP-GPU Architectures

A new scalable parallel math library, dMath, is presented in this paper that demonstrates leading scaling when using intranode, or internode, hybrid-parallelism for deep-learning. dMath provides easy-to-use distributed base primitives and a…

Neural and Evolutionary Computing · Computer Science 2016-04-07 Steven Eliuk, Cameron Upright, Anthony Skjellum

QuadraLib: A Performant Quadratic Neural Network Library for Architecture Optimization and Design Exploration

The significant success of Deep Neural Networks (DNNs) is highly promoted by the multiple sophisticated DNN libraries. On the contrary, although some works have proved that Quadratic Deep Neuron Networks (QDNNs) show better non-linearity and…

Machine Learning · Computer Science 2022-04-06 Zirui Xu, Fuxun Yu, Jinjun Xiong, Xiang Chen

PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

At the heart of deep learning training and inferencing are computationally intensive primitives such as convolutions which form the building blocks of deep neural networks. Researchers have taken two distinct approaches to creating high…

Programming Languages · Computer Science 2020-02-07 Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite…

Neural and Evolutionary Computing · Computer Science 2016-11-22 Matthew W. Moskewicz, Ali Jannesari, Kurt Keutzer

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators

KBLAS is a new open source high performance library that provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs. Since performance of dense matrix-vector multiplication is hindered by the overhead of…

Mathematical Software · Computer Science 2014-10-08 Ahmad Abdelfattah, David Keyes, Hatem Ltaief

Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction

This work is focused on the pruning of some convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The NVIDIA deep neural network (cuDNN) library is the…

Machine Learning · Computer Science 2022-08-30 Marcin Pietroń, Dominik Żurek

Compilation and Optimizations for Efficient Machine Learning on Embedded Systems

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc. However,…

Machine Learning · Computer Science 2022-08-29 Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen

DLL: A Blazing Fast Deep Neural Network Library

Deep Learning Library (DLL) is a new library for machine learning with deep neural networks that focuses on speed. It supports feed-forward neural networks such as fully-connected Artificial Neural Networks (ANNs) and Convolutional Neural…

Machine Learning · Computer Science 2018-04-15 Baptiste Wicht, Jean Hennebert, Andreas Fischer

Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems

Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks' accuracy has achieved a mature and remarkable state, inference latency and throughput…

Computer Vision and Pattern Recognition · Computer Science 2020-05-21 Miguel de Prado, Nuria Pazos, Luca Benini

Efficient and Robust Mixed-Integer Optimization Methods for Training Binarized Deep Neural Networks

Compared to classical deep neural networks, their binarized versions can be useful for applications on resource-limited devices due to their reduction in memory consumption and computational demands. In this work we study deep neural networks…

Optimization and Control · Mathematics 2021-10-26 Jannis Kurtz, Bubacarr Bah

Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection

As Deep Neural Networks (DNNs) have become an increasingly ubiquitous workload, the range of libraries and tooling available to aid in their development and deployment has grown significantly. Scalable, production quality tools are freely…

Machine Learning · Computer Science 2022-06-22 Perry Gibson, José Cano