Related papers: Supervised Learning Based Algorithm Selection for …

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning. Specifically, cuDNN implements several equivalent convolution algorithms, whose performance and memory footprint may vary considerably,…

Machine Learning · Computer Science 2018-04-16 Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

cuDNN: Efficient Primitives for Deep Learning

We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels…

Neural and Evolutionary Computing · Computer Science 2014-12-19 Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Towards a learning-based performance modeling for accelerating Deep Neural Networks

Emerging applications such as Deep Learning are often data-driven, thus traditional approaches based on auto-tuners are not performance effective across the wide range of inputs used in practice. In the present paper, we start an…

Machine Learning · Computer Science 2022-12-12 Damiano Perri, Paolo Sylos Labini, Osvaldo Gervasi, Sergio Tasso, Flavio Vella

When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). The NVIDIA deep neural network library (cuDNN) provides the most effective implementation…

Machine Learning · Computer Science 2022-01-03 Marcin Pietroń, Dominik Żurek

Benchmarking State-of-the-Art Deep Learning Software Tools

Deep learning has been shown as a successful machine learning method for a variety of tasks, and its popularity results in numerous open-source deep learning software tools. Training a deep network is usually a very time-consuming process.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-20 Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu

Enabling Massive Deep Neural Networks with the GraphBLAS

Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-06 Jeremy Kepner, Manoj Kumar, José Moreira, Pratap Pattnaik, Mauricio Serrano, Henry Tufo

Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection

Deep neural networks (DNNs) are becoming a key enabling technology for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to prohibitively long…

Machine Learning · Computer Science 2019-11-13 Vicent Sanz Marco, Ben Taylor, Zheng Wang, Yehia Elkhatib

A Meta-Learning Approach to the Optimal Power Flow Problem Under Topology Reconfigurations

Recently, there has been a surge of interest in adopting deep neural networks (DNNs) for solving the optimal power flow (OPF) problem in power systems. Computing optimal generation dispatch decisions using a trained DNN takes significantly…

Machine Learning · Computer Science 2021-09-28 Yexiang Chen, Subhash Lakshminarayana, Carsten Maple, H. Vincent Poor

Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction

This work focuses on pruning convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The NVIDIA deep neural network library (cuDNN) is the…

Machine Learning · Computer Science 2022-08-30 Marcin Pietroń, Dominik Żurek

maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures. The design combines…

Neural and Evolutionary Computing · Computer Science 2015-02-03 Andrew Lavin

Incremental Learning in Deep Convolutional Neural Networks Using Partial Network Sharing

Deep convolutional neural network (DCNN) based supervised learning is a widely practiced approach for large-scale image classification. However, retraining these large networks to accommodate new, previously unseen data demands high…

Computer Vision and Pattern Recognition · Computer Science 2020-03-26 Syed Shakib Sarwar, Aayush Ankit, Kaushik Roy

TBD: Benchmarking and Analyzing Deep Neural Network Training

The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference -- i.e. how to…

Machine Learning · Computer Science 2018-04-17 Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

Leveraging the HW/SW Optimizations and Ecosystems that Drive the AI Revolution

This paper presents a state-of-the-art overview on how to architect, design, and optimize Deep Neural Networks (DNNs) such that performance is improved and accuracy is preserved. The paper covers a set of optimizations that span the entire…

Machine Learning · Computer Science 2022-08-05 Humberto Carvalho, Pavel Zaykov, Asim Ukaye

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural…

Machine Learning · Computer Science 2018-02-20 Yanzhi Wang, Caiwen Ding, Zhe Li, Geng Yuan, Siyu Liao, Xiaolong Ma, Bo Yuan, Xuehai Qian, Jian Tang, Qinru Qiu, Xue Lin

Throughput Maximization of DNN Inference: Batching or Multi-Tenancy?

Deployment of real-time ML services on warehouse-scale infrastructures is on the increase. Therefore, decreasing latency and increasing throughput of deep neural network (DNN) inference applications that empower those services have…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-29 Seyed Morteza Nabavinejad, Masoumeh Ebrahimi, Sherief Reda

Towards Better Guided Attention and Human Knowledge Insertion in Deep Convolutional Neural Networks

Attention Branch Networks (ABNs) have been shown to simultaneously provide visual explanation and improve the performance of deep convolutional neural networks (CNNs). In this work, we introduce Multi-Scale Attention Branch Networks…

Computer Vision and Pattern Recognition · Computer Science 2023-06-28 Ankit Gupta, Ida-Maria Sintorn

Deeply-Supervised Nets

Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent. We make an attempt to boost the classification performance by studying…

Machine Learning · Statistics 2017-04-26 Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu

Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs

Tabular datasets play a crucial role in various applications. Thus, developing efficient, effective, and widely compatible prediction algorithms for tabular data is important. Currently, two prominent model types, Gradient Boosted Decision…

Machine Learning · Computer Science 2024-07-16 Jiahuan Yan, Jintai Chen, Qianxing Wang, Danny Z. Chen, Jian Wu

MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy,…

Hardware Architecture · Computer Science 2026-02-11 Zhiqiang Que, Jose G. F. Coutinho, Ce Guo, Hongxiang Fan, Wayne Luk