Related papers: maxDNN: An Efficient Convolution Kernel for Deep L…

Fast convolution kernels on pascal GPU with high memory efficiency

The convolution computation is widely used in many fields, especially in CNNs. Because of the rapid growth of the training data in CNNs, GPUs have been used for the acceleration, and memory-efficient algorithms are focused because of thier…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-02 Qiong Chang , Masaki Onishi , Tsutomu Maruyama

cuConv: A CUDA Implementation of Convolution for CNN Inference

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-28 Marc Jordà , Pedro Valero-Lara , Antonio J. Peña

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

In recent years deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been…

Machine Learning · Computer Science 2017-05-09 Xinyu Zhang , Srinjoy Das , Ojash Neopane , Ken Kreutz-Delgado

Energy-based Tuning of Convolutional Neural Networks on Multi-GPUs

Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs have demonstrated remarkable skills for accelerating their challenging computational requirements. Within this context,…

Computer Vision and Pattern Recognition · Computer Science 2018-08-02 Francisco M. Castro , Nicolás Guil , Manuel J. Marín-Jiménez , Jesús Pérez-Serrano , Manuel Ujaldón

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-13 Chao Li , Yi Yang , Min Feng , Srimat Chakradhar , Huiyang Zhou

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-31 Xiaoming Chen , Jianxu Chen , Danny Z. Chen , Xiaobo Sharon Hu

When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper is focused on the improvement the efficiency of the sparse convolutional neural networks (CNNs) layers on graphic processing units (GPU). The Nvidia deep neural network (cuDnn) library provides the most effective implementation…

Machine Learning · Computer Science 2022-01-03 Marcin Pietroń , Dominik Żurek

{\mu}-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning. Specifically, cuDNN implements several equivalent convolution algorithms, whose performance and memory footprint may vary considerably,…

Machine Learning · Computer Science 2018-04-16 Yosuke Oyama , Tal Ben-Nun , Torsten Hoefler , Satoshi Matsuoka

Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal

Deep Convolutional Neural Networks (CNNs) are widely employed in modern computer vision algorithms, where the input image is convolved iteratively by many kernels to extract the knowledge behind it. However, with the depth of convolutional…

Computer Vision and Pattern Recognition · Computer Science 2018-04-11 Chih-Ting Liu , Yi-Heng Wu , Yu-Sheng Lin , Shao-Yi Chien

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to…

Computer Vision and Pattern Recognition · Computer Science 2017-06-13 David Budden , Alexander Matveev , Shibani Santurkar , Shraman Ray Chaudhuri , Nir Shavit

MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training

In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware.…

Machine Learning · Computer Science 2024-03-20 Hongwu Peng , Xi Xie , Kaustubh Shivdikar , MD Amit Hasan , Jiahui Zhao , Shaoyi Huang , Omer Khan , David Kaeli , Caiwen Ding

ZNNi - Maximizing the Inference Throughput of 3D Convolutional Networks on Multi-Core CPUs and GPUs

Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation, and object detection and localization. Here we consider the problem of inference, the application of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-21 Aleksandar Zlateski , Kisuk Lee , H. Sebastian Seung

PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks

Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition. Being compute-intensive, CNN computations are mainly accelerated by GPUs with high…

Hardware Architecture · Computer Science 2016-11-09 Dong Wang , Jianjing An , Ke Xu

Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures

Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs) which provide state-of-the-art results for tasks like image recognition, neural machine translation and speech…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-22 Evangelos Georganas , Sasikanth Avancha , Kunal Banerjee , Dhiraj Kalamkar , Greg Henry , Hans Pabst , Alexander Heinecke

Accelerating convolutional neural network by exploiting sparsity on GPUs

Convolutional neural network (CNN) is an important deep learning method. The convolution operation takes a large proportion of the total execution time for CNN. Feature maps for convolution operation are usually sparse. Multiplications and…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-01 Weizhi Xu , Yintai Sun , fhengyu Fan , Hui Yu , Xin Fu

Optimizing Temporal Convolutional Network inference on FPGA-based accelerators

Convolutional Neural Networks are extensively used in a wide range of applications, commonly including computer vision tasks like image and video classification, recognition, and segmentation. Recent research results demonstrate that…

Signal Processing · Electrical Eng. & Systems 2020-05-11 Marco Carreras , Gianfranco Deriu , Luigi Raffo , Luca Benini , Paolo Meloni

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural…

Machine Learning · Computer Science 2018-02-20 Yanzhi Wang , Caiwen Ding , Zhe Li , Geng Yuan , Siyu Liao , Xiaolong Ma , Bo Yuan , Xuehai Qian , Jian Tang , Qinru Qiu , Xue Lin

MobileXNet: An Efficient Convolutional Neural Network for Monocular Depth Estimation

Depth is a vital piece of information for autonomous vehicles to perceive obstacles. Due to the relatively low price and small size of monocular cameras, depth estimation from a single RGB image has attracted great interest in the research…

Robotics · Computer Science 2021-11-25 Xingshuai Dong , Matthew A. Garratt , Sreenatha G. Anavatti , Hussein A. Abbass

GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation

Time, cost, and energy efficiency are critical considerations in Deep-Learning (DL), particularly when processing long texts. Transformers, which represent the current state of the art, exhibit quadratic computational complexity relative to…

Computation and Language · Computer Science 2025-07-11 Fardin Rastakhiz

Towards a Uniform Architecture for the Efficient Implementation of 2D and 3D Deconvolutional Neural Networks on FPGAs

Three-dimensional deconvolution is widely used in many computer vision applications. However, most previous works have only focused on accelerating 2D deconvolutional neural networks (DCNNs) on FPGAs, while the acceleration of 3D DCNNs has…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-08 Deguang Wang , Junzhong Shen , Mei Wen , Chunyuan Zhang