Related papers: Fast convolution kernels on pascal GPU with high m…

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-31 Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu

Accelerating convolutional neural network by exploiting sparsity on GPUs

The convolutional neural network (CNN) is an important deep learning method. The convolution operation accounts for a large proportion of the total execution time of a CNN. Feature maps for the convolution operation are usually sparse. Multiplications and…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-01 Weizhi Xu, Yintai Sun, Shengyu Fan, Hui Yu, Xin Fu

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-13 Chao Li, Yi Yang, Min Feng, Srimat Chakradhar, Huiyang Zhou

cuConv: A CUDA Implementation of Convolution for CNN Inference

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-28 Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks

Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition. Being compute-intensive, CNN computations are mainly accelerated by GPUs with high…

Hardware Architecture · Computer Science 2016-11-09 Dong Wang, Jianjing An, Ke Xu

maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures. The design combines…

Neural and Evolutionary Computing · Computer Science 2015-02-03 Andrew Lavin

Optimizing Temporal Convolutional Network inference on FPGA-based accelerators

Convolutional Neural Networks are extensively used in a wide range of applications, commonly including computer vision tasks like image and video classification, recognition, and segmentation. Recent research results demonstrate that…

Signal Processing · Electrical Eng. & Systems 2020-05-11 Marco Carreras, Gianfranco Deriu, Luigi Raffo, Luca Benini, Paolo Meloni

Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal

Deep Convolutional Neural Networks (CNNs) are widely employed in modern computer vision algorithms, where the input image is convolved iteratively by many kernels to extract the knowledge behind it. However, with the depth of convolutional…

Computer Vision and Pattern Recognition · Computer Science 2018-04-11 Chih-Ting Liu, Yi-Heng Wu, Yu-Sheng Lin, Shao-Yi Chien

Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs

Accelerating the deep learning inference is very important for real-time applications. In this paper, we propose a novel method to fuse the layers of convolutional neural networks (CNNs) on Graphics Processing Units (GPUs), which applies…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-30 Xueying Wang, Guangli Li, Xiao Dong, Jiansong Li, Lei Liu, Xiaobing Feng

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models…

Machine Learning · Computer Science 2020-02-21 Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot J. Crowley, Bjorn Franke, Amos Storkey, Michael O'Boyle

Hyper-Convolutions via Implicit Kernels for Medical Imaging

The convolutional neural network (CNN) is one of the most commonly used architectures for computer vision tasks. The key building block of a CNN is the convolutional kernel that aggregates information from the pixel neighborhood and shares…

Image and Video Processing · Electrical Eng. & Systems 2022-02-08 Tianyu Ma, Alan Q. Wang, Adrian V. Dalca, Mert R. Sabuncu

When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). The NVIDIA deep neural network (cuDNN) library provides the most effective implementation…

Machine Learning · Computer Science 2022-01-03 Marcin Pietroń, Dominik Żurek

Maximizing CNN Accelerator Efficiency Through Resource Partitioning

Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs.…

Hardware Architecture · Computer Science 2018-04-13 Yongming Shen, Michael Ferdman, Peter Milder

Unified Kernel-Segregated Transpose Convolution Operation

The optimization of the transpose convolution layer for deep learning applications is achieved with the kernel segregation mechanism. However, kernel segregation has disadvantages, such as computing extra elements to obtain the output…

Machine Learning · Computer Science 2025-03-03 Vijay Srinivas Tida, Md Imran Hossen, Liqun Shan, Sai Venkatesh Chilukoti, Sonya Hsu, Xiali Hei

CARLA: A Convolution Accelerator with a Reconfigurable and Low-Energy Architecture

Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required…

Hardware Architecture · Computer Science 2020-10-05 Mehdi Ahmadi, Shervin Vakili, J. M. Pierre Langlois

Dynamic Convolution: Attention over Convolution Kernels

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited…

Computer Vision and Pattern Recognition · Computer Science 2020-04-02 Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu

Energy-based Tuning of Convolutional Neural Networks on Multi-GPUs

Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs have demonstrated remarkable skills for accelerating their challenging computational requirements. Within this context,…

Computer Vision and Pattern Recognition · Computer Science 2018-08-02 Francisco M. Castro, Nicolás Guil, Manuel J. Marín-Jiménez, Jesús Pérez-Serrano, Manuel Ujaldón

Parallel Multi Channel Convolution using General Matrix Multiplication

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve…

Computer Vision and Pattern Recognition · Computer Science 2017-07-04 Aravind Vasudevan, Andrew Anderson, David Gregg

FFCNN: Fast FPGA based Acceleration for Convolution neural network inference

We present a new efficient OpenCL-based Accelerator for large scale Convolutional Neural Networks called Fast Inference on FPGAs for Convolution Neural Network (FFCNN). FFCNN is based on a deeply pipelined OpenCL kernels architecture. As…

Machine Learning · Computer Science 2022-08-30 F. Keddous, H-N. Nguyen, A. Nakib

Computational optimization of convolutional neural networks using separated filters architecture

This paper considers a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing. The use of convolutional neural networks (CNNs) is the standard approach to image recognition…

Computer Vision and Pattern Recognition · Computer Science 2020-02-19 Elena Limonova, Alexander Sheshkus, Dmitry Nikolaev