Related papers: Fast convolution kernels on pascal GPU with high m…
Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even…
Convolutional neural network (CNN) is an important deep learning method. The convolution operation takes a large proportion of the total execution time for CNN. Feature maps for convolution operation are usually sparse. Multiplications and…
Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive…
Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…
Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition. Being compute-intensive, CNN computations are mainly accelerated by GPUs with high…
This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures. The design combines…
Convolutional Neural Networks are extensively used in a wide range of applications, commonly including computer vision tasks like image and video classification, recognition, and segmentation. Recent research results demonstrate that…
Deep Convolutional Neural Networks (CNNs) are widely employed in modern computer vision algorithms, where the input image is convolved iteratively by many kernels to extract the knowledge behind it. However, with the depth of convolutional…
Accelerating the deep learning inference is very important for real-time applications. In this paper, we propose a novel method to fuse the layers of convolutional neural networks (CNNs) on Graphics Processing Units (GPUs), which applies…
Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models…
The convolutional neural network (CNN) is one of the most commonly used architectures for computer vision tasks. The key building block of a CNN is the convolutional kernel that aggregates information from the pixel neighborhood and shares…
This paper is focused on the improvement the efficiency of the sparse convolutional neural networks (CNNs) layers on graphic processing units (GPU). The Nvidia deep neural network (cuDnn) library provides the most effective implementation…
Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs.…
The optimization of the transpose convolution layer for deep learning applications is achieved with the kernel segregation mechanism. However, kernel segregation has disadvantages, such as computing extra elements to obtain the output…
Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required…
Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited…
Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs have demonstrated remarkable skills for accelerating their challenging computational requirements. Within this context,…
Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve…
We present a new efficient OpenCL-based Accelerator for large scale Convolutional Neural Networks called Fast Inference on FPGAs for Convolution Neural Network (FFCNN). FFCNN is based on a deeply pipelined OpenCL kernels architecture. As…
This paper considers a convolutional neural network transformation that reduces computation complexity and thus speedups neural network processing. Usage of convolutional neural networks (CNN) is the standard approach to image recognition…