Related papers: Kernel-Segregated Transpose Convolution Operation

Unified Kernel-Segregated Transpose Convolution Operation

The optimization of the transpose convolution layer for deep learning applications is achieved with the kernel segregation mechanism. However, kernel segregation has disadvantages, such as computing extra elements to obtain the output…

Machine Learning · Computer Science 2025-03-03 Vijay Srinivas Tida, Md Imran Hossen, Liqun Shan, Sai Venkatesh Chilukoti, Sonya Hsu, Xiali Hei

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-31 Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu

Network Deconvolution

Convolution is a central operation in Convolutional Neural Networks (CNNs), which applies a kernel to overlapping regions shifted across the image. However, because of the strong correlations in real-world image data, convolutional kernels…

Machine Learning · Computer Science 2020-02-27 Chengxi Ye, Matthew Evanusa, Hua He, Anton Mitrokhin, Tom Goldstein, James A. Yorke, Cornelia Fermüller, Yiannis Aloimonos

Optimization of XNOR Convolution for Binary Convolutional Neural Networks on GPU

Binary convolutional networks have lower computational load and lower memory footprint compared to their full-precision counterparts. So, they are a feasible alternative for the deployment of computer vision applications on limited…

Computer Vision and Pattern Recognition · Computer Science 2020-07-29 Mete Can Kaya, Alperen İnci, Alptekin Temizel

Computational optimization of convolutional neural networks using separated filters architecture

This paper considers a convolutional neural network transformation that reduces computation complexity and thus speeds up neural network processing. Usage of convolutional neural networks (CNN) is the standard approach to image recognition…

Computer Vision and Pattern Recognition · Computer Science 2020-02-19 Elena Limonova, Alexander Sheshkus, Dmitry Nikolaev

Reduce Computational Complexity for Convolutional Layers by Skipping Zeros

Convolutional neural networks necessitate good algorithms to reduce complexity, and sufficient utilization of parallel processors for acceleration. Within convolutional layers, there are three types of operators: convolution used in forward…

Machine Learning · Computer Science 2024-08-27 Zhiyi Zhang, Pengfei Zhang, Zhuopin Xu, Qi Wang

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to…

Computer Vision and Pattern Recognition · Computer Science 2017-06-13 David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit

Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal

Deep Convolutional Neural Networks (CNNs) are widely employed in modern computer vision algorithms, where the input image is convolved iteratively by many kernels to extract the knowledge behind it. However, with the depth of convolutional…

Computer Vision and Pattern Recognition · Computer Science 2018-04-11 Chih-Ting Liu, Yi-Heng Wu, Yu-Sheng Lin, Shao-Yi Chien

High Performance Zero-Memory Overhead Direct Convolutions

The computation of convolution layers in deep neural networks typically relies on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve…

Machine Learning · Computer Science 2018-09-28 Jiyuan Zhang, Franz Franchetti, Tze Meng Low

Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores

Convolution is one of the fundamental operations of deep neural networks with demanding matrix computation. In a graphic processing unit (GPU), Tensor Core is a specialized matrix processing hardware equipped with reduced-precision…

Machine Learning · Computer Science 2022-02-25 Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jungwook Choi, Jieun Lim

Accelerating convolutional neural network by exploiting sparsity on GPUs

Convolutional neural network (CNN) is an important deep learning method. The convolution operation takes a large proportion of the total execution time for CNN. Feature maps for convolution operation are usually sparse. Multiplications and…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-01 Weizhi Xu, Yintai Sun, Shengyu Fan, Hui Yu, Xin Fu

Fast convolution kernels on pascal GPU with high memory efficiency

The convolution computation is widely used in many fields, especially in CNNs. Because of the rapid growth of the training data in CNNs, GPUs have been used for the acceleration, and memory-efficient algorithms are focused because of their…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-02 Qiong Chang, Masaki Onishi, Tsutomu Maruyama

An Energy-Efficient Edge Computing Paradigm for Convolution-based Image Upsampling

A novel energy-efficient edge computing paradigm is proposed for real-time deep learning-based image upsampling applications. State-of-the-art deep learning solutions for image upsampling are currently trained using either resize or…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Ian Colbert, Ken Kreutz-Delgado, Srinjoy Das

Deep Lookup Network

Convolutional neural networks are constructed with massive operations with different types and are highly computationally intensive. Among these operations, multiplication operation is higher in computational complexity and usually requires…

Computer Vision and Pattern Recognition · Computer Science 2025-09-18 Yulan Guo, Longguang Wang, Wendong Mao, Xiaoyu Dong, Yingqian Wang, Li Liu, Wei An

Separable Convolutions for Optimizing 3D Stereo Networks

Deep learning based 3D stereo networks give superior performance compared to 2D networks and conventional stereo methods. However, this improvement in the performance comes at the cost of increased computational complexity, thus making…

Computer Vision and Pattern Recognition · Computer Science 2021-08-24 Rafia Rahim, Faranak Shamsafar, Andreas Zell

HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs

We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. The proposed HetConv (Heterogeneous Kernel-Based Convolution) reduces the computation (FLOPs) and the number of parameters as…

Computer Vision and Pattern Recognition · Computer Science 2019-03-26 Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

We propose a simple two-step approach for speeding up convolution layers within large convolutional neural networks based on tensor decomposition and discriminative fine-tuning. Given a layer, we use non-linear least squares to compute a…

Computer Vision and Pattern Recognition · Computer Science 2015-04-27 Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, Victor Lempitsky

cuConv: A CUDA Implementation of Convolution for CNN Inference

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-28 Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

The Indirect Convolution Algorithm

Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms. In these algorithms, convolution is implemented on top of matrix-matrix multiplication (GEMM) functions, provided by highly optimized BLAS…

Computer Vision and Pattern Recognition · Computer Science 2019-07-05 Marat Dukhan

How to Accelerate Capsule Convolutions in Capsule Networks

How to improve the efficiency of routing procedures in CapsNets has been studied a lot. However, the efficiency of capsule convolutions has largely been neglected. Capsule convolution, which uses capsules rather than neurons as the basic…

Artificial Intelligence · Computer Science 2021-04-07 Zhenhua Chen, Xiwen Li, Qian Lou, David Crandall