Related papers: MG3MConv: Multi-Grained Matrix-Multiplication-Mapp…

ZNNi - Maximizing the Inference Throughput of 3D Convolutional Networks on Multi-Core CPUs and GPUs

Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation, and object detection and localization. Here we consider the problem of inference, the application of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-21 Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung

MUXConv: Information Multiplexing in Convolutional Neural Networks

Convolutional neural networks have witnessed remarkable improvements in computational efficiency in recent years. A key driving force has been the idea of trading-off model expressivity and efficiency through a combination of $1\times 1$…

Computer Vision and Pattern Recognition · Computer Science 2020-04-08 Zhichao Lu, Kalyanmoy Deb, Vishnu Naresh Boddeti

maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures. The design combines…

Neural and Evolutionary Computing · Computer Science 2015-02-03 Andrew Lavin

Accelerating Transposed Convolutions on FPGA-based Edge Devices

Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping,…

Hardware Architecture · Computer Science 2025-07-11 Jude Haris, José Cano

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to…

Computer Vision and Pattern Recognition · Computer Science 2017-06-13 David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit

ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation

Convolution is a compute-intensive operation placed at the heart of Convolution Neural Networks (CNNs). It has led to the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct-Convolution. However, the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Lucas Alvarenga, Victor Ferrari, Rafael Souza, Marcio Pereira, Guido Araujo

Hardware Architecture of Embedded Inference Accelerator and Analysis of Algorithms for Depthwise and Large-Kernel Convolutions

In order to handle modern convolutional neural networks (CNNs) efficiently, a hardware architecture of CNN inference accelerator is proposed to handle depthwise convolutions and regular convolutions, which are both essential building blocks…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Tse-Wei Chen, Wei Tao, Deyu Wang, Dongchao Wen, Kinya Osa, Masami Kato

TetrisG-SDK: Efficient Convolutional Layer Mapping with Adaptive Windows and Grouped Convolutions for Fast In-Memory Computing

Shifted-and-Duplicated-Kernel (SDK) mapping has emerged as an effective strategy to accelerate convolutional layers on compute-in-memory (CIM) hardware. However, existing SDK variants (e.g., VWC-SDK) merely optimize mapping for a single CIM…

Hardware Architecture · Computer Science 2026-04-29 Ke Dong, Kejie Huang, Tao Luo, Bo Wang

Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large amount of parameters they contain leads to a high computational complexity and…

Machine Learning · Computer Science 2019-01-01 Ghouthi Boukli Hacene, Vincent Gripon, Matthieu Arzel, Nicolas Farrugia, Yoshua Bengio

Multi Voxel-Point Neurons Convolution (MVPConv) for Fast and Accurate 3D Deep Learning

We present a new convolutional neural network, called Multi Voxel-Point Neurons Convolution (MVPConv), for fast and accurate 3D deep learning. The previous works adopt either individual point-based features or local-neighboring voxel-based…

Computer Vision and Pattern Recognition · Computer Science 2021-05-03 Wei Zhou, Xin Cao, Xiaodan Zhang, Xingxing Hao, Dekui Wang, Ying He

Using MLIR Transform to Design Sliced Convolution Algorithm

This paper proposes SConvTransform, a Transform dialect extension that provides operations for optimizing 2D convolutions in MLIR. Its main operation, SConvOp, lowers Linalg convolutions into tiled and packed generic operations through a…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Victor Ferrari, Marcio Pereira, Lucas Alvarenga, Gustavo Leite, Guido Araujo

Im2win: An Efficient Convolution Paradigm on GPU

Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance of the neural network. The commonly used methods for convolution on GPU include the general matrix…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu

Optimizing Winograd Convolution on ARMv8 processors

As Convolutional Neural Networks (CNNs) gain prominence in deep learning, algorithms like Winograd Convolution have been introduced to enhance computational efficiency. However, existing implementations often face challenges such as high…

Performance · Computer Science 2024-12-30 Haoyuan Gui, Xiaoyu Zhang, Chong Zhang, Zitong Su, Huiyuan Li

cuConv: A CUDA Implementation of Convolution for CNN Inference

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-28 Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

ILP-M Conv: Optimize Convolution Algorithm for Single-Image Convolution Neural Network Inference on Mobile GPUs

Convolution neural networks are widely used for mobile applications. However, GPU convolution algorithms are designed for mini-batch neural network training, the single-image convolution neural network inference algorithm on mobile GPUs is…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-04 Zhuoran Ji

Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions

Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to compute convolutions is known as the Im2Col + BLAS method. This paper proposes SConv:…

Computer Vision and Pattern Recognition · Computer Science 2023-03-09 Victor Ferrari, Rafael Sousa, Marcio Pereira, João P. L. de Carvalho, José Nelson Amaral, José Moreira, Guido Araujo

Performance tuning for deep learning on a many-core processor (master thesis)

Convolutional neural networks (CNNs) are becoming very successful and popular for a variety of applications. The Loki many-core processor architecture is very promising for achieving specialised hardware performance and efficiency while…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-05 Philippos Papaphilippou

fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs

In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the performance needs vary significantly,…

Computer Vision and Pattern Recognition · Computer Science 2017-11-27 Stylianos I. Venieris, Christos-Savvas Bouganis

Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN

While state-of-the-art development in CNN topology, such as VGGNet and ResNet, have become increasingly accurate, these networks are computationally expensive involving billions of arithmetic operations and parameters. To improve the…

Performance · Computer Science 2021-06-29 Ziwei Wang, Martin A. Trefzer, Simon J. Bale, Andy M. Tyrrell

Projection-based Point Convolution for Efficient Point Cloud Segmentation

Understanding point cloud has recently gained huge interests following the development of 3D scanning devices and the accumulation of large-scale 3D data. Most point cloud processing algorithms can be classified as either point-based or…

Computer Vision and Pattern Recognition · Computer Science 2022-02-07 Pyunghwan Ahn, Juyoung Yang, Eojindl Yi, Chanho Lee, Junmo Kim