Related papers: Advancing Direct Convolution using Convolution Sli…

Using MLIR Transform to Design Sliced Convolution Algorithm

This paper proposes SConvTransform, a Transform dialect extension that provides operations for optimizing 2D convolutions in MLIR. Its main operation, SConvOp, lowers Linalg convolutions into tiled and packed generic operations through a…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Victor Ferrari, Marcio Pereira, Lucas Alvarenga, Gustavo Leite, Guido Araujo

The Indirect Convolution Algorithm

Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms. In these algorithms, convolution is implemented on top of matrix-matrix multiplication (GEMM) functions, provided by highly optimized BLAS…

Computer Vision and Pattern Recognition · Computer Science 2019-07-05 Marat Dukhan

SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Amir Ofir, Gil Ben-Artzi

ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation

Convolution is a compute-intensive operation placed at the heart of Convolution Neural Networks (CNNs). It has led to the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct-Convolution. However, the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Lucas Alvarenga, Victor Ferrari, Rafael Souza, Marcio Pereira, Guido Araujo

Im2win: Memory Efficient Convolution On SIMD Architectures

Convolution is the most expensive operation among neural network operations, thus its performance is critical to the overall performance of neural networks. Commonly used convolution approaches, including general matrix multiplication…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu, Jun Chu, Xu T. Liu

ConvBLS: An Effective and Efficient Incremental Convolutional Broad Learning System for Image Classification

Deep learning generally suffers from enormous computational resources and time-consuming training processes. Broad Learning System (BLS) and its convolutional variants have been proposed to mitigate these issues and have achieved superb…

Machine Learning · Computer Science 2023-04-04 Chunyu Lei, C. L. Philip Chen, Jifeng Guo, Tong Zhang

Im2win: An Efficient Convolution Paradigm on GPU

Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance of the neural network. The commonly used methods for convolution on GPU include the general matrix…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu

High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures

Convolution is the core component within deep neural networks and it is computationally intensive and time consuming. Tensor data layouts significantly impact convolution operations in terms of memory access and computational efficiency.…

Machine Learning · Computer Science 2024-08-02 Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai Lu, Xu T. Liu

Selectively Dilated Convolution for Accuracy-Preserving Sparse Pillar-based Embedded 3D Object Detection

Pillar-based 3D object detection has gained traction in self-driving technology due to its speed and accuracy facilitated by the artificial densification of pillars for GPU-friendly processing. However, dense pillar processing fundamentally…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Seongmin Park, Minjae Lee, Junwon Choi, Jungwook Choi

High Performance and Portable Convolution Operators for ARM-based Multicore Processors

The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of networks. One of these…

Performance · Computer Science 2020-05-14 Pablo San Juan, Adrián Castelló, Manuel F. Dolz, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí

Efficient Column-Wise N:M Pruning on RISC-V CPU

In deep learning frameworks, weight pruning is a widely used technique for improving computational efficiency by reducing the size of large models. This is especially critical for convolutional operators, which often act as performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-24 Chi-Wei Chu, Ding-Yong Hong, Jan-Jan Wu

MEC: Memory-efficient Convolution for Deep Neural Network

Convolution is a critical component in modern deep neural networks, thus several algorithms for convolution have been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods…

Machine Learning · Computer Science 2017-06-22 Minsik Cho, Daniel Brand

High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks

Deploying deep Convolutional Neural Networks (CNNs) is impacted by their memory footprint and speed requirements, which mainly come from convolution. Widely-used convolution algorithms, im2col and MEC, produce a lowered matrix from an…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Hossam Amer, Ahmed H. Salamah, Ahmad Sajedi, En-hui Yang

An Energy-Efficient Edge Computing Paradigm for Convolution-based Image Upsampling

A novel energy-efficient edge computing paradigm is proposed for real-time deep learning-based image upsampling applications. State-of-the-art deep learning solutions for image upsampling are currently trained using either resize or…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Ian Colbert, Ken Kreutz-Delgado, Srinjoy Das

Performance evaluation of acceleration of convolutional layers on OpenEdgeCGRA

Recently, efficiently deploying deep learning solutions on the edge has received increasing attention. New platforms are emerging to support the increasing demand for flexibility and high performance. In this work, we explore the efficient…

Hardware Architecture · Computer Science 2024-03-05 Nicolò Carpentieri, Juan Sapriza, Davide Schiavone, Daniele Jahier Pagliari, David Atienza, Maurizio Martina, Alessio Burrello

SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping

The channel redundancy in feature maps of convolutional neural networks (CNNs) results in the large consumption of memories and computational resources. In this work, we design a novel Slim Convolution (SlimConv) module to boost the…

Computer Vision and Pattern Recognition · Computer Science 2021-09-08 Jiaxiong Qiu, Cai Chen, Shuaicheng Liu, Bing Zeng

Revisiting the Integration of Convolution and Attention for Vision Backbone

Convolutions (Convs) and multi-head self-attentions (MHSAs) are typically considered alternatives to each other for building vision backbones. Although some works try to integrate both, they apply the two operators simultaneously at the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-22 Lei Zhu, Xinjiang Wang, Wayne Zhang, Rynson W. H. Lau

SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation

LiDAR point-cloud segmentation is an important problem for many applications. For large-scale point cloud segmentation, the de facto method is to project a 3D point cloud to get a 2D LiDAR image and use convolutions to process it.…

Computer Vision and Pattern Recognition · Computer Science 2021-04-14 Chenfeng Xu, Bichen Wu, Zining Wang, Wei Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor

As the core of artificial intelligence applications, the research of convolution has become a hot topic in high performance computing. With the rapid development of the emerging SW26010 processor in artificial intelligence, there is an…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-12 Zheng Wu

Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications

Convolution and cross-correlation are the basis of filtering and pattern or template matching in multimedia signal processing. We propose two throughput scaling options for any one-dimensional convolution kernel in programmable processors…

Multimedia · Computer Science 2012-01-17 Mohammad Ashraful Anam, Yiannis Andreopoulos