Related papers: The Indirect Convolution Algorithm

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators

Many of today's deep neural network accelerators, e.g., Google's TPU and NVIDIA's tensor core, are built around accelerating the general matrix multiplication (i.e., GEMM). However, supporting convolution on GEMM-based accelerators is not…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-11 Yangjie Zhou , Mengtian Yang , Cong Guo , Jingwen Leng , Yun Liang , Quan Chen , Minyi Guo , Yuhao Zhu

High Performance and Portable Convolution Operators for ARM-based Multicore Processors

The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of networks. One of these…

Performance · Computer Science 2020-05-14 Pablo San Juan , Adrián Castelló , Manuel F. Dolz , Pedro Alonso-Jordá , Enrique S. Quintana-Ortí

Im2win: An Efficient Convolution Paradigm on GPU

Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance of the neural network. The commonly used methods for convolution on GPU include the general matrix…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu , Jun Chu , Luanzheng Guo , Xu T. Liu

Low-memory GEMM-based convolution algorithms for deep neural networks

Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as…

Computer Vision and Pattern Recognition · Computer Science 2017-09-12 Andrew Anderson , Aravind Vasudevan , Cormac Keane , David Gregg

Parallel Multi Channel Convolution using General Matrix Multiplication

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve…

Computer Vision and Pattern Recognition · Computer Science 2017-07-04 Aravind Vasudevan , Andrew Anderson , David Gregg

MEC: Memory-efficient Convolution for Deep Neural Network

Convolution is a critical component in modern deep neural networks, thus several algorithms for convolution have been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods…

Machine Learning · Computer Science 2017-06-22 Minsik Cho , Daniel Brand

SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Amir Ofir , Gil Ben-Artzi

Im2win: Memory Efficient Convolution On SIMD Architectures

Convolution is the most expensive operation among neural network operations, thus its performance is critical to the overall performance of neural networks. Commonly used convolution approaches, including general matrix multiplication…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu , Jun Chu , Xu T. Liu

Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions

Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to compute convolutions is known as the Im2Col + BLAS method. This paper proposes SConv:…

Computer Vision and Pattern Recognition · Computer Science 2023-03-09 Victor Ferrari , Rafael Sousa , Marcio Pereira , João P. L. de Carvalho , José Nelson Amaral , José Moreira , Guido Araujo

ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation

Convolution is a compute-intensive operation placed at the heart of Convolution Neural Networks (CNNs). It has led to the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct-Convolution. However, the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Lucas Alvarenga , Victor Ferrari , Rafael Souza , Marcio Pereira , Guido Araujo

LP-GEMM: Integrating Layout Propagation into GEMM Operations

In Scientific Computing and modern Machine Learning (ML) workloads, sequences of dependent General Matrix Multiplications (GEMMs) often dominate execution time. While state-of-the-art BLAS libraries aggressively optimize individual GEMM…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-07 César Guedes Carneiro , Lucas Alvarenga , Guido Araujo , Sandro Rigo

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

GEneral Matrix Multiply (GEMM) is a central operation in deep learning and corresponds to the largest chunk of the compute footprint. Therefore, improving its efficiency is an active topic of ongoing research. A popular strategy is the use…

Machine Learning · Computer Science 2024-03-13 Zhanpeng Zeng , Karthikeyan Sankaralingam , Vikas Singh

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-03 Shixun Wu , Yujia Zhai , Jinyang Liu , Jiajun Huang , Zizhe Jian , Bryan M. Wong , Zizhong Chen

High Performance Zero-Memory Overhead Direct Convolutions

The computation of convolution layers in deep neural networks typically rely on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve…

Machine Learning · Computer Science 2018-09-28 Jiyuan Zhang , Franz Franchetti , Tze Meng Low

Precision-Energy-Throughput Scaling Of Generic Matrix Multiplication and Convolution Kernels Via Linear Projections

Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose…

Multimedia · Computer Science 2014-11-12 Mohammad Ashraful Anam , Paul N. Whatmough , Yiannis Andreopoulos

High-Performance Deep Learning via a Single Building Block

Deep learning (DL) is one of the most prominent branches of machine learning. Due to the immense computational cost of DL workloads, industry and academia have developed DL libraries with highly-specialized kernels for each…

Machine Learning · Computer Science 2019-06-19 Evangelos Georganas , Kunal Banerjee , Dhiraj Kalamkar , Sasikanth Avancha , Anand Venkat , Michael Anderson , Greg Henry , Hans Pabst , Alexander Heinecke

Image-specific Convolutional Kernel Modulation for Single Image Super-resolution

Recently, deep-learning-based super-resolution methods have achieved excellent performances, but mainly focus on training a single generalized deep network by feeding numerous samples. Yet intuitively, each image has its representation, and…

Image and Video Processing · Electrical Eng. & Systems 2021-11-17 Yuanfei Huang , Jie Li , Yanting Hu , Xinbo Gao , Hua Huang

Accelerating Machine Learning Primitives on Commodity Hardware

Sliding Window Sum algorithms have been successfully used for training and inference of Deep Neural Networks. We have shown before how both pooling and convolution 1-D primitives could be expressed as sliding sums and evaluated by the…

Machine Learning · Computer Science 2023-10-10 Roman Snytsar

An Energy-Efficient Edge Computing Paradigm for Convolution-based Image Upsampling

A novel energy-efficient edge computing paradigm is proposed for real-time deep learning-based image upsampling applications. State-of-the-art deep learning solutions for image upsampling are currently trained using either resize or…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Ian Colbert , Ken Kreutz-Delgado , Srinjoy Das

Compiler-Level Matrix Multiplication Optimization for Deep Learning

An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level…

Machine Learning · Computer Science 2019-09-25 Huaqing Zhang , Xiaolin Cheng , Hui Zang , Dae Hoon Park