Related papers: Low-memory GEMM-based convolution algorithms for d…

The Indirect Convolution Algorithm

Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms. In these algorithms, convolution is implemented on top of matrix-matrix multiplication (GEMM) functions, provided by highly optimized BLAS…

Computer Vision and Pattern Recognition · Computer Science 2019-07-05 Marat Dukhan

NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques

Quantization has emerged as an effective way to significantly boost the performance of deep neural networks (DNNs) by utilizing low-bit computations. Despite having lower numerical precision, quantized DNNs are able to reduce both memory…

Machine Learning · Computer Science 2019-11-15 Wenlei Bao, Li-Wen Chang, Yang Chen, Ke Deng, Amit Agarwal, Emad Barsoum, Abe Taha

Im2win: An Efficient Convolution Paradigm on GPU

Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance of the neural network. The commonly used methods for convolution on GPU include the general matrix…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu

MEC: Memory-efficient Convolution for Deep Neural Network

Convolution is a critical component in modern deep neural networks, thus several algorithms for convolution have been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods…

Machine Learning · Computer Science 2017-06-22 Minsik Cho, Daniel Brand

Accelerating Machine Learning Primitives on Commodity Hardware

Sliding Window Sum algorithms have been successfully used for training and inference of Deep Neural Networks. We have shown before how both pooling and convolution 1-D primitives could be expressed as sliding sums and evaluated by the…

Machine Learning · Computer Science 2023-10-10 Roman Snytsar

Parallel Multi Channel Convolution using General Matrix Multiplication

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve…

Computer Vision and Pattern Recognition · Computer Science 2017-07-04 Aravind Vasudevan, Andrew Anderson, David Gregg

Compiler-Level Matrix Multiplication Optimization for Deep Learning

An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level…

Machine Learning · Computer Science 2019-09-25 Huaqing Zhang, Xiaolin Cheng, Hui Zang, Dae Hoon Park

High-Performance Deep Learning via a Single Building Block

Deep learning (DL) is one of the most prominent branches of machine learning. Due to the immense computational cost of DL workloads, industry and academia have developed DL libraries with highly-specialized kernels for each…

Machine Learning · Computer Science 2019-06-19 Evangelos Georganas, Kunal Banerjee, Dhiraj Kalamkar, Sasikanth Avancha, Anand Venkat, Michael Anderson, Greg Henry, Hans Pabst, Alexander Heinecke

An Area and Energy Efficient Design of Domain-Wall Memory-Based Deep Convolutional Neural Networks using Stochastic Computing

With the recent trend toward wearable devices and the Internet of Things (IoT), it becomes attractive to develop hardware-based deep convolutional neural networks (DCNNs) for embedded applications, which require low power/energy consumption and small…

Neural and Evolutionary Computing · Computer Science 2018-02-06 Xiaolong Ma, Yipeng Zhang, Geng Yuan, Ao Ren, Zhe Li, Jie Han, Jingtong Hu, Yanzhi Wang

Im2win: Memory Efficient Convolution On SIMD Architectures

Convolution is the most expensive operation among neural network operations, thus its performance is critical to the overall performance of neural networks. Commonly used convolution approaches, including general matrix multiplication…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu, Jun Chu, Xu T. Liu

Low-memory convolutional neural networks through incremental depth-first processing

We introduce an incremental processing scheme for convolutional neural network (CNN) inference, targeted at embedded applications with limited memory budgets. Instead of processing layers one by one, individual input pixels are propagated…

Neural and Evolutionary Computing · Computer Science 2019-05-22 Jonathan Binas, Yoshua Bengio

Accelerating Bandwidth-Bound Deep Learning Inference with Main-Memory Accelerators

DL inference queries play an important role in diverse internet services and a large fraction of datacenter cycles are spent on processing DL inference queries. Specifically, the matrix-matrix multiplication (GEMM) operations of…

Hardware Architecture · Computer Science 2020-12-02 Benjamin Y. Cho, Jeageun Jung, Mattan Erez

HADES: Hardware/Algorithm Co-design in DNN accelerators using Energy-efficient Approximate Alphabet Set Multipliers

Edge computing must be capable of executing computationally intensive algorithms, such as Deep Neural Networks (DNNs) while operating within a constrained computational resource budget. Such computations involve Matrix Vector…

Hardware Architecture · Computer Science 2023-10-24 Arani Roy, Kaushik Roy

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

GEneral Matrix Multiply (GEMM) is a central operation in deep learning and corresponds to the largest chunk of the compute footprint. Therefore, improving its efficiency is an active topic of ongoing research. A popular strategy is the use…

Machine Learning · Computer Science 2024-03-13 Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh

Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation

Deep neural networks (DNN) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though…

Machine Learning · Computer Science 2021-09-07 Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen, David Z. Pan

Learning Efficient Convolutional Networks through Network Slimming

The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the…

Computer Vision and Pattern Recognition · Computer Science 2017-08-23 Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang

NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power…

Machine Learning · Computer Science 2024-08-21 Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Jianzhe Lin, Yiran Li, An Zou

DWM: A Decomposable Winograd Method for Convolution Acceleration

Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective on convolutions with kernel size as 3x3 and…

Machine Learning · Computer Science 2020-02-06 Di Huang, Xishan Zhang, Rui Zhang, Tian Zhi, Deyuan He, Jiaming Guo, Chang Liu, Qi Guo, Zidong Du, Shaoli Liu, Tianshi Chen, Yunji Chen

TASO: Time and Space Optimization for Memory-Constrained DNN Inference

Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices. State-of-the-art classification is typically achieved by large…

Machine Learning · Computer Science 2020-05-22 Yuan Wen, Andrew Anderson, Valentin Radu, Michael F. P. O'Boyle, David Gregg

OpenGeMM: A High-Utilization GeMM Accelerator Generator with Lightweight RISC-V Control and Tight Memory Coupling

Deep neural networks (DNNs) face significant challenges when deployed on resource-constrained extreme edge devices due to their computational and data-intensive nature. While standalone accelerators tailored for specific application…

Hardware Architecture · Computer Science 2024-11-22 Xiaoling Yi, Ryan Antonio, Joren Dumoulin, Jiacong Sun, Josse Van Delm, Guilherme Paim, Marian Verhelst