Related papers: MEC: Memory-efficient Convolution for Deep Neural …

The Indirect Convolution Algorithm

Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms. In these algorithms, convolution is implemented on top of matrix-matrix multiplication (GEMM) functions, provided by highly optimized BLAS…

Computer Vision and Pattern Recognition · Computer Science 2019-07-05 Marat Dukhan

Im2win: Memory Efficient Convolution On SIMD Architectures

Convolution is the most expensive operation among neural network operations, thus its performance is critical to the overall performance of neural networks. Commonly used convolution approaches, including general matrix multiplication…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu, Jun Chu, Xu T. Liu

Im2win: An Efficient Convolution Paradigm on GPU

Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance of the neural network. The commonly used methods for convolution on GPU include the general matrix…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu

Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications

Compact neural networks are inclined to exploit "sparsely-connected" convolutions such as depthwise convolution and group convolution for employment in mobile applications. Compared with standard "fully-connected" convolutions, these…

Computer Vision and Pattern Recognition · Computer Science 2018-03-28 Zheng Qin, Zhaoning Zhang, Shiqing Zhang, Hao Yu, Yuxing Peng

Efficient Winograd Convolution via Integer Arithmetic

Convolution is the core operation for many deep neural networks. The Winograd convolution algorithms have been shown to accelerate the widely-used small convolution sizes. Quantized neural networks can effectively reduce model sizes and…

Neural and Evolutionary Computing · Computer Science 2019-01-09 Lingchuan Meng, John Brothers

Fast Convolution based on Winograd Minimum Filtering: Introduction and Development

Convolutional Neural Network (CNN) has been widely used in various fields and played an important role. Convolution operators are the fundamental component of convolutional neural networks, and it is also the most time-consuming part of…

Artificial Intelligence · Computer Science 2021-11-02 Gan Tong, Libo Huang

High Performance Zero-Memory Overhead Direct Convolutions

The computation of convolution layers in deep neural networks typically relies on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve…

Machine Learning · Computer Science 2018-09-28 Jiyuan Zhang, Franz Franchetti, Tze Meng Low

Low-memory GEMM-based convolution algorithms for deep neural networks

Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as…

Computer Vision and Pattern Recognition · Computer Science 2017-09-12 Andrew Anderson, Aravind Vasudevan, Cormac Keane, David Gregg

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators

Many of today's deep neural network accelerators, e.g., Google's TPU and NVIDIA's tensor core, are built around accelerating the general matrix multiplication (i.e., GEMM). However, supporting convolution on GEMM-based accelerators is not…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-11 Yangjie Zhou, Mengtian Yang, Cong Guo, Jingwen Leng, Yun Liang, Quan Chen, Minyi Guo, Yuhao Zhu

An Energy-Efficient Edge Computing Paradigm for Convolution-based Image Upsampling

A novel energy-efficient edge computing paradigm is proposed for real-time deep learning-based image upsampling applications. State-of-the-art deep learning solutions for image upsampling are currently trained using either resize or…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Ian Colbert, Ken Kreutz-Delgado, Srinjoy Das

Winograd Convolution for Deep Neural Networks: Efficient Point Selection

Convolutional neural networks (CNNs) have dramatically improved the accuracy of tasks such as object recognition, image segmentation and interactive speech systems. CNNs require large amounts of computing resources because of computationally…

Computer Vision and Pattern Recognition · Computer Science 2022-01-26 Syed Asad Alam, Andrew Anderson, Barbara Barabasz, David Gregg

Efficient Neural Network Deployment for Microcontroller

Edge computing for neural networks is getting important especially for low power applications and offline devices. TensorFlow Lite and PyTorch Mobile were released for this purpose. But they mainly support mobile devices instead of…

Hardware Architecture · Computer Science 2020-07-06 Hasan Unlu

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to…

Computer Vision and Pattern Recognition · Computer Science 2017-06-13 David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit

CompConv: A Compact Convolution Module for Efficient Feature Learning

Convolutional Neural Networks (CNNs) have achieved remarkable success in various computer vision tasks but rely on tremendous computational cost. To solve this problem, existing approaches either compress well-trained large-scale models or…

Computer Vision and Pattern Recognition · Computer Science 2021-07-06 Chen Zhang, Yinghao Xu, Yujun Shen

Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing

Deploying deep neural networks (DNNs) on resource-constrained mobile devices presents significant challenges, particularly in achieving real-time performance while simultaneously coping with limited computational resources and battery life.…

Networking and Internet Architecture · Computer Science 2025-09-24 Zekai Sun, Xiuxian Guan, Zheng Lin, Zihan Fang, Xiangming Cai, Zhe Chen, Fangming Liu, Heming Cui, Jie Xiong, Wei Ni, Chau Yuen

Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation

Memory-efficient transfer learning (METL) approaches have recently achieved promising performance in adapting pre-trained models to downstream tasks. They avoid applying gradient backpropagation in large backbones, thus significantly…

Computer Vision and Pattern Recognition · Computer Science 2026-04-13 Yutong Zhang, Jiaxin Chen, Honglin Chen, Kaiqi Zheng, Shengcai Liao, Hanwen Zhong, Weixin Li, Yunhong Wang

Memory-efficient Learning for High-Dimensional MRI Reconstruction

Deep learning (DL) based unrolled reconstructions have shown state-of-the-art performance for under-sampled magnetic resonance imaging (MRI). Similar to compressed sensing, DL can leverage high-dimensional data (e.g. 3D, 2D+time, 3D+time)…

Image and Video Processing · Electrical Eng. & Systems 2021-03-09 Ke Wang, Michael Kellman, Christopher M. Sandino, Kevin Zhang, Shreyas S. Vasanawala, Jonathan I. Tamir, Stella X. Yu, Michael Lustig

SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Amir Ofir, Gil Ben-Artzi

High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks

Deploying deep Convolutional Neural Networks (CNNs) is impacted by their memory footprint and speed requirements, which mainly come from convolution. Widely-used convolution algorithms, im2col and MEC, produce a lowered matrix from an…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Hossam Amer, Ahmed H. Salamah, Ahmad Sajedi, En-hui Yang

Fast and Adaptive Task Management in MEC: A Deep Learning Approach Using Pointer Networks

Task offloading and scheduling in Mobile Edge Computing (MEC) are vital for meeting the low-latency demands of modern IoT and dynamic task scheduling scenarios. MEC reduces the processing burden on resource-constrained devices by enabling…

Networking and Internet Architecture · Computer Science 2026-01-23 Arild Yonkeu, Mohammadreza Amini, Burak Kantarci