Related papers: Im2win: Memory Efficient Convolution On SIMD Archi…

200 papers

Convolution is the most time-consuming operation in deep neural networks, so its performance is critical to the overall performance of the network. The commonly used methods for convolution on GPUs include the general matrix…

Neural and Evolutionary Computing · Computer Science · 2023-06-27 · Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu

Convolution is the core component of deep neural networks, and it is computationally intensive and time-consuming. Tensor data layouts significantly impact convolution operations in terms of memory access and computational efficiency.…

Machine Learning · Computer Science · 2024-08-02 · Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai Lu, Xu T. Liu

Convolution is a critical component in modern deep neural networks, thus several algorithms for convolution have been developed. Direct convolution is simple but suffers from poor performance. As an alternative, multiple indirect methods…

Machine Learning · Computer Science · 2017-06-22 · Minsik Cho, Daniel Brand

Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms. In these algorithms, convolution is implemented on top of matrix-matrix multiplication (GEMM) functions, provided by highly optimized BLAS…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-05 · Marat Dukhan

The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high-performance algorithms for the convolution operator present in this type of network. One of these…

Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to compute convolutions is known as the Im2Col + BLAS method. This paper proposes SConv:…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-09 · Victor Ferrari, Rafael Sousa, Marcio Pereira, João P. L. de Carvalho, José Nelson Amaral, José Moreira, Guido Araujo

Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as…

Computer Vision and Pattern Recognition · Computer Science · 2017-09-12 · Andrew Anderson, Aravind Vasudevan, Cormac Keane, David Gregg

Many of today's deep neural network accelerators, e.g., Google's TPU and NVIDIA's tensor core, are built around accelerating the general matrix multiplication (i.e., GEMM). However, supporting convolution on GEMM-based accelerators is not…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-10-11 · Yangjie Zhou, Mengtian Yang, Cong Guo, Jingwen Leng, Yun Liang, Quan Chen, Minyi Guo, Yuhao Zhu

Convolution is the most time-consuming part in the computation of convolutional neural networks (CNNs), which have achieved great successes in numerous applications. Due to the complex data dependency and the increase in the amount of model…

Machine Learning · Computer Science · 2021-01-01 · Xiaoyang Zhang, Junmin Xiao, Guangming Tan

The Von Neumann bottleneck, which relates to the energy cost of moving data between memory and on-chip cores, is a serious challenge in state-of-the-art AI architectures, such as Convolutional Neural Network (CNN) accelerators.…

Hardware Architecture · Computer Science · 2025-02-27 · Cristian Sestito, Ahmed J. Abdelmaksoud, Shady Agwa, Themis Prodromakis

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-26 · Amir Ofir, Gil Ben-Artzi

A novel energy-efficient edge computing paradigm is proposed for real-time deep learning-based image upsampling applications. State-of-the-art deep learning solutions for image upsampling are currently trained using either resize or…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-27 · Ian Colbert, Ken Kreutz-Delgado, Srinjoy Das

Computing-In-Memory (CIM) offers a potential solution to the memory wall issue and can achieve high energy efficiency by minimizing data movement, making it a promising architecture for edge AI devices. Lightweight models like MobileNet and…

Hardware Architecture · Computer Science · 2025-08-21 · Choongseok Song, Doo Seok Jeong

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve…

Computer Vision and Pattern Recognition · Computer Science · 2017-07-04 · Aravind Vasudevan, Andrew Anderson, David Gregg

The computation of convolution layers in deep neural networks typically relies on high-performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve…

Machine Learning · Computer Science · 2018-09-28 · Jiyuan Zhang, Franz Franchetti, Tze Meng Low

Convolution is a compute-intensive operation at the heart of Convolutional Neural Networks (CNNs). It has led to the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct-Convolution. However, the…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-16 · Lucas Alvarenga, Victor Ferrari, Rafael Souza, Marcio Pereira, Guido Araujo

As Convolutional Neural Networks (CNNs) gain prominence in deep learning, algorithms like Winograd Convolution have been introduced to enhance computational efficiency. However, existing implementations often face challenges such as high…

Performance · Computer Science · 2024-12-30 · Haoyuan Gui, Xiaoyu Zhang, Chong Zhang, Zitong Su, Huiyuan Li

In the last decade, Convolutional Neural Networks with multi-layer architectures have advanced rapidly. However, training such complex networks is very space-consuming, since a lot of intermediate data is preserved across layers, especially…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-23 · Zhigang Wang, Hangyu Yang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu

State-of-the-art systolic array-based accelerators adopt the traditional im2col algorithm to accelerate the inference of convolutional layers. However, traditional im2col cannot efficiently support AI backpropagation. Backpropagation in…

Hardware Architecture · Computer Science · 2022-09-21 · Jianchao Yang, Mei Wen, Junzhong Shen, Yasong Cao, Minjin Tang, Renyu Yang, Jiawei Fei, Chunyuan Zhang

Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping,…

Hardware Architecture · Computer Science · 2025-07-11 · Jude Haris, José Cano