Related papers: MG3MConv: Multi-Grained Matrix-Multiplication-Mapp…
Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation, and object detection and localization. Here we consider the problem of inference, the application of a…
Convolutional neural networks have witnessed remarkable improvements in computational efficiency in recent years. A key driving force has been the idea of trading-off model expressivity and efficiency through a combination of $1\times 1$…
This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures. The design combines…
Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping,…
Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to…
Convolution is a compute-intensive operation placed at the heart of Convolution Neural Networks (CNNs). It has led to the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct-Convolution. However, the…
In order to handle modern convolutional neural networks (CNNs) efficiently, a hardware architecture of CNN inference accelerator is proposed to handle depthwise convolutions and regular convolutions, which are both essential building blocks…
Shifted-and-Duplicated-Kernel (SDK) mapping has emerged as an effective strategy to accelerate convolutional layers on compute-in-memory (CIM) hardware. However, existing SDK variants (e.g., VWC-SDK) merely optimize mapping for a single CIM…
Convolutional Neural Networks (CNNs) are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large amount of parameters they contain leads to a high computational complexity and…
We present a new convolutional neural network, called Multi Voxel-Point Neurons Convolution (MVPConv), for fast and accurate 3D deep learning. The previous works adopt either individual point-based features or local-neighboring voxel-based…
This paper proposes SConvTransform, a Transform dialect extension that provides operations for optimizing 2D convolutions in MLIR. Its main operation, SConvOp, lowers Linalg convolutions into tiled and packed generic operations through a…
Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance of the neural network. The commonly used methods for convolution on GPU include the general matrix…
As Convolutional Neural Networks (CNNs) gain prominence in deep learning, algorithms like Winograd Convolution have been introduced to enhance computational efficiency. However, existing implementations often face challenges such as high…
Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…
Convolution neural networks are widely used for mobile applications. However, GPU convolution algorithms are designed for mini-batch neural network training, the single-image convolution neural network inference algorithm on mobile GPUs is…
Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to compute convolutions is known as the Im2Col + BLAS method. This paper proposes SConv:…
Convolutional neural networks (CNNs) are becoming very successful and popular for a variety of applications. The Loki many-core processor architecture is very promising for achieving specialised hardware performance and efficiency while…
In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the performance needs vary significantly,…
While state-of-the-art development in CNN topology, such as VGGNet and ResNet, have become increasingly accurate, these networks are computationally expensive involving billions of arithmetic operations and parameters. To improve the…
Understanding point cloud has recently gained huge interests following the development of 3D scanning devices and the accumulation of large-scale 3D data. Most point cloud processing algorithms can be classified as either point-based or…