Related papers: Arithmetic Intensity Balancing Convolution for Har…

On the Efficiency of Convolutional Neural Networks

Since the breakthrough performance of AlexNet in 2012, convolutional neural networks (convnets) have grown into extremely powerful vision models. Deep learning researchers have used convnets to perform vision tasks with accuracy that was…

Machine Learning · Computer Science 2024-05-22 Andrew Lavin

FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off

Spatial convolution is fundamental in constructing deep Convolutional Neural Networks (CNNs) for visual recognition. While dynamic convolution enhances model accuracy by adaptively combining static kernels, it incurs significant…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Tianyu Zhang, Fan Wan, Haoran Duan, Kevin W. Tong, Jingjing Deng, Yang Long

Energy-Efficient ConvNets Through Approximate Computing

Recently ConvNets or convolutional neural networks (CNN) have come up as state-of-the-art classification and detection algorithms, achieving near-human performance in visual detection. However, ConvNet algorithms are typically very…

Computer Vision and Pattern Recognition · Computer Science 2016-11-17 Bert Moons, Bert De Brabandere, Luc Van Gool, Marian Verhelst

Bitwidth-Specific Logarithmic Arithmetic for Future Hardware-Accelerated Training

While advancements in quantization have significantly reduced the computational costs of inference in deep learning, training still predominantly relies on complex floating-point arithmetic. Low-precision fixed-point training presents a…

Machine Learning · Computer Science 2025-10-21 Hassan Hamad, Yuou Qiu, Peter A. Beerel, Keith M. Chugg

The xPU-athalon: Quantifying the Competition of AI Acceleration

The push for greater efficiency in AI computation has given rise to an array of accelerator architectures that increasingly challenge the GPU's long-standing dominance. In this work, we provide a quantitative view of this evolving landscape…

Hardware Architecture · Computer Science 2026-04-14 Alicia Golden, Carole-Jean Wu, Gu-Yeon Wei, David Brooks

EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power…

Computer Vision and Pattern Recognition · Computer Science 2019-11-12 Lukas Cavigelli, Georg Rutishauser, Luca Benini

TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing

As convolution has empowered many smart applications, dynamic convolution further equips it with the ability to adapt to diverse inputs. However, the static and dynamic convolutions are either layout-agnostic or computation-heavy, making it…

Computer Vision and Pattern Recognition · Computer Science 2022-03-23 Jierun Chen, Tianlang He, Weipeng Zhuo, Li Ma, Sangtae Ha, S.-H. Gary Chan

CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities

Recent research on vision backbone architectures has predominantly focused on optimizing efficiency for hardware platforms with high parallel processing capabilities. This category increasingly includes embedded systems such as mobile…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Moritz Nottebaum, Matteo Dunnhofer, Christian Micheloni

AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence

Convolutional neural networks (CNN) have been widely used for boosting the performance of many machine intelligence tasks. However, the CNN models are usually computationally intensive and energy consuming, since they are often designed…

Machine Learning · Computer Science 2021-02-04 Yunhe Wang, Mingqiang Huang, Kai Han, Hanting Chen, Wei Zhang, Chunjing Xu, Dacheng Tao

Optimizing Winograd Convolution on ARMv8 processors

As Convolutional Neural Networks (CNNs) gain prominence in deep learning, algorithms like Winograd Convolution have been introduced to enhance computational efficiency. However, existing implementations often face challenges such as high…

Performance · Computer Science 2024-12-30 Haoyuan Gui, Xiaoyu Zhang, Chong Zhang, Zitong Su, Huiyuan Li

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

To design fast neural networks, many works have been focusing on reducing the number of floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does not necessarily lead to a similar level of reduction in…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan

SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers

Transformers' compute-intensive operations pose enormous challenges for their deployment in resource-constrained EdgeAI / tinyML devices. As an established neural network compression technique, quantization reduces the hardware…

Machine Learning · Computer Science 2023-04-26 Alberto Marchisio, Davide Dura, Maurizio Capra, Maurizio Martina, Guido Masera, Muhammad Shafique

Fixflow: A Framework to Evaluate Fixed-point Arithmetic in Light-Weight CNN Inference

Convolutional neural networks (CNN) are widely used in resource-constrained devices in IoT applications. In order to reduce the computational complexity and memory footprint, the resource-constrained devices use fixed-point representation.…

Machine Learning · Computer Science 2023-02-21 Farhad Taheri, Siavash Bayat-Sarmadi, Hatame Mosanaei-Boorani, Reza Taheri

Performance/power assessment of CNN packages on embedded automotive platforms

The rise of power-efficient embedded computers based on highly-parallel accelerators opens a number of opportunities and challenges for researchers and engineers, and paved the way to the era of edge computing. At the same time, advances in…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-13 Paolo Burgio, Gianluca Brilli

Accelerating Transposed Convolutions on FPGA-based Edge Devices

Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping,…

Hardware Architecture · Computer Science 2025-07-11 Jude Haris, José Cano

EffCNet: An Efficient CondenseNet for Image Classification on NXP BlueBox

Intelligent edge devices with built-in processors vary widely in terms of capability and physical form to perform advanced Computer Vision (CV) tasks such as image classification and object detection, for example. With constant advances in…

Computer Vision and Pattern Recognition · Computer Science 2021-11-30 Priyank Kalgaonkar, Mohamed El-Sharkawy

ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators

The inference of deep neural networks (DNNs) on resource-constrained embedded systems introduces non-trivial trade-offs among model accuracy, computational latency, and hardware limitations, particularly when real-time constraints must be…

Hardware Architecture · Computer Science 2026-03-11 T. Baldi, D. Casini, A. Biondi

Hardware-Aware Reformulation of Convolutions for Efficient Execution on Specialized AI Hardware: A Case Study on NVIDIA Tensor Cores

Convolutional Neural Networks (CNNs) are central to modern AI, but their performance is often limited by hardware constraints. NVIDIA Tensor Cores, for instance, require input channels to be multiples of 8 and sometimes 512 for efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-21 Ganesh Bikshandi

BinArray: A Scalable Hardware Accelerator for Binary Approximated CNNs

Deep Convolutional Neural Networks (CNNs) have become state-of-the-art for computer vision and other signal processing tasks due to their superior accuracy. In recent years, large efforts have been made to reduce the computational costs of…

Hardware Architecture · Computer Science 2021-04-13 Mario Fischer, Juergen Wassner

MUXConv: Information Multiplexing in Convolutional Neural Networks

Convolutional neural networks have witnessed remarkable improvements in computational efficiency in recent years. A key driving force has been the idea of trading off model expressivity and efficiency through a combination of $1\times 1$…

Computer Vision and Pattern Recognition · Computer Science 2020-04-08 Zhichao Lu, Kalyanmoy Deb, Vishnu Naresh Boddeti