Related papers: A Quantization-Friendly Separable Convolution for …

Subtensor Quantization for Mobilenets

Quantization for deep neural networks (DNN) has enabled developers to deploy models with less memory and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular Mobilenet…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-17 · Thu Dinh, Andrey Melnikov, Vasilios Daskalopoulos, Sek Chai

Bag of Tricks with Quantized Convolutional Neural Networks for image classification

Deep neural networks have been proven effective in a wide range of tasks. However, their high computational and memory costs make them impractical to deploy on resource-constrained devices. To address this issue, quantization schemes have…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-14 · Jie Hu, Mengze Zeng, Enhua Wu

Quantized Convolutional Neural Networks for Mobile Devices

Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high performance hardware is typically indispensable for the application of CNN models due to the high…

Computer Vision and Pattern Recognition · Computer Science · 2016-05-17 · Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng

Streamlined Deployment for Quantized Neural Networks

Running Deep Neural Network (DNN) models on devices with limited computational capability is a challenge due to large compute and memory requirements. Quantized Neural Networks (QNNs) have emerged as a potential solution to this problem,…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-31 · Yaman Umuroglu, Magnus Jahre

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

Machine Learning · Computer Science · 2018-05-22 · Sean O. Settle, Manasa Bollavaram, Paolo D'Alberto, Elliott Delaye, Oscar Fernandez, Nicholas Fraser, Aaron Ng, Ashish Sirasao, Michael Wu

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science · 2017-12-05 · Yiren Zhou, Seyed-Mohsen Moosavi-Dezfooli, Ngai-Man Cheung, Pascal Frossard

Do All MobileNets Quantize Poorly? Gaining Insights into the Effect of Quantization on Depthwise Separable Convolutional Networks Through the Eyes of Multi-scale Distributional Dynamics

As the "Mobile AI" revolution continues to grow, so does the need to understand the behaviour of edge-deployed deep neural networks. In particular, MobileNets are the go-to family of deep convolutional neural networks (CNN) for mobile.…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-27 · Stone Yun, Alexander Wong

Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers

This paper presents a novel end-to-end methodology for enabling the deployment of low-error deep networks on microcontrollers. To fit the memory and computational limitations of resource-constrained edge-devices, we exploit mixed…

Machine Learning · Computer Science · 2019-05-31 · Manuele Rusci, Alessandro Capotondi, Luca Benini

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization…

Machine Learning · Computer Science · 2020-04-22 · Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius

Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms

Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback however is the high computational complexity and high memory consumption of CNNs which makes them…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-04 · Rishabh Goyal, Joaquin Vanschoren, Victor van Acht, Stephan Nijssen

PROM: Prioritize Reduction of Multiplications Over Lower Bit-Widths for Efficient CNNs

Convolutional neural networks (CNNs) are crucial for computer vision tasks on resource-constrained devices. Quantization effectively compresses these models, reducing storage size and energy cost. However, in modern depthwise-separable…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-07 · Lukas Meiner, Jens Mehnert, Alexandru Paul Condurache

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be…

Machine Learning · Computer Science · 2017-12-19 · Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko

ANTNets: Mobile Convolutional Neural Networks for Resource Efficient Image Classification

Deep convolutional neural networks have achieved remarkable success in computer vision. However, deep neural networks require large computing resources to achieve high performance. Although depthwise separable convolution can be an…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-06 · Yunyang Xiong, Hyunwoo J. Kim, Varsha Hedau

DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks

Deep neural networks have achieved state-of-the-art performance on various computer vision tasks. However, their deployment on resource-constrained devices has been hindered due to their high computational and storage complexity. While…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-21 · Hassan Dbouk, Hetul Sanghvi, Mahesh Mehendale, Naresh Shanbhag

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

Model quantization is challenging due to many tedious hyper-parameters such as precision (bitwidth), dynamic range (minimum and maximum discrete values) and stepsize (interval between discrete values). Unlike prior arts that carefully tune…

Machine Learning · Computer Science · 2021-07-08 · Zhang Zhaoyang, Shao Wenqi, Gu Jinwei, Wang Xiaogang, Luo Ping

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks.…

Computer Vision and Pattern Recognition · Computer Science · 2017-04-18 · Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam

Fast Implementation of 4-bit Convolutional Neural Networks for Mobile Devices

Quantized low-precision neural networks are very popular because they require less computational resources for inference and can provide high performance, which is vital for real-time and embedded recognition systems. However, their…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-21 · Anton Trusov, Elena Limonova, Dmitry Slugin, Dmitry Nikolaev, Vladimir V. Arlazarov

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved…

Machine Learning · Computer Science · 2023-07-07 · Georg Rutishauser, Francesco Conti, Luca Benini

Quality Scalable Quantization Methodology for Deep Learning on Edge

Deep Learning Architectures employ heavy computations, and the bulk of the computational energy is taken up by the convolution operations in the Convolutional Neural Networks. The objective of our proposed work is to reduce the energy…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-07-17 · Salman Abdul Khaliq, Rehan Hafiz

Ternary MobileNets via Per-Layer Hybrid Filter Banks

The MobileNets family of computer vision neural networks has fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements on highly…

Machine Learning · Computer Science · 2019-11-05 · Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina