Related papers: SigmaQuant: Hardware-Aware Heterogeneous Quantizat…

MSP: An FPGA-Specific Mixed-Scheme, Multi-Precision Deep Neural Network Quantization Framework

With the tremendous success of deep learning, there exists imminent need to deploy deep learning models onto edge devices. To tackle the limited computing and storage resources in edge devices, model compression techniques have been widely…

Machine Learning · Computer Science 2020-10-20 Sung-En Chang , Yanyu Li , Mengshu Sun , Weiwen Jiang , Runbin Shi , Xue Lin , Yanzhi Wang

Trimming Down Large Spiking Vision Transformers via Heterogeneous Quantization Search

Spiking Neural Networks (SNNs) are amenable to deployment on edge devices and neuromorphic hardware due to their lower dissipation. Recently, SNN-based transformers have garnered significant interest, incorporating attention mechanisms akin…

Neural and Evolutionary Computing · Computer Science 2024-12-10 Boxun Xu , Yufei Song , Peng Li

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…

Machine Learning · Computer Science 2023-09-01 Clemens JS Schaefer , Siddharth Joshi , Shan Li , Raul Blazquez

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded…

Machine Learning · Computer Science 2020-12-15 Sung-En Chang , Yanyu Li , Mengshu Sun , Runbin Shi , Hayden K. -H. So , Xuehai Qian , Yanzhi Wang , Xue Lin

PowerQuant: Automorphism Search for Non-Uniform Quantization

Deep neural networks (DNNs) are nowadays ubiquitous in many domains such as computer vision. However, due to their high latency, the deployment of DNNs hinges on the development of compression techniques such as quantization which consists…

Computer Vision and Pattern Recognition · Computer Science 2023-01-25 Edouard Yvinec , Arnaud Dapogny , Matthieu Cord , Kevin Bailly

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Kuan Wang , Zhijian Liu , Yujun Lin , Ji Lin , Song Han

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science 2017-12-05 Yiren Zhou , Seyed-Mohsen Moosavi-Dezfooli , Ngai-Man Cheung , Pascal Frossard

Hardware-Centric AutoML for Mixed-Precision Quantization

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…

Computer Vision and Pattern Recognition · Computer Science 2020-08-14 Kuan Wang , Zhijian Liu , Yujun Lin , Ji Lin , Song Han

AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model…

Machine Learning · Computer Science 2024-04-29 Cédric Gernigon , Silviu-Ioan Filip , Olivier Sentieys , Clément Coggiola , Mickael Bruno

Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors

Although the quest for more accurate solutions is pushing deep learning research towards larger and more complex algorithms, edge devices demand efficient inference and therefore reduction in model size, latency and energy consumption. One…

Instrumentation and Detectors · Physics 2021-06-22 Claudionor N. Coelho , Aki Kuusela , Shan Li , Hao Zhuang , Thea Aarrestad , Vladimir Loncar , Jennifer Ngadiuba , Maurizio Pierini , Adrian Alan Pol , Sioni Summers

Hardware-friendly Deep Learning by Network Quantization and Binarization

Quantization is emerging as an efficient approach to promote hardware-friendly deep learning and run deep neural networks on resource-limited hardware. However, it still causes a significant decrease to the network in accuracy. We summarize…

Machine Learning · Computer Science 2021-12-03 Haotong Qin

Scalar Arithmetic Multiple Data: Customizable Precision for Deep Neural Networks

Quantization of weights and activations in Deep Neural Networks (DNNs) is a powerful technique for network compression, and has enjoyed significant attention and success. However, much of the inference-time benefit of quantization is…

Performance · Computer Science 2019-12-13 Andrew Anderson , David Gregg

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for…

Machine Learning · Computer Science 2023-12-27 Konstantinos Balaskas , Andreas Karatzas , Christos Sad , Kostas Siozios , Iraklis Anagnostopoulos , Georgios Zervakis , Jörg Henkel

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Ruihao Gong , Xianglong Liu , Shenghu Jiang , Tianxiang Li , Peng Hu , Jiazhen Lin , Fengwei Yu , Junjie Yan

Hardware-Aware DNN Compression for Homogeneous Edge Devices

Deploying deep neural networks (DNNs) across homogeneous edge devices (the devices with the same SKU labeled by the manufacturer) often assumes identical performance among them. However, once a device model is widely deployed, the…

Machine Learning · Computer Science 2025-12-23 Kunlong Zhang , Guiying Li , Ning Lu , Peng Yang , Ke Tang

SlimEdge: Performance and Device Aware Distributed DNN Deployment on Resource-Constrained Edge Hardware

Distributed deep neural networks (DNNs) have become central to modern computer vision, yet their deployment on resource-constrained edge devices remains hindered by substantial parameter counts, computational demands, and the probability of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Mahadev Sunil Kumar , Arnab Raha , Debayan Das , Gopakumar G , Rounak Chatterjee , Amitava Mukherjee

MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz , Wei Le

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

As the backbone technology of machine learning, deep neural networks (DNNs) have have quickly ascended to the spotlight. Running DNNs on resource-constrained mobile devices is, however, by no means trivial, since it incurs high performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-31 En Li , Zhi Zhou , Xu Chen

Hardware-Aware DNN Compression for Homogeneous Edge Devices

Deploying deep neural networks (DNNs) across homogeneous edge devices (the devices with the same SKU labeled by the manufacturer) often assumes identical performance among them. However, once a device model is widely deployed, the…

Hardware Architecture · Computer Science 2025-12-16 Kunlong Zhang , Guiying Li , Ning Lu , Peng Yang , Ke Tang

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation

Quantization of deep neural networks (DNN) has been proven effective for compressing and accelerating DNN models. Data-free quantization (DFQ) is a promising approach without the original datasets under privacy-sensitive and confidential…

Machine Learning · Computer Science 2022-02-16 Cong Guo , Yuxian Qiu , Jingwen Leng , Xiaotian Gao , Chen Zhang , Yunxin Liu , Fan Yang , Yuhao Zhu , Minyi Guo