Related papers: TensorQuant - A Simulation Toolbox for Deep Neural…

Designing strong baselines for ternary neural network quantization through support and mass equalization

Deep neural networks (DNNs) offer the highest performance in a wide range of applications in computer vision. These results rely on over-parameterized backbones, which are expensive to run. This computational burden can be dramatically…

Computer Vision and Pattern Recognition · Computer Science 2023-07-03 Edouard Yvinec , Arnaud Dapogny , Kevin Bailly

A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification

Recent advancements in machine learning achieved by Deep Neural Networks (DNNs) have been significant. While demonstrating high accuracy, DNNs are associated with a huge number of parameters and computations, which leads to high memory…

Machine Learning · Computer Science 2023-12-20 Babak Rokh , Ali Azarpeyvand , Alireza Khanteymoori

PowerQuant: Automorphism Search for Non-Uniform Quantization

Deep neural networks (DNNs) are nowadays ubiquitous in many domains such as computer vision. However, due to their high latency, the deployment of DNNs hinges on the development of compression techniques such as quantization which consists…

Computer Vision and Pattern Recognition · Computer Science 2023-01-25 Edouard Yvinec , Arnaud Dapogny , Matthieu Cord , Kevin Bailly

MSP: An FPGA-Specific Mixed-Scheme, Multi-Precision Deep Neural Network Quantization Framework

With the tremendous success of deep learning, there exists imminent need to deploy deep learning models onto edge devices. To tackle the limited computing and storage resources in edge devices, model compression techniques have been widely…

Machine Learning · Computer Science 2020-10-20 Sung-En Chang , Yanyu Li , Mengshu Sun , Weiwen Jiang , Runbin Shi , Xue Lin , Yanzhi Wang

TruncQuant: Truncation-Ready Quantization for DNNs with Flexible Weight Bit Precision

The deployment of deep neural networks on edge devices is a challenging task due to the increasing complexity of state-of-the-art models, requiring efforts to reduce model size and inference latency. Recent studies explore models operating…

Machine Learning · Computer Science 2025-06-16 Jinhee Kim , Seoyeon Yoon , Taeho Lee , Joo Chan Lee , Kang Eun Jeon , Jong Hwan Ko

MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz , Wei Le

CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for Energy-Efficient Low-precision Deep Convolutional Neural Networks

In today's era of smart cyber-physical systems, Deep Neural Networks (DNNs) have become ubiquitous due to their state-of-the-art performance in complex real-world applications. The high computational complexity of these networks, which…

Hardware Architecture · Computer Science 2022-08-02 Muhammad Abdullah Hanif , Giuseppe Maria Sarda , Alberto Marchisio , Guido Masera , Maurizio Martina , Muhammad Shafique

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded…

Machine Learning · Computer Science 2020-12-15 Sung-En Chang , Yanyu Li , Mengshu Sun , Runbin Shi , Hayden K. -H. So , Xuehai Qian , Yanzhi Wang , Xue Lin

Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks

Tensorizing a neural network involves reshaping some or all of its dense weight matrices into higher-order tensors and approximating them using low-rank tensor network decompositions. This technique has shown promise as a model compression…

Machine Learning · Computer Science 2025-05-27 Safa Hamreras , Sukhbinder Singh , Román Orús

Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers

Deep neural network (DNN) quantization converting floating-point (FP) data in the network to integers (INT) is an effective way to shrink the model size for memory saving and simplify the operations for compute acceleration. Recently,…

Machine Learning · Computer Science 2020-01-01 Yukuan Yang , Shuang Wu , Lei Deng , Tianyi Yan , Yuan Xie , Guoqi Li

Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms

Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback however is the high computational complexity and high memory consumption of CNNs which makes them…

Computer Vision and Pattern Recognition · Computer Science 2021-02-04 Rishabh Goyal , Joaquin Vanschoren , Victor van Acht , Stephan Nijssen

StatQAT: Statistical Quantizer Optimization for Deep Networks

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes,…

Machine Learning · Statistics 2026-05-19 Mehmet Aktukmak , Daniel Huang , Ke Ding

TentacleNet: A Pseudo-Ensemble Template for Accurate Binary Convolutional Neural Networks

Binarization is an attractive strategy for implementing lightweight Deep Convolutional Neural Networks (CNNs). Despite the unquestionable savings offered, memory footprint above all, it may induce an excessive accuracy loss that prevents a…

Machine Learning · Computer Science 2019-12-30 Luca Mocerino , Andrea Calimera

SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference

Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While…

Machine Learning · Computer Science 2026-03-04 Qunyou Liu , Pengbo Yu , Marina Zapater , David Atienza

Low-bit Model Quantization for Deep Neural Networks: A Survey

With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an…

Machine Learning · Computer Science 2025-05-12 Kai Liu , Qian Zheng , Kaiwen Tao , Zhiteng Li , Haotong Qin , Wenbo Li , Yong Guo , Xianglong Liu , Linghe Kong , Guihai Chen , Yulun Zhang , Xiaokang Yang

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks have afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter driven quantization approaches…

Computer Vision and Pattern Recognition · Computer Science 2019-10-14 Prateeth Nayak , David Zhang , Sek Chai

Tensor Methods for Generating Compact Uncertainty Quantification and Deep Learning Models

Tensor methods have become a promising tool to solve high-dimensional problems in the big data era. By exploiting possible low-rank tensor factorization, many high-dimensional model-based or data-driven problems can be solved to facilitate…

Optimization and Control · Mathematics 2019-08-22 Chunfeng Cui , Cole Hawkins , Zheng Zhang

FQ-Conv: Fully Quantized Convolution for Efficient and Accurate Inference

Deep neural networks (DNNs) can be made hardware-efficient by reducing the numerical precision of the weights and activations of the network and by improving the network's resilience to noise. However, this gain in efficiency often comes at…

Machine Learning · Computer Science 2019-12-20 Bram-Ernst Verhoef , Nathan Laubeuf , Stefan Cosemans , Peter Debacker , Ioannis Papistas , Arindam Mallik , Diederik Verkest

A Survey of Quantization Methods for Efficient Neural Network Inference

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related…

Computer Vision and Pattern Recognition · Computer Science 2021-06-23 Amir Gholami , Sehoon Kim , Zhen Dong , Zhewei Yao , Michael W. Mahoney , Kurt Keutzer

A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers

Current Deep Learning approaches have been very successful using convolutional neural networks (CNN) trained on large graphical processing units (GPU)-based computers. Three limitations of this approach are: 1) they are based on a simple…

Neural and Evolutionary Computing · Computer Science 2017-07-17 Thomas E. Potok , Catherine Schuman , Steven R. Young , Robert M. Patton , Federico Spedalieri , Jeremy Liu , Ke-Thia Yao , Garrett Rose , Gangotree Chakma