Related papers: Adaptive Binary-Ternary Quantization

Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer

Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices, or cloud platforms for at-scale services. While binarization is a special case of quantization, this extreme case…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Phuoc Pham, Jacob Abraham, Jaeyong Chung

Ternary Quantization: A Survey

Inference time, model size, and accuracy are critical for deploying deep neural network models. Numerous research efforts have been made to compress neural network models with faster inference and higher accuracy. Pruning and quantization…

Machine Learning · Computer Science 2023-03-06 Dan Liu, Xue Liu

Trained Ternary Quantization

Deep neural networks are widely used in machine learning applications. However, large neural network models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose…

Machine Learning · Computer Science 2017-02-24 Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally

Pruning Ternary Quantization

Inference time, model size, and accuracy are three key factors in deep model compression. Most of the existing work addresses these three key factors separately as it is difficult to optimize them all at the same time. For example, low-bit…

Computer Vision and Pattern Recognition · Computer Science 2023-07-18 Dan Liu, Xi Chen, Jie Fu, Chen Ma, Xue Liu

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Binary and Ternary Quantization Can Enhance Feature Discrimination

Quantization is widely applied in machine learning to reduce computational and storage costs for both data and models. Considering that classification tasks are fundamental to the field, it is crucial to investigate how quantization impacts…

Machine Learning · Computer Science 2025-07-14 Weizhi Lu, Mingrui Chen, Weiyu Li

Deep Recurrent Quantization for Generating Sequential Binary Codes

Quantization has been an effective technology in ANN (approximate nearest neighbour) search due to its high accuracy and fast search speed. To meet the requirement of different applications, there is always a trade-off between retrieval…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science 2017-12-05 Yiren Zhou, Seyed-Mohsen Moosavi-Dezfooli, Ngai-Man Cheung, Pascal Frossard

Ternary and Binary Quantization for Improved Classification

Dimension reduction and data quantization are two important methods for reducing data complexity. In the paper, we study the methodology of first reducing data dimension by random projection and then quantizing the projections to ternary or…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Weizhi Lu, Mingrui Chen, Kai Guo, Weiyu Li

Binary and Ternary Natural Language Generation

Ternary and binary neural networks enable multiplication-free computation and promise multiple orders of magnitude efficiency gains over full-precision networks if implemented on specialized hardware. However, since both the parameter and…

Computation and Language · Computer Science 2023-06-06 Zechun Liu, Barlas Oguz, Aasish Pappu, Yangyang Shi, Raghuraman Krishnamoorthi

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

In low-latency or mobile applications, lower computation complexity, lower memory footprint and better energy efficiency are desired. Many prior works address this need by removing redundant parameters. Parameter quantization replaces…

Machine Learning · Computer Science 2021-11-16 Cheng-Chou Lan

Hyperspherical Quantization: Toward Smaller and More Accurate Models

Model quantization enables the deployment of deep neural networks under resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the…

Computer Vision and Pattern Recognition · Computer Science 2022-12-27 Dan Liu, Xi Chen, Chen Ma, Xue Liu

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan

WaveQ: Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization

As deep neural networks make their way into different domains, their compute efficiency is becoming a first-order constraint. Deep quantization, which reduces the bitwidth of the operations (below 8 bits), offers a unique opportunity as it…

Machine Learning · Computer Science 2020-04-27 Ahmed T. Elthakeb, Prannoy Pilligundla, Fatemehsadat Mireshghallah, Tarek Elgindi, Charles-Alban Deledalle, Hadi Esmaeilzadeh

Hybrid Binary Networks: Optimizing for Accuracy, Efficiency and Memory

Binarization is an extreme network compression approach that provides large computational speedups along with energy and memory savings, albeit at significant accuracy costs. We investigate the question of where to binarize inputs at…

Computer Vision and Pattern Recognition · Computer Science 2018-04-12 Ameya Prabhu, Vishal Batchu, Rohit Gajawada, Sri Aurobindo Munagala, Anoop Namboodiri

Deep Neural Network Compression with Single and Multiple Level Quantization

Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate a low-bit compressed network. In this…

Machine Learning · Computer Science 2018-12-18 Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, Hongkai Xiong

Soft Threshold Ternary Networks

Large neural networks are difficult to deploy on mobile devices because of intensive computation and storage. To alleviate it, we study ternarization, a balance between efficiency and accuracy that quantizes both weights and activations…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Weixiang Xu, Xiangyu He, Tianli Zhao, Qinghao Hu, Peisong Wang, Jian Cheng

Model compression as constrained optimization, with application to neural nets. Part II: quantization

We consider the problem of deep neural net compression by quantization: given a large, reference net, we want to quantize its real-valued weights using a codebook with $K$ entries so that the training loss of the quantized net is minimal.…

Machine Learning · Computer Science 2017-07-17 Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

Designing strong baselines for ternary neural network quantization through support and mass equalization

Deep neural networks (DNNs) offer the highest performance in a wide range of applications in computer vision. These results rely on over-parameterized backbones, which are expensive to run. This computational burden can be dramatically…

Computer Vision and Pattern Recognition · Computer Science 2023-07-03 Edouard Yvinec, Arnaud Dapogny, Kevin Bailly