Related papers: Trained Ternary Quantization

200 papers

We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on…

Machine Learning · Computer Science 2017-05-31 Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

Inference time, model size, and accuracy are three key factors in deep model compression. Most of the existing work addresses these three key factors separately as it is difficult to optimize them all at the same time. For example, low-bit…

Computer Vision and Pattern Recognition · Computer Science 2023-07-18 Dan Liu, Xi Chen, Jie Fu, Chen Ma, Xue Liu

Deep convolution neural network has achieved great success in many artificial intelligence applications. However, its enormous model size and massive computation cost have become the main obstacle for deployment of such powerful algorithm…

Computer Vision and Pattern Recognition · Computer Science 2018-07-23 Zhezhi He, Boqing Gong, Deliang Fan

Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…

Computer Vision and Pattern Recognition · Computer Science 2021-11-03 Kohei Yamamoto

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has considerable potential to increase inference speed by leveraging bit-operations, there is still a noticeable gap in terms of…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua

Quantization of the weights of deep neural networks (DNNs) has proven to be an effective solution for implementing DNNs on edge devices such as mobiles, ASICs and FPGAs, because they do not have sufficient resources to support…

Machine Learning · Computer Science 2019-12-20 Tianyu Zhang, Lei Zhu, Qian Zhao, Kilho Shin

Neural network models are resource hungry. It is difficult to deploy such deep networks on devices with limited resources, like smart wearables, cellphones, drones, and autonomous vehicles. Low bit quantization such as binary and ternary…

Machine Learning · Computer Science 2021-09-15 Ryan Razani, Grégoire Morin, Vahid Partovi Nia, Eyyüb Sari

In the past years, Deep convolution neural network has achieved great success in many artificial intelligence applications. However, its enormous model size and massive computation cost have become the main obstacle for deployment of such…

Machine Learning · Computer Science 2018-10-03 Zhezhi He, Deliang Fan

Model architectures have been dramatically increasing in size, improving performance at the cost of resource requirements. In this paper we propose 3DQ, a ternary quantization method, applied for the first time to 3D Fully Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Magdalini Paschali, Stefano Gasperini, Abhijit Guha Roy, Michael Y.-S. Fang, Nassir Navab

Deep neural networks (DNNs) offer the highest performance in a wide range of applications in computer vision. These results rely on over-parameterized backbones, which are expensive to run. This computational burden can be dramatically…

Computer Vision and Pattern Recognition · Computer Science 2023-07-03 Edouard Yvinec, Arnaud Dapogny, Kevin Bailly

We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy. In addition, we also constrain the activations to 8-bits thus enabling sub 8-bit full…

Machine Learning · Computer Science 2017-02-02 Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, Bharat Kaul

This paper proposes a training method having multiple cyclic training for achieving enhanced performance in low-bit quantized convolutional neural networks (CNNs). Quantization is a popular method for obtaining lightweight CNNs, where the…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 HyunJin Kim, Jungwoo Shin, Alberto A. Del Barrio

Large neural networks are difficult to deploy on mobile devices because of intensive computation and storage. To alleviate it, we study ternarization, a balance between efficiency and accuracy that quantizes both weights and activations…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Weixiang Xu, Xiangyu He, Tianli Zhao, Qinghao Hu, Peisong Wang, Jian Cheng

Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate low-bit compressed network. In this…

Machine Learning · Computer Science 2018-12-18 Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, Hongkai Xiong

Inference time, model size, and accuracy are critical for deploying deep neural network models. Numerous research efforts have been made to compress neural network models with faster inference and higher accuracy. Pruning and quantization…

Machine Learning · Computer Science 2023-03-06 Dan Liu, Xue Liu

We introduce a method to train Quantized Neural Networks (QNNs), neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing…

Neural and Evolutionary Computing · Computer Science 2016-09-23 Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

Model quantization enables the deployment of deep neural networks under resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the…

Computer Vision and Pattern Recognition · Computer Science 2022-12-27 Dan Liu, Xi Chen, Chen Ma, Xue Liu

This paper presents incremental network quantization (INQ), a novel method, targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained…

Computer Vision and Pattern Recognition · Computer Science 2017-08-28 Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen

With the development of deep neural networks, the size of network models becomes larger and larger. Model compression has become an urgent need for deploying these network models to mobile or embedded devices. Model quantization is a…

Machine Learning · Computer Science 2019-07-02 Wen-Pu Cai, Wu-Jun Li

Deep neural networks (DNNs) have achieved great success in a wide range of computer vision areas, but their application to mobile devices is limited due to their high storage and computational cost. Much effort has been devoted to compress…

Computer Vision and Pattern Recognition · Computer Science 2019-05-14 Yiming Hu, Jianquan Li, Xianlei Long, Shenhua Hu, Jiagang Zhu, Xingang Wang, Qingyi Gu