Related papers: Physics Inspired Criterion for Pruning-Quantizatio…

200 papers

We present a differentiable joint pruning and quantization (DJPQ) scheme. We frame neural network compression as a joint gradient-based optimization problem, trading off between model pruning and quantization automatically for hardware…

Machine Learning · Computer Science · 2021-04-06 · Ying Wang, Yadong Lu, Tijmen Blankevoort

As edge devices become prevalent, deploying Deep Neural Networks (DNNs) on edge devices has become a critical issue. However, DNNs require substantial computational resources, which are rarely available on edge devices. To handle this, we propose a…

Machine Learning · Computer Science · 2021-06-29 · Jangho Kim, Simyung Chang, Nojun Kwak

We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To…

Machine Learning · Computer Science · 2020-06-16 · Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Song Han

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-28 · Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
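The PTQ entry above describes converting a full-precision model into one that uses lower-precision data types. As a rough illustration of that basic step only, here is a minimal sketch of per-tensor uniform affine (scale plus zero-point) 8-bit quantization in plain Python. The function names, the 8-bit unsigned grid, and the min/max range choice are assumptions for illustration; this is not the method of the listed paper, which addresses problems beyond this baseline.

```python
# Minimal sketch of per-tensor uniform affine quantization (scale + zero point).
# Hypothetical helper names; not the method of any paper in this list.

def quantize_params(values, num_bits=8):
    """Return (scale, zero_point) mapping the value range, extended to
    include 0, onto the unsigned integer grid [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep zero point on the grid
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Round each value to the nearest grid step and clamp to the integer range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate real values."""
    return [scale * (q - zero_point) for q in q_values]

weights = [-0.62, -0.1, 0.0, 0.27, 0.91]
scale, zp = quantize_params(weights, num_bits=8)
q = quantize(weights, scale, zp)
w_hat = dequantize(q, scale, zp)
# Rounding error per in-range value is at most scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, w_hat))
print(q, max_err)
```

Because the range is forced to include 0, the value 0.0 maps exactly to the zero point, which keeps zero-padding and pruned (zeroed) weights exact after quantization.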

Deep Neural Networks (DNNs) have achieved significant advances in a wide range of applications. However, their deployment on resource-constrained devices remains a challenge due to the large number of layers and parameters, which result in…

Neural and Evolutionary Computing · Computer Science · 2025-09-05 · Sara Makenali, Babak Rokh, Ali Azarpeyvand

While joint pruning-quantization is theoretically superior to sequential application, current joint methods rely on auxiliary procedures outside the training loop for finding compression parameters. This reliance adds engineering…

Machine Learning · Computer Science · 2025-12-16 · Jonathan Wenshøj, Tong Chen, Bob Pepin, Raghavendra Selvan

As Deep Neural Networks (DNNs) are usually overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network…

Computer Vision and Pattern Recognition · Computer Science · 2022-05-24 · Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin

The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the…

Deep Neural Networks (DNNs) have many parameters and activation data, and both are expensive to implement. One method to reduce the size of the DNN is to quantize the pre-trained model by using a low-bit expression for weights and…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-26 · Jun Nishikawa, Ryoji Ikegaya

Inference time, model size, and accuracy are three key factors in deep model compression. Most of the existing work addresses these three key factors separately as it is difficult to optimize them all at the same time. For example, low-bit…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-18 · Dan Liu, Xi Chen, Jie Fu, Chen Ma, Xue Liu

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the…

Machine Learning · Computer Science · 2023-06-13 · Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz

Why rely on dense neural networks and then blindly sparsify them when prior knowledge about the problem structure is already available? Many inverse problems admit algorithm-unrolled networks that naturally encode physics and sparsity. In…

Machine Learning · Computer Science · 2025-10-14 · Arian Eamaz, Farhang Yeganegi, Mojtaba Soltanalian

Quantizing a floating-point neural network to its fixed-point representation is crucial for Learned Image Compression (LIC) because it improves decoding consistency for interoperability and reduces space-time complexity for implementation.…

Image and Video Processing · Electrical Eng. & Systems · 2023-10-10 · Junqi Shi, Ming Lu, Zhan Ma
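The entry above notes that converting a floating-point network to a fixed-point representation improves decoding consistency. The underlying reason is that integer arithmetic is bit-exact across platforms, whereas floating-point results can differ between implementations. Below is a minimal sketch of a generic signed Q-format encoding (assumed here: 16-bit storage, 8 fractional bits, round-to-nearest, saturation). The helper names are hypothetical, and this is not the quantization scheme proposed in the listed paper.

```python
# Minimal sketch of signed fixed-point (Q-format) encoding.
# Assumptions: 16-bit two's-complement storage, 8 fractional bits,
# round-to-nearest, saturation on overflow. Hypothetical helper names.

def to_fixed_point(x, frac_bits=8, total_bits=16):
    """Encode a float as an integer with `frac_bits` fractional bits,
    saturating to the representable signed range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = round(x * (1 << frac_bits))
    return max(lo, min(hi, q))

def from_fixed_point(q, frac_bits=8):
    """Decode a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=8):
    """Integer-only multiply. The raw product carries 2 * frac_bits fractional
    bits, so shift right by frac_bits to return to the input format.
    Python's >> floors toward -inf for negative products; no saturation here."""
    return (a * b) >> frac_bits

a = to_fixed_point(1.5)   # 1.5 * 256 = 384
b = to_fixed_point(2.0)   # 2.0 * 256 = 512
product = from_fixed_point(fixed_mul(a, b))  # 3.0
```

Every operation after encoding uses only integer multiplies and shifts, so a decoder following the same rules produces identical results on any hardware.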

Towards fast, hardware-efficient, and low-complexity receivers, we propose a compression-aware learning approach and examine it on free-space optical (FSO) receivers for turbulence mitigation. The learning approach jointly quantizes, prunes,…

Signal Processing · Electrical Eng. & Systems · 2026-01-13 · Mohanad Obeed, Ming Jian

Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress various qualities of images into a single bitstream, increasing the versatility of bitstream utilization and providing high compression efficiency…

Image and Video Processing · Electrical Eng. & Systems · 2025-11-04 · Jooyoung Lee, Se Yoon Jeong, Munchurl Kim

Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often…

Machine Learning · Computer Science · 2021-11-02 · Xinyu Zhang, Ian Colbert, Ken Kreutz-Delgado, Srinjoy Das

The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which lead to latency and memory…

In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI into small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights,…

Machine Learning · Computer Science · 2023-08-24 · Enmao Diao, Ganghua Wang, Jiawei Zhan, Yuhong Yang, Jie Ding, Vahid Tarokh
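The pruning entry above mentions that pruning is often achieved by dropping redundant weights. One common, simple proxy for "redundant" is small absolute magnitude. The sketch below shows unstructured magnitude pruning to a target sparsity in plain Python. The magnitude criterion and the helper name are assumptions for illustration; the listed paper may use a different criterion.

```python
# Minimal sketch of unstructured magnitude pruning.
# Assumption: "redundant" is approximated by small |w|. Hypothetical helper name.

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest absolute value.

    Returns the pruned weights and a binary mask (1 = kept, 0 = pruned).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by magnitude, smallest first (stable sort keeps index order on ties).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask

weights = [0.8, -0.05, 0.3, -0.6, 0.01, 0.12]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
```

Keeping the mask separate from the weights is a typical design choice: during fine-tuning, the mask can be reapplied after each update so that pruned weights stay at zero.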

Quantization and pruning are two essential techniques for compressing neural networks, yet they are often treated independently, with limited theoretical analysis connecting them. This paper introduces a unified framework for post-training…

Machine Learning · Computer Science · 2025-05-21 · Haoyu Zhang, Rayan Saab

Recently, deep learning has become a de facto standard in machine learning with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-27 · Yochai Zur, Chaim Baskin, Evgenii Zheltonozhskii, Brian Chmiel, Itay Evron, Alex M. Bronstein, Avi Mendelson