English
Related papers

Related papers: Automatic Joint Structured Pruning and Quantizatio…

200 papers

Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often…

Machine Learning · Computer Science 2021-11-02 Xinyu Zhang , Ian Colbert , Ken Kreutz-Delgado , Srinjoy Das

Deep Neural Networks (DNNs) have achieved significant advances in a wide range of applications. However, their deployment on resource-constrained devices remains a challenge due to the large number of layers and parameters, which result in…

Neural and Evolutionary Computing · Computer Science 2025-09-05 Sara Makenali , Babak Rokh , Ali Azarpeyvand

The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which lead to latency and memory…

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference, especially when deploying to edge or IoT devices with limited computation capacity and power consumption budget. The uniform bit…

Machine Learning · Computer Science 2020-04-27 Tao Wang , Junsong Wang , Chang Xu , Chao Xue

Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant…

Computer Vision and Pattern Recognition · Computer Science 2021-06-16 Tailin Liang , John Glossner , Lei Wang , Shaobo Shi , Xiaotong Zhang

Recent DNN pruning algorithms have succeeded in reducing the number of parameters in fully connected layers, often with little or no drop in classification accuracy. However, most of the existing pruning schemes either have to be applied…

Machine Learning · Computer Science 2018-03-13 Konstantinos Pitas , Mike Davies , Pierre Vandergheynst

Graph neural networks (GNNs) are known to operate with high accuracy on learning from graph-structured data, but they suffer from high computational and resource costs. Neural network compression methods are used to reduce the model size…

Machine Learning · Computer Science 2025-10-28 Khatoon Khedri , Reza Rawassizadeh , Qifu Wen , Mehdi Hosseinzadeh

The deployment of deep neural networks on resource-constrained devices necessitates effective model com- pression strategies that judiciously balance the reduction of model size with the preservation of performance. This study introduces a…

Machine Learning · Computer Science 2025-05-02 Mohammad Zbeeb , Mariam Salman , Mohammad Bazzi , Ammar Mohanna

Deep Neural Networks(DNNs) have many parameters and activation data, and these both are expensive to implement. One method to reduce the size of the DNN is to quantize the pre-trained model by using a low-bit expression for weights and…

Computer Vision and Pattern Recognition · Computer Science 2020-11-26 Jun Nishikawa , Ryoji Ikegaya

Deep Neural Networks (DNNs) are applied in a wide range of usecases. There is an increased demand for deploying DNNs on devices that do not have abundant resources such as memory and computation units. Recently, network compression through…

Machine Learning · Computer Science 2020-05-19 Haichuan Yang , Shupeng Gui , Yuhao Zhu , Ji Liu

Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees, and…

Machine Learning · Computer Science 2023-02-14 Marwa El Halabi , Suraj Srinivas , Simon Lacoste-Julien

Energy efficient implementations and deployments of Spiking neural networks (SNNs) have been of great interest due to the possibility of developing artificial systems that can achieve the computational powers and energy efficiency of the…

Machine Learning · Computer Science 2023-02-09 Clemens JS Schaefer , Pooria Taheri , Mark Horeni , Siddharth Joshi

To address the large model size and intensive computation requirement of deep neural networks (DNNs), weight pruning techniques have been proposed and generally fall into two categories, i.e., static regularization-based pruning and dynamic…

Machine Learning · Computer Science 2020-04-14 Tianyun Zhang , Xiaolong Ma , Zheng Zhan , Shanglin Zhou , Minghai Qin , Fei Sun , Yen-Kuang Chen , Caiwen Ding , Makan Fardad , Yanzhi Wang

The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful,…

Machine Learning · Computer Science 2023-03-01 Riade Benbaki , Wenyu Chen , Xiang Meng , Hussein Hazimeh , Natalia Ponomareva , Zhe Zhao , Rahul Mazumder

Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for…

Compressing Deep Neural Network (DNN) models to alleviate the storage and computation requirements is essential for practical applications, especially for resource limited devices. Although capable of reducing a reasonable amount of model…

Machine Learning · Computer Science 2021-06-17 Sheng Lin , Wei Jiang , Wei Wang , Kaidi Xu , Yanzhi Wang , Shan Liu , Songnan Li

Deep neural networks (DNNs) offer significant flexibility and robust performance. This makes them ideal for building not only system models but also advanced neural network controllers (NNCs). However, their high complexity and…

Machine Learning · Computer Science 2025-11-14 Ganesh Sundaram , Jonas Ulmen , Amjad Haider , Daniel Görges

Large deep neural network (DNN) models pose the key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or SRAM operations. It motivates the intensive research on model…

Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network. However, the efficacy of…

Machine Learning · Statistics 2019-06-10 Elliot J. Crowley , Jack Turner , Amos Storkey , Michael O'Boyle

Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word…

‹ Prev 1 2 3 10 Next ›