Related papers: Efficient Model Compression Techniques with FishLe…

WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information in the context of…

Machine Learning · Computer Science 2020-11-26 Sidak Pal Singh, Dan Alistarh

Pruning at a Glance: Global Neural Pruning for Model Compression

Deep Learning models have become the dominant approach in several areas due to their high performance. Unfortunately, the size and hence computational requirements of operating such models can be considerably high. Therefore, this…

Computer Vision and Pattern Recognition · Computer Science 2019-12-04 Abdullah Salama, Oleksiy Ostapenko, Tassilo Klein, Moin Nabi

Edge AI: Evaluation of Model Compression Techniques for Convolutional Neural Networks

This work evaluates the compression techniques on ConvNeXt models in image classification tasks using the CIFAR-10 dataset. Structured pruning, unstructured pruning, and dynamic quantization methods are evaluated to reduce model size and…

Machine Learning · Computer Science 2024-09-05 Samer Francy, Raghubir Singh

Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning

Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline significantly increases the overall training time. In this paper, we develop a…

Neural and Evolutionary Computing · Computer Science 2023-04-11 Shanglin Zhou, Mikhail A. Bragin, Lynn Pepin, Deniz Gurevin, Fei Miao, Caiwen Ding

OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition

The recent paradigm shift to large-scale foundation models has brought about a new era for deep learning that, while it has found great success in practice, has also been plagued by prohibitively expensive costs in terms of high memory…

Machine Learning · Computer Science 2025-05-21 Stephen Zhang, Vardan Papyan

Weight Pruning via Adaptive Sparsity Loss

Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust compressive learning…

Machine Learning · Computer Science 2020-06-05 George Retsinas, Athena Elafrou, Georgios Goumas, Petros Maragos

Towards Optimal Compression: Joint Pruning and Quantization

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the…

Machine Learning · Computer Science 2023-06-13 Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Transformer-based language models have become a key building block for natural language processing. While these models are extremely accurate, they can be too large and computationally intensive to run on standard deployments. A variety of…

Computation and Language · Computer Science 2022-10-19 Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh

To prune, or not to prune: exploring the efficacy of pruning for model compression

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks…

Machine Learning · Statistics 2017-11-15 Michael Zhu, Suyog Gupta

WeightMom: Learning Sparse Networks using Iterative Momentum-based pruning

Deep Neural Networks have been used in a wide variety of applications with significant success. However, their highly complex nature owing to comprising millions of parameters has led to problems during deployment in pipelines with low…

Machine Learning · Computer Science 2022-08-15 Elvis Johnson, Xiaochen Tang, Sriramacharyulu Samudrala

Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation

Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline, i.e., training, pruning and retraining (fine-tuning) significantly increases the…

Machine Learning · Computer Science 2021-03-26 Deniz Gurevin, Shanglin Zhou, Lynn Pepin, Bingbing Li, Mikhail Bragin, Caiwen Ding, Fei Miao

Integrating Fairness and Model Pruning Through Bi-level Optimization

Deep neural networks have achieved exceptional results across a range of applications. As the demand for efficient and sparse deep learning models escalates, the significance of model compression, particularly pruning, is increasingly…

Machine Learning · Computer Science 2025-04-01 Yucong Dai, Gen Li, Feng Luo, Xiaolong Ma, Yongkai Wu

Towards Efficient Model Compression via Learned Global Ranking

Pruning convolutional filters has demonstrated its effectiveness in compressing ConvNets. Prior art in filter pruning requires users to specify a target model complexity (e.g., model size or FLOP count) for the resulting architecture.…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Ting-Wu Chin, Ruizhou Ding, Cha Zhang, Diana Marculescu

Deep Model Compression Via Two-Stage Deep Reinforcement Learning

Besides accuracy, the model size of convolutional neural network (CNN) models is another important factor considering limited hardware resources in practical applications. For example, employing deep neural networks on mobile systems…

Machine Learning · Computer Science 2021-07-05 Huixin Zhan, Wei-Ming Lin, Yongcan Cao

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy…

Computation and Language · Computer Science 2023-06-01 Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

Pruning Large Language Models via Accuracy Predictor

Large language models (LLMs) containing tens of billions of parameters (or even more) have demonstrated impressive capabilities in various NLP tasks. However, substantial model size poses challenges to training, inference, and deployment so…

Artificial Intelligence · Computer Science 2023-10-11 Yupeng Ji, Yibo Cao, Jiucai Liu

SlimNets: An Exploration of Deep Model Compression and Acceleration

Deep neural networks have achieved increasingly accurate results on a wide variety of complex tasks. However, much of this improvement is due to the growing use and availability of computational resources (e.g., use of GPUs, more layers, more…

Machine Learning · Computer Science 2018-08-03 Ini Oguntola, Subby Olubeko, Christopher Sweeney

Aggressive Post-Training Compression on Extremely Large Language Models

The increasing size and complexity of Large Language Models (LLMs) pose challenges for their deployment on personal computers and mobile devices. Aggressive post-training model compression is necessary to reduce the models' size, but it…

Computation and Language · Computer Science 2024-10-01 Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang

Towards Higher Ranks via Adversarial Weight Pruning

Convolutional Neural Networks (CNNs) are hard to deploy on edge devices due to their high computation and storage complexities. As a common practice for model compression, network pruning consists of two major categories: unstructured and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Yuchuan Tian, Hanting Chen, Tianyu Guo, Chao Xu, Yunhe Wang

Structured Model Pruning for Efficient Inference in Computational Pathology

Recent years have seen significant efforts to adopt Artificial Intelligence (AI) in healthcare for various use cases, from computer-aided diagnosis to ICU triage. However, the size of AI models has been rapidly growing due to scaling laws…

Image and Video Processing · Electrical Eng. & Systems 2024-04-16 Mohammed Adnan, Qinle Ba, Nazim Shaikh, Shivam Kalra, Satarupa Mukherjee, Auranuch Lorsakul