Related papers: Model Compression

Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

Deep learning models have achieved tremendous success in most industries in recent years. The evolution of these models has also led to an increase in the model size and energy requirement, making them difficult to deploy in production…

Machine Learning · Computer Science · 2024-07-24 · Aayush Saxena, Arit Kumar Bishwas, Ayush Ashok Mishra, Ryan Armstrong

Model Compression Techniques in Biometrics Applications: A Survey

The development of deep learning algorithms has extensively empowered humanity's task automatization capacity. However, the huge improvement in the performance of these models is highly correlated with their increasing level of complexity,…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-19 · Eduarda Caldeira, Pedro C. Neto, Marco Huber, Naser Damer, Ana F. Sequeira

Model compression as constrained optimization, with application to neural nets. Part V: combining compressions

Model compression is generally performed by using quantization, low-rank approximation or pruning, for which various algorithms have been researched in recent years. One fundamental question is: what types of compression work better for a…

Machine Learning · Computer Science · 2021-07-12 · Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study

The excellent performance of deep neural networks is usually accompanied by a large number of parameters and computations, which have limited their usage on the resource-limited edge devices. To address this issue, abundant methods such as…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Muzhou Yu, Linfeng Zhang, Kaisheng Ma

Relationship between Model Compression and Adversarial Robustness: A Review of Current Evidence

Increasing the model capacity is a known approach to enhance the adversarial robustness of deep learning networks. On the other hand, various model compression techniques, including pruning and quantization, can reduce the size of the…

Machine Learning · Computer Science · 2023-11-28 · Svetlana Pavlitska, Hannes Grolig, J. Marius Zöllner

A Comprehensive Survey of Compression Algorithms for Language Models

How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is rapidly growing to benefit from remarkable advances of recent language models without side effects due to the…

Computation and Language · Computer Science · 2024-01-30 · Seungcheol Park, Jaehyeon Choi, Sojin Lee, U Kang

Towards Optimal Compression: Joint Pruning and Quantization

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the…

Machine Learning · Computer Science · 2023-06-13 · Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting,…

Computation and Language · Computer Science · 2020-06-24 · Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez

Does compressing activations help model parallel training?

Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to…

Machine Learning · Computer Science · 2023-01-09 · Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman

Model compression as constrained optimization, with application to neural nets. Part I: general framework

Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model…

Machine Learning · Computer Science · 2017-07-06 · Miguel Á. Carreira-Perpiñán

A Survey on Model Compression for Large Language Models

Large Language Models (LLMs) have transformed natural language processing tasks successfully. Yet, their large size and high computational needs pose challenges for practical use, especially in resource-limited settings. Model compression…

Computation and Language · Computer Science · 2024-07-31 · Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

To prune, or not to prune: exploring the efficacy of pruning for model compression

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks…

Machine Learning · Statistics · 2017-11-15 · Michael Zhu, Suyog Gupta
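
The pruning abstract above describes inducing sparsity so that fewer parameters remain nonzero. Below is a minimal sketch of one common criterion, one-shot magnitude pruning on a flat list of weights. The function name `magnitude_prune` and the one-shot setting are illustrative assumptions. It is not the paper's exact procedure, which the truncated abstract does not spell out.

```python
def magnitude_prune(weights, sparsity):
    """Hypothetical helper: zero out the smallest-magnitude fraction of weights.

    weights  -- flat list of numbers (one layer's connection matrix, flattened)
    sparsity -- target fraction of entries to set to zero, in [0, 1]
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    # Floor, so we never prune more entries than requested.
    k = int(sparsity * len(weights))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0  # the pruned connection no longer contributes
    return pruned


if __name__ == "__main__":
    print(magnitude_prune([0.5, -0.1, 0.03, -0.8], 0.5))
    # prints [0.5, 0.0, 0.0, -0.8]
```

Ranking by absolute value means large negative weights survive as readily as large positive ones. In practice the surviving mask is usually kept fixed while the network is fine-tuned.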

A Survey of Model Compression and Acceleration for Deep Neural Networks

Deep neural networks (DNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with…

Machine Learning · Computer Science · 2020-06-16 · Yu Cheng, Duo Wang, Pan Zhou, Tao Zhang

Robustness in Compressed Neural Networks for Object Detection

Model compression techniques make it possible to significantly reduce the computational cost associated with data processing by deep neural networks with only a minor decrease in average accuracy. Simultaneously, reducing the model size may have a…

Machine Learning · Computer Science · 2021-09-28 · Sebastian Cygert, Andrzej Czyżewski

Compressed Object Detection

Deep learning approaches have achieved unprecedented performance in visual recognition tasks such as object detection and pose estimation. However, state-of-the-art models have millions of parameters represented as floats which make them…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-08 · Gedeon Muhawenayo, Georgia Gkioxari

Model Compression and Efficient Inference for Large Language Models: A Survey

Transformer based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during the inference process make it challenging to deploy large models on resource-constrained…

Computation and Language · Computer Science · 2024-02-16 · Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He

When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models

Large language models (LLMs) exhibit excellent performance in various tasks. However, the memory requirements of LLMs present a great challenge when deploying on memory-limited devices, even for quantized LLMs. This paper introduces a…

Computation and Language · Computer Science · 2025-02-24 · Weilan Wang, Yu Mao, Dongdong Tang, Hongchao Du, Nan Guan, Chun Jason Xue

Model Compression for Resource-Constrained Mobile Robots

The number of mobile robots with constrained computing resources that need to execute complex machine learning models has been increasing during the past decade. Commonly, these robots rely on edge infrastructure accessible over wireless…

Machine Learning · Computer Science · 2022-07-22 · Timotheos Souroulla, Alberto Hata, Ahmad Terra, Özer Özkahraman, Rafia Inam

Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression

What happens when multiple compression methods are combined? Does the order in which they are applied matter? Joint model compression has emerged as a powerful strategy to achieve higher efficiency by combining multiple methods such as…

Artificial Intelligence · Computer Science · 2026-03-20 · Minjun Kim, Jaehyeon Choi, Hyunwoo Yang, Jongjin Kim, Jinho Song, U Kang

Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error

Compression techniques for deep neural network models are becoming very important for the efficient execution of high-performance deep learning systems on edge-computing devices. The concept of model compression is also important for…

Machine Learning · Statistics · 2020-07-14 · Taiji Suzuki, Hiroshi Abe, Tomoya Murata, Shingo Horiuchi, Kotaro Ito, Tokuma Wachi, So Hirai, Masatoshi Yukishima, Tomoaki Nishimura