Related papers: Hyper-Compression: Model Compression via Hyperfunc…

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive…

Machine Learning · Computer Science 2024-03-13 Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu

Compression of Recurrent Neural Networks for Efficient Language Modeling

Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large to be implemented in real-time offline mobile applications.…

Computation and Language · Computer Science 2019-04-09 Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko

MCNC: Manifold-Constrained Reparameterization for Neural Compression

The outstanding performance of large foundational models across diverse tasks, from computer vision to speech and natural language processing, has significantly increased their demand. However, storing and transmitting these models poses…

Machine Learning · Computer Science 2025-04-28 Chayne Thrash, Ali Abbasi, Reed Andreas, Parsa Nooralinejad, Soroush Abbasi Koohpayegani, Hamed Pirsiavash, Soheil Kolouri

Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models

Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges,…

Machine Learning · Computer Science 2023-12-13 Arnav Chavan, Nahush Lele, Deepak Gupta

Model Compression

With time, machine learning models have increased in their scope, functionality and size. Consequently, the increased functionality and size of such models requires high-end hardware to both train and provide inference after the fact. This…

Machine Learning · Computer Science 2021-09-07 Arhum Ishtiaq, Sara Mahmood, Maheen Anees, Neha Mumtaz

Forget the Data and Fine-Tuning! Just Fold the Network to Compress

We introduce model folding, a novel data-free model compression technique that merges structurally similar neurons across layers, significantly reducing the model size without the need for fine-tuning or access to training data. Unlike…

Machine Learning · Computer Science 2025-08-13 Dong Wang, Haris Šikić, Lothar Thiele, Olga Saukh

Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt

While the numerous parameters in Large Language Models (LLMs) contribute to their superior performance, this massive scale makes them inefficient and memory-hungry. Thus, they are hard to deploy on commodity hardware, such as one single…

Computation and Language · Computer Science 2023-10-11 Zhaozhuo Xu, Zirui Liu, Beidi Chen, Yuxin Tang, Jue Wang, Kaixiong Zhou, Xia Hu, Anshumali Shrivastava

Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study

The excellent performance of deep neural networks is usually accompanied by a large number of parameters and computations, which have limited their usage on the resource-limited edge devices. To address this issue, abundant methods such as…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Muzhou Yu, Linfeng Zhang, Kaisheng Ma

Lossy and Lossless (L$^2$) Post-training Model Size Compression

Deep neural networks have delivered remarkable performance and have been widely used in various visual tasks. However, their huge size causes significant inconvenience for transmission and storage. Many previous studies have explored model…

Computer Vision and Pattern Recognition · Computer Science 2023-08-09 Yumeng Shi, Shihao Bai, Xiuying Wei, Ruihao Gong, Jianlei Yang

Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations

Large Language Models are growing in size, and we expect them to continue to do so, as larger models train quicker. However, this increase in size will severely impact inference costs. Therefore model compression is important, to retain the…

Machine Learning · Computer Science 2024-04-10 Georgy Tyukin

Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

Deep learning models have achieved tremendous success in most of the industries in recent years. The evolution of these models has also led to an increase in the model size and energy requirement, making it difficult to deploy in production…

Machine Learning · Computer Science 2024-07-24 Aayush Saxena, Arit Kumar Bishwas, Ayush Ashok Mishra, Ryan Armstrong

Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction

Deep learning models incorporating linear SSMs have gained attention for capturing long-range dependencies in sequential data. However, their large parameter sizes pose challenges for deployment on resource-constrained devices. In this…

Machine Learning · Computer Science 2025-07-31 Hiroki Sakamoto, Kazuhiro Sato

Model Compression Techniques in Biometrics Applications: A Survey

The development of deep learning algorithms has extensively empowered humanity's task automatization capacity. However, the huge improvement in the performance of these models is highly correlated with their increasing level of complexity,…

Computer Vision and Pattern Recognition · Computer Science 2024-01-19 Eduarda Caldeira, Pedro C. Neto, Marco Huber, Naser Damer, Ana F. Sequeira

When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models

Large language models (LLMs) exhibit excellent performance in various tasks. However, the memory requirements of LLMs present a great challenge when deploying on memory-limited devices, even for quantized LLMs. This paper introduces a…

Computation and Language · Computer Science 2025-02-24 Weilan Wang, Yu Mao, Dongdong Tang, Hongchao Du, Nan Guan, Chun Jason Xue

Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank…

Computation and Language · Computer Science 2025-02-25 Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Kehai Chen, Min Zhang

Compression Laws for Large Language Models

We introduce compression laws for large language models (LLMs). While recent scaling laws have sought to understand how LLMs scale with respect to model size, pre-training data, and computational resources, we focus on understanding how…

Computation and Language · Computer Science 2025-04-08 Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting,…

Computation and Language · Computer Science 2020-06-24 Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez

Projected Compression: Trainable Projection for Efficient Transformer Compression

Large language models have steadily increased in size to achieve improved performance; however, this growth has also led to greater inference time and computational demands. Consequently, there is rising interest in model size reduction…

Machine Learning · Computer Science 2025-06-30 Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Maciej Pióro, Jakub Krajewski, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jan Ludziejewski

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

While overparameterization in machine learning models offers great benefits in terms of optimization and generalization, it also leads to increased computational requirements as model sizes grow. In this work, we show that by leveraging the…

Machine Learning · Computer Science 2026-02-16 Can Yaras, Peng Wang, Laura Balzano, Qing Qu

A Benchmark Study of Neural Network Compression Methods for Hyperspectral Image Classification

Deep neural networks have achieved strong performance in image classification tasks due to their ability to learn complex patterns from high-dimensional data. However, their large computational and memory requirements often limit deployment…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Sai Shi