Related papers: A flexible, extensible software framework for mode…

Model compression as constrained optimization, with application to neural nets. Part I: general framework

Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model…

Machine Learning · Computer Science 2017-07-06 Miguel Á. Carreira-Perpiñán

Model compression as constrained optimization, with application to neural nets. Part V: combining compressions

Model compression is generally performed by using quantization, low-rank approximation or pruning, for which various algorithms have been researched in recent years. One fundamental question is: what types of compression work better for a…

Machine Learning · Computer Science 2021-07-12 Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models

Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges,…

Machine Learning · Computer Science 2023-12-13 Arnav Chavan, Nahush Lele, Deepak Gupta

Compressed Learning of Deep Neural Networks for OpenCL-Capable Embedded Systems

Deep neural networks (DNNs) have been quite successful in solving many complex learning problems. However, DNNs tend to have a large number of learning parameters, leading to a large memory and computation requirement. In this paper, we…

Machine Learning · Computer Science 2019-05-21 Sangkyun Lee, Jeonghyun Lee

On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee

Model compression is a crucial part of deploying neural networks (NNs), especially when the memory and storage of computing devices are limited in many applications. This paper focuses on two model compression techniques: low-rank…

Machine Learning · Computer Science 2024-08-16 Chenyang Li, Jihoon Chung, Mengnan Du, Haimin Wang, Xianlian Zhou, Bo Shen

Coded Deep Learning: Framework and Algorithm

The success of deep learning (DL) is often achieved with large models and high complexity during both training and post-training inferences, hindering training in resource-limited settings. To alleviate these issues, this paper introduces a…

Machine Learning · Computer Science 2025-01-20 En-hui Yang, Shayan Mohajer Hamidi

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive…

Machine Learning · Computer Science 2024-03-13 Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu

Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware

As state-of-the-art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for…

Machine Learning · Computer Science 2025-04-25 Hans Rosenberger, Rodrigo Fischer, Johanna S. Fröhlich, Ali Bereyhi, Ralf R. Müller

Unified Framework for Pre-trained Neural Network Compression via Decomposition and Optimized Rank Selection

Despite their high accuracy, complex neural networks demand significant computational resources, posing challenges for deployment on resource-constrained devices such as mobile phones and embedded systems. Compression algorithms have been…

Machine Learning · Computer Science 2025-09-23 Ali Aghababaei-Harandi, Massih-Reza Amini

A Programmable Approach to Neural Network Compression

Deep neural networks (DNNs) frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such…

Machine Learning · Computer Science 2020-12-03 Vinu Joseph, Saurav Muralidharan, Animesh Garg, Michael Garland, Ganesh Gopalakrishnan

A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation

The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance. As a result, computation and memory costs in resource-limited applications may be significantly reduced by dropping…

Machine Learning · Statistics 2022-11-10 Wenjing Yang, Ganghua Wang, Jie Ding, Yuhong Yang

LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

When deploying deep learning models to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource…

Machine Learning · Computer Science 2021-10-11 Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari

Model Compression using Progressive Channel Pruning

In this work, we propose a simple but effective channel pruning framework called Progressive Channel Pruning (PCP) to accelerate Convolutional Neural Networks (CNNs). In contrast to the existing channel pruning methods that prune channels…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Jinyang Guo, Weichen Zhang, Wanli Ouyang, Dong Xu

Low-Complexity Inference in Continual Learning via Compressed Knowledge Transfer

Continual learning (CL) aims to train models that can learn a sequence of tasks without forgetting previously acquired knowledge. A core challenge in CL is balancing stability -- preserving performance on old tasks -- and plasticity --…

Machine Learning · Computer Science 2025-05-14 Zhenrong Liu, Janne M. J. Huttunen, Mikko Honkala

DeepTwist: Learning Model Compression via Occasional Weight Distortion

Model compression has been introduced to reduce the required hardware resources while maintaining the model accuracy. Lots of techniques for model compression, such as pruning, quantization, and low-rank approximation, have been suggested…

Machine Learning · Computer Science 2018-10-31 Dongsoo Lee, Parichay Kapoor, Byeongwook Kim

A Comprehensive Survey of Compression Algorithms for Language Models

How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is rapidly growing to benefit from remarkable advances of recent language models without side effects due to the…

Computation and Language · Computer Science 2024-01-30 Seungcheol Park, Jaehyeon Choi, Sojin Lee, U Kang

Low-Rank Compression of Language Models via Differentiable Rank Selection

Approaches for compressing large-language models using low-rank decomposition have made strides, particularly with the introduction of activation and loss-aware SVD, which improves the trade-off between decomposition rank and downstream…

Machine Learning · Computer Science 2025-12-17 Sidhant Sundrani, Francesco Tudisco, Pasquale Minervini

Unified Low-rank Compression Framework for Click-through Rate Prediction

Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments.…

Information Retrieval · Computer Science 2024-06-12 Hao Yu, Minghao Fu, Jiandong Ding, Yusheng Zhou, Jianxin Wu

Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

Deep learning models have achieved tremendous success in most of the industries in recent years. The evolution of these models has also led to an increase in the model size and energy requirement, making it difficult to deploy in production…

Machine Learning · Computer Science 2024-07-24 Aayush Saxena, Arit Kumar Bishwas, Ayush Ashok Mishra, Ryan Armstrong

TOCO: A Framework for Compressing Neural Network Models Based on Tolerance Analysis

Neural network compression methods have enabled deploying large models on emerging edge devices with little cost, by adapting already-trained models to the constraints of these devices. The rapid development of AI-capable edge devices with…

Machine Learning · Computer Science 2019-12-20 Soroosh Khoram, Jing Li