Related papers: Minimal Random Code Learning: Getting Bits Back fr…

Bit-wise Training of Neural Network Weights

We introduce an algorithm where the individual bits representing the weights of a neural network are learned. This method allows training weights with integer values on arbitrary bit-depths and naturally uncovers sparse networks, without…

Machine Learning · Computer Science 2022-02-22 Cristian Ivan

Model compression as constrained optimization, with application to neural nets. Part II: quantization

We consider the problem of deep neural net compression by quantization: given a large, reference net, we want to quantize its real-valued weights using a codebook with $K$ entries so that the training loss of the quantized net is minimal.…

Machine Learning · Computer Science 2017-07-17 Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

An Efficient Compression of Deep Neural Network Checkpoints Based on Prediction and Context Modeling

This paper is dedicated to an efficient compression of weights and optimizer states (called checkpoints) obtained at different stages during a neural network training process. First, we propose a prediction-based compression approach, where…

Machine Learning · Computer Science 2025-06-16 Yuriy Kim, Evgeny Belyaev

Weightless: Lossy Weight Encoding For Deep Neural Network Compression

The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such…

Machine Learning · Computer Science 2017-11-15 Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks

Low Precision Neural Networks using Subband Decomposition

Large-scale deep neural networks (DNN) have been successfully used in a number of tasks from image recognition to natural language processing. They are trained using large training sets on large models, making them computationally and…

Machine Learning · Computer Science 2017-03-28 Sek Chai, Aswin Raghavan, David Zhang, Mohamed Amer, Tim Shields

Soft Weight-Sharing for Neural Network Compression

The success of deep learning in numerous application domains created the desire to run and train them on mobile devices. This however, conflicts with their computationally, memory and energy intense nature, leading to a growing interest…

Machine Learning · Statistics 2017-05-10 Karen Ullrich, Edward Meeds, Max Welling

Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM

Although deep learning models are highly effective for various learning tasks, their high computational costs prohibit the deployment to scenarios where either memory or computational resources are limited. In this paper, we focus on…

Computer Vision and Pattern Recognition · Computer Science 2017-09-14 Cong Leng, Hao Li, Shenghuo Zhu, Rong Jin

Distribution-Aware Tensor Decomposition for Compression of Convolutional Neural Networks

Neural networks are widely used for image-related tasks but typically demand considerable computing power. Once a network has been trained, however, its memory- and compute-footprint can be reduced by compression. In this work, we focus on…

Machine Learning · Computer Science 2025-11-13 Alper Kalle, Theo Rudkiewicz, Mohamed-Oumar Ouerfelli, Mohamed Tamaazousti

Efficient Model Compression for Bayesian Neural Networks

Model Compression has drawn much attention within the deep learning community recently. Compressing a dense neural network offers many advantages including lower computation cost, deployability to devices of limited storage and memories,…

Machine Learning · Computer Science 2024-11-04 Diptarka Saha, Zihe Liu, Feng Liang

Getting Free Bits Back from Rotational Symmetries in LLMs

Current methods for compressing neural network weights, such as decomposition, pruning, quantization, and channel simulation, often overlook the inherent symmetries within these networks and thus waste bits on encoding redundant…

Information Theory · Computer Science 2024-10-03 Jiajun He, Gergely Flamich, José Miguel Hernández-Lobato

BitNet: Bit-Regularized Deep Neural Networks

We present a novel optimization strategy for training neural networks which we call "BitNet". The parameters of neural networks are usually unconstrained and have a dynamic range dispersed over all real values. Our key idea is to limit the…

Machine Learning · Computer Science 2018-11-20 Aswin Raghavan, Mohamed Amer, Sek Chai, Graham Taylor

Weight Pruning via Adaptive Sparsity Loss

Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust compressive learning…

Machine Learning · Computer Science 2020-06-05 George Retsinas, Athena Elafrou, Georgios Goumas, Petros Maragos

Self-Compressing Neural Networks

This work focuses on reducing neural network size, which is a major driver of neural network execution time, power consumption, bandwidth, and memory footprint. A key challenge is to reduce size in a manner that can be exploited readily for…

Machine Learning · Computer Science 2025-06-18 Szabolcs Cséfalvay, James Imber

Compressing Word Embeddings via Deep Compositional Code Learning

Natural language processing (NLP) models often require a massive number of parameters for word embeddings, resulting in a large storage or memory footprint. Deploying neural NLP models to mobile devices requires compressing the word…

Computation and Language · Computer Science 2017-11-20 Raphael Shu, Hideki Nakayama

Weight Fixing Networks

Modern iterations of deep learning models contain millions (billions) of unique parameters, each represented by a b-bit number. Popular attempts at compressing neural networks (such as pruning and quantisation) have shown that many of the…

Machine Learning · Computer Science 2022-10-26 Christopher Subia-Waud, Srinandan Dasmahapatra

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee, Byeongwook Kim

Scalable Compression of Deep Neural Networks

Deep neural networks generally involve some layers with millions of parameters, making them difficult to be deployed and updated on devices with limited resources such as mobile phones and other smart embedded systems. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2016-08-29 Xing Wang, Jie Liang

Learning Efficient Convolutional Networks through Irregular Convolutional Kernels

As deep neural networks are increasingly used in applications suited for low-power devices, a fundamental dilemma becomes apparent: the trend is to grow models to absorb increasing data that gives rise to memory intensive; however low-power…

Computer Vision and Pattern Recognition · Computer Science 2023-02-21 Weiyu Guo, Jiabin Ma, Liang Wang, Yongzhen Huang

Compression with Bayesian Implicit Neural Representations

Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural…

Machine Learning · Computer Science 2023-10-31 Zongyu Guo, Gergely Flamich, Jiajun He, Zhibo Chen, José Miguel Hernández-Lobato

SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions

Compressing large-scale neural networks is essential for deploying models on resource-constrained devices. Most existing methods adopt weight pruning or low-bit quantization individually, often resulting in suboptimal compression rates to…

Machine Learning · Computer Science 2025-10-13 Ziyi Wang, Nan Jiang, Guang Lin, Qifan Song