Related papers: Neural Network Compression Framework for fast mode…

Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware

As state of the art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for…

Machine Learning · Computer Science 2025-04-25 Hans Rosenberger , Rodrigo Fischer , Johanna S. Fröhlich , Ali Bereyhi , Ralf R. Müller

Neural Network Compression Via Sparse Optimization

The compression of deep neural networks (DNNs) to reduce inference cost becomes increasingly important to meet realistic deployment requirements of various applications. There have been a significant amount of work regarding network…

Machine Learning · Computer Science 2020-11-12 Tianyi Chen , Bo Ji , Yixin Shi , Tianyu Ding , Biyi Fang , Sheng Yi , Xiao Tu

Compressed Learning of Deep Neural Networks for OpenCL-Capable Embedded Systems

Deep neural networks (DNNs) have been quite successful in solving many complex learning problems. However, DNNs tend to have a large number of learning parameters, leading to a large memory and computation requirement. In this paper, we…

Machine Learning · Computer Science 2019-05-21 Sangkyun Lee , Jeonghyun Lee

Neural NeRF Compression

Neural Radiance Fields (NeRFs) have emerged as powerful tools for capturing detailed 3D scenes through continuous volumetric representations. Recent NeRFs utilize feature grids to improve rendering quality and speed; however, these…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Tuan Pham , Stephan Mandt

On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee

Model compression is a crucial part of deploying neural networks (NNs), especially when the memory and storage of computing devices are limited in many applications. This paper focuses on two model compression techniques: low-rank…

Machine Learning · Computer Science 2024-08-16 Chenyang Li , Jihoon Chung , Mengnan Du , Haimin Wang , Xianlian Zhou , Bo Shen

SparseDNN: Fast Sparse Deep Learning Inference on CPUs

The last few years have seen gigantic leaps in algorithms and systems to support efficient deep learning inference. Pruning and quantization algorithms can now consistently compress neural networks by an order of magnitude. For a compressed…

Machine Learning · Computer Science 2021-07-22 Ziheng Wang

An Introduction to Neural Data Compression

Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression…

Machine Learning · Computer Science 2023-08-22 Yibo Yang , Stephan Mandt , Lucas Theis

A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks

Deep neural networks (DNNs) have achieved significant success in a variety of real world applications, i.e., image classification. However, tons of parameters in the networks restrict the efficiency of neural networks due to the large model…

Machine Learning · Computer Science 2019-08-21 Yuzhe Ma , Ran Chen , Wei Li , Fanhua Shang , Wenjian Yu , Minsik Cho , Bei Yu

A Benchmark Study of Neural Network Compression Methods for Hyperspectral Image Classification

Deep neural networks have achieved strong performance in image classification tasks due to their ability to learn complex patterns from high-dimensional data. However, their large computational and memory requirements often limit deployment…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Sai Shi

A flexible, extensible software framework for model compression based on the LC algorithm

We propose a software framework based on the ideas of the Learning-Compression (LC) algorithm, that allows a user to compress a neural network or other machine learning model using different compression schemes with minimal effort.…

Machine Learning · Computer Science 2020-05-19 Yerlan Idelbayev , Miguel Á. Carreira-Perpiñán

FSCNN: A Fast Sparse Convolution Neural Network Inference System

Convolution neural networks (CNNs) have achieved remarkable success, but typically accompany high computation cost and numerous redundant weight parameters. To reduce the FLOPs, structure pruning is a popular approach to remove the entire…

Computer Vision and Pattern Recognition · Computer Science 2022-12-20 Bo Ji , Tianyi Chen

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices. Model compression techniques can address…

Machine Learning · Computer Science 2018-10-23 Qing Qin , Jie Ren , Jialong Yu , Ling Gao , Hai Wang , Jie Zheng , Yansong Feng , Jianbin Fang , Zheng Wang

TOCO: A Framework for Compressing Neural Network Models Based on Tolerance Analysis

Neural network compression methods have enabled deploying large models on emerging edge devices with little cost, by adapting already-trained models to the constraints of these devices. The rapid development of AI-capable edge devices with…

Machine Learning · Computer Science 2019-12-20 Soroosh Khoram , Jing Li

A Survey of Model Compression and Acceleration for Deep Neural Networks

Deep neural networks (DNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with…

Machine Learning · Computer Science 2020-06-16 Yu Cheng , Duo Wang , Pan Zhou , Tao Zhang

A Programmable Approach to Neural Network Compression

Deep neural networks (DNNs) frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such…

Machine Learning · Computer Science 2020-12-03 Vinu Joseph , Saurav Muralidharan , Animesh Garg , Michael Garland , Ganesh Gopalakrishnan

LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

When deploying deep learning models to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource…

Machine Learning · Computer Science 2021-10-11 Elvis Nunez , Maxwell Horton , Anish Prabhu , Anurag Ranjan , Ali Farhadi , Mohammad Rastegari

Optimizing CNN Model Inference on CPUs

The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs imply that better performance of CNN model inference on CPUs can deliver significant gain to a large number of users. To improve the performance of CNN…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-09 Yizhi Liu , Yao Wang , Ruofei Yu , Mu Li , Vin Sharma , Yida Wang

Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe

High throughput and low latency inference of deep neural networks are critical for the deployment of deep learning applications. This paper presents the efficient inference techniques of IntelCaffe, the first Intel optimized deep learning…

Computer Vision and Pattern Recognition · Computer Science 2018-05-23 Jiong Gong , Haihao Shen , Guoming Zhang , Xiaoli Liu , Shane Li , Ge Jin , Niharika Maheshwari , Evarist Fomenko , Eden Segal

A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression

Deep neural networks (DNNs) are becoming increasingly deeper, wider, and non-linear due to the growing demands on prediction accuracy and analysis quality. When training a DNN model, the intermediate activation data must be saved in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-24 Sian Jin , Guanpeng Li , Shuaiwen Leon Song , Dingwen Tao

Model compression as constrained optimization, with application to neural nets. Part I: general framework

Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model…

Machine Learning · Computer Science 2017-07-06 Miguel Á. Carreira-Perpiñán