Related papers: Efficient Neural Network Compression

Unified Framework for Pre-trained Neural Network Compression via Decomposition and Optimized Rank Selection

Despite their high accuracy, complex neural networks demand significant computational resources, posing challenges for deployment on resource constrained devices such as mobile phones and embedded systems. Compression algorithms have been…

Machine Learning · Computer Science 2025-09-23 Ali Aghababaei-Harandi, Massih-Reza Amini

Automatic Rank Selection for High-Speed Convolutional Neural Network

Low-rank decomposition plays a central role in accelerating convolutional neural network (CNN), and the rank of decomposed kernel-tensor is a key parameter that determines the complexity and accuracy of a neural network. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2018-07-02 Hyeji Kim, Chong-Min Kyung

Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training

Deep neural networks have achieved great success in many data processing applications. However, the high computational complexity and storage cost makes deep learning hard to be used on resource-constrained devices, and it is not…

Machine Learning · Computer Science 2023-03-27 Xinwei Ou, Zhangxin Chen, Ce Zhu, Yipeng Liu

Compression of Recurrent Neural Networks using Matrix Factorization

Compressing neural networks is a key step when deploying models for real-time or embedded applications. Factorizing the model's matrices using low-rank approximations is a promising method for achieving compression. While it is possible to…

Machine Learning · Computer Science 2023-10-20 Lucas Maison, Hélion du Mas des Bourboux, Thomas Courtat

Low-Rank Matrix Approximation for Neural Network Compression

Deep Neural Networks (DNNs) have encountered an emerging deployment challenge due to large and expensive memory and computation requirements. In this paper, we present a new Adaptive-Rank Singular Value Decomposition (ARSVD) method that…

Machine Learning · Computer Science 2025-05-13 Kalyan Cherukuri, Aarav Lala

Speeding up Resnet Architecture with Layers Targeted Low Rank Decomposition

Compression of a neural network can help in speeding up both the training and the inference of the network. In this research, we study applying compression using low rank decomposition on network layers. Our research demonstrates that to…

Computer Vision and Pattern Recognition · Computer Science 2023-09-25 Walid Ahmed, Habib Hajimolahoseini, Austin Wen, Yang Liu

Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks

The success of deep neural networks in many real-world applications is leading to new challenges in building more efficient architectures. One effective way of making networks more efficient is neural network compression. We provide an…

Machine Learning · Computer Science 2019-12-23 Andrey Kuzmin, Markus Nagel, Saurabh Pitre, Sandeep Pendyam, Tijmen Blankevoort, Max Welling

Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning

While Convolutional Neural Networks (CNNs) excel at learning complex latent-space representations, their over-parameterization can lead to overfitting and reduced performance, particularly with limited data. This, alongside their high…

Computer Vision and Pattern Recognition · Computer Science 2024-01-17 Manish Sharma, Jamison Heard, Eli Saber, Panos P. Markopoulos

CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition

Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight NPUs (Neural Processing Units) due to their growing size and compute demands. Low-rank tensor decomposition, such as…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Sudhakar Sah, Nikhil Chabbra, Matthieu Durnerin

Low-Rank+Sparse Tensor Compression for Neural Networks

Low-rank tensor compression has been proposed as a promising approach to reduce the memory and compute requirements of neural networks for their deployment on edge devices. Tensor compression reduces the number of parameters required to…

Machine Learning · Computer Science 2021-11-03 Cole Hawkins, Haichuan Yang, Meng Li, Liangzhen Lai, Vikas Chandra

Neural Network Compression via Effective Filter Analysis and Hierarchical Pruning

Network compression is crucial to making the deep networks to be more efficient, faster, and generalizable to low-end hardware. Current network compression methods have two open problems: first, there lacks a theoretical framework to…

Machine Learning · Computer Science 2022-06-09 Ziqi Zhou, Li Lian, Yilong Yin, Ze Wang

Convolutional neural networks compression with low rank and sparse tensor decompositions

Convolutional neural networks show outstanding results in a variety of computer vision tasks. However, a neural network architecture design usually faces a trade-off between model performance and computational/memory complexity. For some…

Computer Vision and Pattern Recognition · Computer Science 2020-06-12 Pavel Kaloshin

Neural Network Compression Via Sparse Optimization

The compression of deep neural networks (DNNs) to reduce inference cost becomes increasingly important to meet realistic deployment requirements of various applications. There have been a significant amount of work regarding network…

Machine Learning · Computer Science 2020-11-12 Tianyi Chen, Bo Ji, Yixin Shi, Tianyu Ding, Biyi Fang, Sheng Yi, Xiao Tu

Data-Driven Low-Rank Neural Network Compression

Despite many modern applications of Deep Neural Networks (DNNs), the large number of parameters in the hidden layers makes them unattractive for deployment on devices with storage capacity constraints. In this paper we propose a Data-Driven…

Machine Learning · Computer Science 2021-07-14 Dimitris Papadimitriou, Swayambhoo Jain

Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation

A common technique for compressing a neural network is to compute the $k$-rank $\ell_2$ approximation $A_{k,2}$ of the matrix $A\in\mathbb{R}^{n\times d}$ that corresponds to a fully connected layer (or embedding layer). Here, $d$ is the…

Machine Learning · Computer Science 2020-09-29 Murad Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three stage…

Computer Vision and Pattern Recognition · Computer Science 2016-02-16 Song Han, Huizi Mao, William J. Dally

Trained Rank Pruning for Efficient Deep Neural Networks

The performance of Deep Neural Networks (DNNs) keeps elevating in recent years with increasing network depth and width. To enable DNNs on edge devices like mobile phones, researchers proposed several network compression methods including…

Computer Vision and Pattern Recognition · Computer Science 2020-01-27 Yuhui Xu, Yuxi Li, Shuai Zhang, Wei Wen, Botao Wang, Yingyong Qi, Yiran Chen, Weiyao Lin, Hongkai Xiong

Self-Compressing Neural Networks

This work focuses on reducing neural network size, which is a major driver of neural network execution time, power consumption, bandwidth, and memory footprint. A key challenge is to reduce size in a manner that can be exploited readily for…

Machine Learning · Computer Science 2025-06-18 Szabolcs Cséfalvay, James Imber

Network Automatic Pruning: Start NAP and Take a Nap

Network pruning can significantly reduce the computation and memory footprint of large neural networks. To achieve a good trade-off between model size and performance, popular pruning techniques usually rely on hand-crafted heuristics and…

Computer Vision and Pattern Recognition · Computer Science 2021-01-19 Wenyuan Zeng, Yuwen Xiong, Raquel Urtasun

Efficient CNN Compression via Multi-method Low Rank Factorization and Feature Map Similarity

Low-Rank Factorization (LRF) is a widely adopted technique for compressing deep neural networks (DNNs). However, it faces several challenges, including optimal rank selection, a vast design space, long fine-tuning times, and limited…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 M. Kokhazadeh, G. Keramidas, V. Kelefouras