Related papers: ADMM Based Semi-Structured Pattern Pruning Framewo…

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Natural Language Processing (NLP) has recently achieved success by using huge pre-trained Transformer networks. However, these models often contain hundreds of millions or even billions of parameters, bringing challenges to online…

Computation and Language · Computer Science 2021-11-01 Connor Holmes , Minjia Zhang , Yuxiong He , Bo Wu

A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers

Weight pruning methods for deep neural networks (DNNs) have been investigated recently, but prior work in this area is mainly heuristic, iterative pruning, thereby lacking guarantees on the weight reduction ratio and convergence time. To…

Neural and Evolutionary Computing · Computer Science 2018-10-23 Tianyun Zhang , Shaokai Ye , Kaiqi Zhang , Jian Tang , Wujie Wen , Makan Fardad , Yanzhi Wang

Progressive Weight Pruning of Deep Neural Networks using ADMM

Deep neural networks (DNNs) although achieving human-level performance in many domains, have very large model size that hinders their broader applications on edge computing devices. Extensive research work have been conducted on DNN model…

Machine Learning · Computer Science 2018-11-06 Shaokai Ye , Tianyun Zhang , Kaiqi Zhang , Jiayu Li , Kaidi Xu , Yunfei Yang , Fuxun Yu , Jian Tang , Makan Fardad , Sijia Liu , Xiang Chen , Xue Lin , Yanzhi Wang

Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers

We present a systematic weight pruning framework of deep neural networks (DNNs) using the alternating direction method of multipliers (ADMM). We first formulate the weight pruning problem of DNNs as a constrained nonconvex optimization…

Machine Learning · Computer Science 2018-04-24 Tianyun Zhang , Shaokai Ye , Yipeng Zhang , Yanzhi Wang , Makan Fardad

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), two important categories of DNN model compression techniques: weight pruning and weight quantization are investigated. The former leverages the…

Machine Learning · Computer Science 2019-01-03 Ao Ren , Tianyun Zhang , Shaokai Ye , Jiayu Li , Wenyao Xu , Xuehai Qian , Xue Lin , Yanzhi Wang

ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning

The state-of-art DNN structures involve high computation and great demand for memory storage which pose intensive challenge on DNN framework resources. To mitigate the challenges, weight pruning techniques has been studied. However, high…

Machine Learning · Computer Science 2019-05-03 Xiaolong Ma , Geng Yuan , Sheng Lin , Zhengang Li , Hao Sun , Yanzhi Wang

StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs

Weight pruning methods of DNNs have been demonstrated to achieve a good model pruning rate without loss of accuracy, thereby alleviating the significant computation/storage requirements of large-scale DNNs. Structured weight pruning methods…

Neural and Evolutionary Computing · Computer Science 2019-03-28 Tianyun Zhang , Shaokai Ye , Kaiqi Zhang , Xiaolong Ma , Ning Liu , Linfeng Zhang , Jian Tang , Kaisheng Ma , Xue Lin , Makan Fardad , Yanzhi Wang

Sample-aware Adaptive Structured Pruning for Large Language Models

Large language models (LLMs) have achieved outstanding performance in natural language processing, but enormous model sizes and high computational costs limit their practical deployment. Structured pruning can effectively reduce the…

Computation and Language · Computer Science 2025-03-11 Jun Kong , Xinge Ma , Jin Wang , Xuejie Zhang

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. However, previous pruning methods mainly focus on reducing the model size and/or improving…

Machine Learning · Computer Science 2022-03-29 Yifan Gong , Zheng Zhan , Zhengang Li , Wei Niu , Xiaolong Ma , Wenhao Wang , Bin Ren , Caiwen Ding , Xue Lin , Xiaolin Xu , Yanzhi Wang

A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM

Many model compression techniques of Deep Neural Networks (DNNs) have been investigated, including weight pruning, weight clustering and quantization, etc. Weight pruning leverages the redundancy in the number of weights in DNNs, while…

Neural and Evolutionary Computing · Computer Science 2018-11-06 Shaokai Ye , Tianyun Zhang , Kaiqi Zhang , Jiayu Li , Jiaming Xie , Yun Liang , Sijia Liu , Xue Lin , Yanzhi Wang

Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM

Weight pruning and weight quantization are two important categories of DNN model compression. Prior work on these techniques are mainly based on heuristics. A recent work developed a systematic frame-work of DNN weight pruning using the…

Neural and Evolutionary Computing · Computer Science 2019-04-02 Shaokai Ye , Xiaoyu Feng , Tianyun Zhang , Xiaolong Ma , Sheng Lin , Zhengang Li , Kaidi Xu , Wujie Wen , Sijia Liu , Jian Tang , Makan Fardad , Xue Lin , Yongpan Liu , Yanzhi Wang

SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters

Adapter Tuning, which freezes the pretrained language models (PLMs) and only fine-tunes a few extra modules, becomes an appealing efficient alternative to the full model fine-tuning. Although computationally efficient, the recent Adapters…

Computation and Language · Computer Science 2022-11-11 Shwai He , Liang Ding , Daize Dong , Miao Zhang , Dacheng Tao

Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation

The state-of-art DNN structures involve intensive computation and high memory storage. To mitigate the challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix computation and low-power acceleration framework…

Signal Processing · Electrical Eng. & Systems 2019-09-04 Xiaolong Ma , Geng Yuan , Sheng Lin , Caiwen Ding , Fuxun Yu , Tao Liu , Wujie Wen , Xiang Chen , Yanzhi Wang

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have…

Computation and Language · Computer Science 2024-12-19 Weiyu Huang , Yuezhou Hu , Guohao Jian , Jun Zhu , Jianfei Chen

Adapter Pruning using Tropical Characterization

Adapters are widely popular parameter-efficient transfer learning approaches in natural language processing that insert trainable modules in between layers of a pre-trained language model. Apart from several heuristics, however, there has…

Computation and Language · Computer Science 2023-10-31 Rishabh Bhardwaj , Tushar Vaidya , Soujanya Poria

StructPrune: Structured Global Pruning asymptotics with $\mathcal{O}(\sqrt{N})$ GPU Memory

Pruning is critical for scaling large language models (LLMs). Global pruning achieves strong performance but requires $\mathcal{O}(N)$ memory, which is infeasible for billion-parameter models. Local pruning reduces GPU memory usage to that…

Machine Learning · Computer Science 2025-10-07 Xinyuan Song , Guangji Bai , Liang Zhao

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train…

Machine Learning · Computer Science 2025-11-17 Rui Pan , Shivanshu Shekhar , Boyao Wang , Shizhe Diao , Jipeng Zhang , Xingyuan Pan , Renjie Pi , Tong Zhang

Compression of Neural Machine Translation Models via Pruning

Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT…

Artificial Intelligence · Computer Science 2016-07-01 Abigail See , Minh-Thang Luong , Christopher D. Manning

Alternating Direction Method of Multipliers for Sparse Convolutional Neural Networks

The storage and computation requirements of Convolutional Neural Networks (CNNs) can be prohibitive for exploiting these models over low-power or embedded devices. This paper reduces the computational complexity of the CNNs by minimizing an…

Neural and Evolutionary Computing · Computer Science 2017-01-17 Farkhondeh Kiaee , Christian Gagné , Mahdieh Abbasi

To prune, or not to prune: exploring the efficacy of pruning for model compression

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks…

Machine Learning · Statistics 2017-11-15 Michael Zhu , Suyog Gupta