
Related papers: ADMM Based Semi-Structured Pattern Pruning Framewo…

200 papers

Natural Language Processing (NLP) has recently achieved success by using huge pre-trained Transformer networks. However, these models often contain hundreds of millions or even billions of parameters, bringing challenges to online…

Computation and Language · Computer Science · 2021-11-01 · Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Weight pruning methods for deep neural networks (DNNs) have been investigated recently, but prior work in this area is mainly heuristic, iterative pruning, thereby lacking guarantees on the weight reduction ratio and convergence time. To…

Neural and Evolutionary Computing · Computer Science · 2018-10-23 · Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Jian Tang, Wujie Wen, Makan Fardad, Yanzhi Wang

Deep neural networks (DNNs), although achieving human-level performance in many domains, have very large model sizes that hinder their broader application on edge computing devices. Extensive research has been conducted on DNN model…

Machine Learning · Computer Science · 2018-11-06 · Shaokai Ye, Tianyun Zhang, Kaiqi Zhang, Jiayu Li, Kaidi Xu, Yunfei Yang, Fuxun Yu, Jian Tang, Makan Fardad, Sijia Liu, Xiang Chen, Xue Lin, Yanzhi Wang

We present a systematic weight pruning framework of deep neural networks (DNNs) using the alternating direction method of multipliers (ADMM). We first formulate the weight pruning problem of DNNs as a constrained nonconvex optimization…

Machine Learning · Computer Science · 2018-04-24 · Tianyun Zhang, Shaokai Ye, Yipeng Zhang, Yanzhi Wang, Makan Fardad

To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), two important categories of DNN model compression techniques, weight pruning and weight quantization, are investigated. The former leverages the…

Machine Learning · Computer Science · 2019-01-03 · Ao Ren, Tianyun Zhang, Shaokai Ye, Jiayu Li, Wenyao Xu, Xuehai Qian, Xue Lin, Yanzhi Wang

State-of-the-art DNN structures involve high computation and great demand for memory storage, which pose intense challenges to DNN framework resources. To mitigate these challenges, weight pruning techniques have been studied. However, high…

Machine Learning · Computer Science · 2019-05-03 · Xiaolong Ma, Geng Yuan, Sheng Lin, Zhengang Li, Hao Sun, Yanzhi Wang

Weight pruning methods of DNNs have been demonstrated to achieve a good model pruning rate without loss of accuracy, thereby alleviating the significant computation/storage requirements of large-scale DNNs. Structured weight pruning methods…

Neural and Evolutionary Computing · Computer Science · 2019-03-28 · Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Xiaolong Ma, Ning Liu, Linfeng Zhang, Jian Tang, Kaisheng Ma, Xue Lin, Makan Fardad, Yanzhi Wang

Large language models (LLMs) have achieved outstanding performance in natural language processing, but enormous model sizes and high computational costs limit their practical deployment. Structured pruning can effectively reduce the…

Computation and Language · Computer Science · 2025-03-11 · Jun Kong, Xinge Ma, Jin Wang, Xuejie Zhang

Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. However, previous pruning methods mainly focus on reducing the model size and/or improving…

Machine Learning · Computer Science · 2022-03-29 · Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiaolin Xu, Yanzhi Wang

Many model compression techniques of Deep Neural Networks (DNNs) have been investigated, including weight pruning, weight clustering and quantization, etc. Weight pruning leverages the redundancy in the number of weights in DNNs, while…

Neural and Evolutionary Computing · Computer Science · 2018-11-06 · Shaokai Ye, Tianyun Zhang, Kaiqi Zhang, Jiayu Li, Jiaming Xie, Yun Liang, Sijia Liu, Xue Lin, Yanzhi Wang

Weight pruning and weight quantization are two important categories of DNN model compression. Prior work on these techniques is mainly based on heuristics. A recent work developed a systematic framework of DNN weight pruning using the…

Neural and Evolutionary Computing · Computer Science · 2019-04-02 · Shaokai Ye, Xiaoyu Feng, Tianyun Zhang, Xiaolong Ma, Sheng Lin, Zhengang Li, Kaidi Xu, Wujie Wen, Sijia Liu, Jian Tang, Makan Fardad, Xue Lin, Yongpan Liu, Yanzhi Wang

Adapter Tuning, which freezes the pretrained language models (PLMs) and fine-tunes only a few extra modules, has become an appealing, efficient alternative to full model fine-tuning. Although computationally efficient, the recent Adapters…

Computation and Language · Computer Science · 2022-11-11 · Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao

State-of-the-art DNN structures involve intensive computation and high memory storage. To mitigate these challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix computation and low-power acceleration framework…

Signal Processing · Electrical Eng. & Systems · 2019-09-04 · Xiaolong Ma, Geng Yuan, Sheng Lin, Caiwen Ding, Fuxun Yu, Tao Liu, Wujie Wen, Xiang Chen, Yanzhi Wang

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have…

Computation and Language · Computer Science · 2024-12-19 · Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

Adapters are widely popular parameter-efficient transfer learning approaches in natural language processing that insert trainable modules in between layers of a pre-trained language model. Apart from several heuristics, however, there has…

Computation and Language · Computer Science · 2023-10-31 · Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria

Pruning is critical for scaling large language models (LLMs). Global pruning achieves strong performance but requires $\mathcal{O}(N)$ memory, which is infeasible for billion-parameter models. Local pruning reduces GPU memory usage to that…

Machine Learning · Computer Science · 2025-10-07 · Xinyuan Song, Guangji Bai, Liang Zhao

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train…

Machine Learning · Computer Science · 2025-11-17 · Rui Pan, Shivanshu Shekhar, Boyao Wang, Shizhe Diao, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Tong Zhang

Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT…

Artificial Intelligence · Computer Science · 2016-07-01 · Abigail See, Minh-Thang Luong, Christopher D. Manning

The storage and computation requirements of Convolutional Neural Networks (CNNs) can be prohibitive for exploiting these models over low-power or embedded devices. This paper reduces the computational complexity of the CNNs by minimizing an…

Neural and Evolutionary Computing · Computer Science · 2017-01-17 · Farkhondeh Kiaee, Christian Gagné, Mahdieh Abbasi

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks…

Machine Learning · Statistics · 2017-11-15 · Michael Zhu, Suyog Gupta