English
Related papers

Related papers: Fluctuation-based Adaptive Structured Pruning for …

200 papers

The rapid increase in the size of large language models (LLMs) has significantly escalated their computational and memory demands, posing challenges for efficient deployment, especially on resource-constrained devices. Structured pruning…

Machine Learning · Computer Science 2025-01-17 Hanyu Hu , Pengxiang Zhao , Ping Li , Yi Zheng , Zhefeng Wang , Xiaoming Yuan

The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high…

Computation and Language · Computer Science 2024-12-20 Haotian Zheng , Jinke Ren , Yushan Sun , Ruichen Zhang , Wenbo Zhang , Zhen Li , Dusit Niyato , Shuguang Cui , Yatong Han

Large language models(LLMs) have garnered significant attention and demonstrated impressive capabilities in a wide range of applications. However, due to their enormous computational costs, the deployment and application of LLMs are often…

Machine Learning · Computer Science 2025-05-30 Jialong Guo , Xinghao Chen , Yehui Tang , Yunhe Wang

Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial…

Computation and Language · Computer Science 2026-01-07 Guangxin Wu , Hao Zhang , Zhang Zhibin , Jiafeng Guo , Xueqi Cheng

Structured pruning is a commonly used convolutional neural network (CNN) compression approach. Pruning rate setting is a fundamental problem in structured pruning. Most existing works introduce too many additional learnable parameters to…

Computer Vision and Pattern Recognition · Computer Science 2023-09-26 Pucheng Zhai , Kailing Guo , Fang Liu , Xiaofen Xing , Xiangmin Xu

Structured pruning is one of the representative techniques for compressing large language models (LLMs) to reduce GPU memory consumption and accelerate inference speed. It offers significant practical value in improving the efficiency of…

Computation and Language · Computer Science 2025-08-08 Yiheng Liu , Junhao Ning , Sichen Xia , Xiaohui Gao , Ning Qiang , Bao Ge , Junwei Han , Xintao Hu

Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning…

Machine Learning · Computer Science 2024-07-19 Pingjie Wang , Ziqing Fan , Shengchao Hu , Zhe Chen , Yanfeng Wang , Yu Wang

Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with…

Computation and Language · Computer Science 2024-11-05 Shangqian Gao , Chi-Heng Lin , Ting Hua , Tang Zheng , Yilin Shen , Hongxia Jin , Yen-Chang Hsu

The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently…

Computation and Language · Computer Science 2024-12-11 Yuxin Wang , Minghua Ma , Zekun Wang , Jingchang Chen , Huiming Fan , Liping Shan , Qing Yang , Dongliang Xu , Ming Liu , Bing Qin

Instruction tuning has optimized the specialized capabilities of large language models (LLMs), but it often requires extensive datasets and prolonged training times. The challenge lies in developing specific capabilities by identifying…

Computation and Language · Computer Science 2026-05-26 Run Zou , Jianhang Ding , Yifan Ding , Wen Wu , Hao Chen , Renshu Gu

The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer…

Machine Learning · Computer Science 2025-05-07 Hanyu Hu , Xiaoming Yuan

Large language models (LLMs) have demonstrated remarkable performance across various language tasks, but their widespread deployment is impeded by their large size and high computational costs. Structural pruning is a prevailing technique…

Computation and Language · Computer Science 2024-12-10 Haihang Wu

Considering the hardware-friendly characteristics and broad applicability, structured pruning has emerged as an efficient solution to reduce the resource demands of large language models (LLMs) on resource-constrained devices. Traditional…

Machine Learning · Computer Science 2025-01-28 Zihuai Xu , Yang Xu , Hongli Xu , Yunming Liao , Zhiwei Yao , Zuan Xie

Structured pruning of modern large language models (LLMs) has emerged as a way of decreasing their high computational needs. Width pruning reduces the size of projection weight matrices (e.g., by removing attention heads) while maintaining…

Machine Learning · Computer Science 2024-06-25 Bo-Kyeong Kim , Geonmin Kim , Tae-Ho Kim , Thibault Castells , Shinkook Choi , Junho Shin , Hyoung-Kyu Song

Deep learning drives a new wave in computing systems and triggers the automation of increasingly complex problems. In particular, Large Language Models (LLMs) have significantly advanced cognitive tasks, often matching or even surpassing…

Large Language Models (LLMs), such as LLaMA and T5, have shown exceptional performance across various tasks through fine-tuning. Although low-rank adaption (LoRA) has emerged to cheaply fine-tune these LLMs on downstream tasks, their…

Machine Learning · Computer Science 2024-08-08 Mingyang Zhang , Hao Chen , Chunhua Shen , Zhen Yang , Linlin Ou , Xinyi Yu , Bohan Zhuang

Structured pruning of large language models (LLMs) offers substantial efficiency improvements by removing entire hidden units, yet current approaches often suffer from significant performance degradation, particularly in zero-shot settings,…

Machine Learning · Computer Science 2025-09-18 Mengting Ai , Tianxin Wei , Sirui Chen , Jingrui He

Structural pruning techniques are essential for deploying multimodal large language models (MLLMs) across various hardware platforms, from edge devices to cloud servers. However, current pruning methods typically determine optimal…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Zhihan Zhang , Xiang Pan , Hongchen Wei , Zhenzhong Chen

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train…

Machine Learning · Computer Science 2025-11-17 Rui Pan , Shivanshu Shekhar , Boyao Wang , Shizhe Diao , Jipeng Zhang , Xingyuan Pan , Renjie Pi , Tong Zhang

Pruning provides a practical solution to reduce the resources required to run large language models (LLMs) to benefit from their effective capabilities as well as control their cost for training and inference. Research on LLM pruning often…

Computation and Language · Computer Science 2025-10-28 Yuanhe Tian , Junjie Liu , Xican Yang , Haishan Ye , Yan Song
‹ Prev 1 2 3 10 Next ›