Related papers: Progressive Binarization with Semi-Structured Prun…

PB-LLM: Partially Binarized Large Language Models

This paper explores network binarization, a radical form of quantization, compressing model weights to a single bit, specifically for Large Language Models (LLMs) compression. Due to previous binarization methods collapsing LLMs, we propose…

Machine Learning · Computer Science 2023-11-09 Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong

DReSS: Data-driven Regularized Structured Streamlining for Large Language Models

Large language models (LLMs) have achieved significant progress across various domains, but their increasing scale results in high computational and memory costs. Recent studies have revealed that LLMs exhibit sparsity, providing the…

Machine Learning · Computer Science 2025-07-01 Mingkuan Feng, Jinyang Wu, Shuai Zhang, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao, Feihu Che

PT$^2$-LLM: Post-Training Ternarization for Large Language Models

Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. Ternarization has gained attention as a promising compression technique, delivering…

Machine Learning · Computer Science 2026-02-02 Xianglong Yan, Chengzhu Bao, Zhiteng Li, Tianao Zhang, Kaicheng Yang, Haotong Qin, Ruobing Xie, Xingwu Sun, Yulun Zhang

ARB-LLM: Alternating Refined Binarizations for Large Language Models

Large Language Models (LLMs) have greatly pushed forward advancements in natural language processing, yet their high memory and computational demands hinder practical deployment. Binarization, as an effective compression technique, can…

Computer Vision and Pattern Recognition · Computer Science 2026-02-02 Zhiteng Li, Xianglong Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Zhongchao Shi, Linghe Kong, Yulun Zhang, Xiaokang Yang

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

Large Language Models (LLMs) have become pivotal in advancing the field of artificial intelligence, yet their immense sizes pose significant challenges for both fine-tuning and deployment. Current post-training pruning methods, while…

Computation and Language · Computer Science 2024-05-28 Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li

QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models

The rise of large language models (LLMs) has significantly advanced various natural language processing (NLP) tasks. However, the resource demands of these models pose substantial challenges. Structured pruning is an effective approach to…

Machine Learning · Computer Science 2024-12-17 Changhai Zhou, Yuhua Zhou, Shijie Han, Qian Qiao, Hongguang Li

Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration

Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial…

Computation and Language · Computer Science 2026-01-07 Guangxin Wu, Hao Zhang, Zhang Zhibin, Jiafeng Guo, Xueqi Cheng

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study

While Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose significant barriers to practical deployment. Current parameter reduction techniques primarily…

Computation and Language · Computer Science 2025-07-29 Yiran Huang, Lukas Thede, Massimiliano Mancini, Wenjia Xu, Zeynep Akata

SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer…

Machine Learning · Computer Science 2025-05-07 Hanyu Hu, Xiaoming Yuan

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can extremely…

Machine Learning · Computer Science 2024-06-19 Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi

DarwinLM: Evolutionary Structured Pruning of Large Language Models

Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective…

Machine Learning · Computer Science 2025-03-06 Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh

Two-Stage Regularization-Based Structured Pruning for LLMs

The deployment of large language models (LLMs) is largely hindered by their large number of parameters. Structural pruning has emerged as a promising solution. Prior structured pruning methods directly remove unimportant parameters based on…

Machine Learning · Computer Science 2026-04-21 Mingkuan Feng, Jinyang Wu, Siyuan Liu, Shuai Zhang, Hongjian Fang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen, Jianhua Tao

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have…

Computation and Language · Computer Science 2024-12-19 Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

Sparsity Induction for Accurate Post-Training Pruning of Large Language Models

Large language models have demonstrated capabilities in text generation, while their increasing parameter scales present challenges in computational and memory efficiency. Post-training sparsity (PTS), which reduces model cost by removing…

Computation and Language · Computer Science 2026-02-26 Minhao Jiang, Zhikai Li, Xuewen Liu, Jing Zhang, Mengjuan Chen, Qingyi Gu

Binary Neural Networks for Large Language Model: A Survey

Large language models (LLMs) have wide applications in the field of natural language processing (NLP), such as GPT-4 and Llama. However, with the exponential growth of model parameter sizes, LLMs bring significant resource overheads. Low-bit…

Computation and Language · Computer Science 2025-02-27 Liangdong Liu, Zhitong Zheng, Cong Wang, Tianhuang Su, Zhenyu Yang

PAT: Pruning-Aware Tuning for Large Language Models

Large language models (LLMs) excel in language tasks, especially with supervised fine-tuning after pre-training. However, their substantial memory and computational requirements hinder practical applications. Structural pruning, which…

Machine Learning · Computer Science 2025-01-28 Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du

SparseLLM: Towards Global Pruning for Pre-trained Language Models

The transformative impact of large language models (LLMs) like LLaMA and GPT on natural language processing is countered by their prohibitive computational demands. Pruning has emerged as a pivotal compression strategy, introducing sparsity…

Computation and Language · Computer Science 2024-11-04 Guangji Bai, Yijiang Li, Chen Ling, Kibaek Kim, Liang Zhao

PIP: Perturbation-based Iterative Pruning for Large Language Models

The rapid increase in the parameter counts of Large Language Models (LLMs), which often reach into the billions or even trillions, presents significant challenges for their practical deployment, particularly in resource-constrained…

Machine Learning · Computer Science 2025-11-18 Yi Cao, Wei-Jie Xu, Yucheng Shen, Weijie Shi, Chi-Min Chan, Jianfeng Qu, Jiajie Xu

DB-LLM: Accurate Dual-Binarization for Efficient LLMs

Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment. Quantization emerges as one of the most effective…

Machine Learning · Computer Science 2024-02-20 Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, Dacheng Tao

Highly Efficient and Effective LLMs with Multi-Boolean Architectures

Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into post-training binarization, which is simple but causes severe performance loss, and…

Machine Learning · Statistics 2026-04-22 Ba-Hien Tran, Van Minh Nguyen