
Progressive Binarization with Semi-Structured Pruning for LLMs

Machine Learning 2025-09-30 v4

Abstract

Large language models (LLMs) have achieved remarkable progress in natural language processing, but their high computational and memory costs hinder deployment on resource-constrained devices. Binarization represents the most extreme form of quantization, yet binarized models still contain redundancy that can be further removed. Pruning provides a natural way to eliminate such redundancy, but naïve combination with binarization often results in severe performance degradation. In this paper, we propose Progressive Binarization with Semi-Structured Pruning (PBS$^2$P), a novel post-training framework that seamlessly integrates binarization and semi-structured pruning. We first propose Stepwise semi-structured Pruning with Binarization Optimization (SPBO), which progressively introduces sparsity while optimizing binarization parameters to jointly reduce pruning and quantization error, yielding more stable and accurate compression. Additionally, we propose a Coarse-to-Fine Search (CFS) that first allocates pruning ratios and then refines element selection, further enhancing overall performance. Extensive experiments across multiple LLM families show that PBS$^2$P consistently outperforms state-of-the-art (SOTA) binary post-training quantization methods in both perplexity and downstream accuracy. The code and models will be available at https://github.com/XIANGLONGYAN/PBS2P.
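
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of generic N:M semi-structured pruning followed by sign binarization with a single scale. It is not the SPBO or CFS procedure from the paper: the magnitude-based selection rule, the 2:4 pattern, the per-row scale, and the function names `nm_mask`, `binarize_masked`, and `squared_error` are all assumptions chosen for illustration. The only fact relied on is that, for a fixed mask and fixed signs, the scale minimizing squared error is the mean absolute value of the kept weights.

```python
def nm_mask(row, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m (N:M sparsity).

    Magnitude-based selection is an illustrative assumption, not the
    paper's selection rule.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = [0] * len(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        keep = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:n]
        for i in keep:
            mask[i] = 1
    return mask


def binarize_masked(row, mask):
    """Approximate the kept weights as alpha * sign(w); pruned weights become 0.

    For a fixed mask and signs, alpha = mean(|w|) over kept entries
    minimizes the squared reconstruction error.
    """
    kept = [abs(w) for w, k in zip(row, mask) if k]
    alpha = sum(kept) / len(kept) if kept else 0.0
    approx = [alpha * (1.0 if w >= 0 else -1.0) * k for w, k in zip(row, mask)]
    return alpha, approx


def squared_error(row, approx):
    """Sum of squared differences between original and compressed weights."""
    return sum((w - a) ** 2 for w, a in zip(row, approx))


# Hypothetical weight row, used only to exercise the functions.
row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
mask = nm_mask(row, n=2, m=4)
alpha, approx = binarize_masked(row, mask)
print(mask, alpha, squared_error(row, approx))
```

In this sketch the mask is chosen once and the scale is fitted afterwards. The abstract describes a different design, in which sparsity is introduced progressively while the binarization parameters are optimized alongside it, so the code above is closer to the naive combination that the paper sets out to improve on.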

Cite

@article{arxiv.2502.01705,
  title  = {Progressive Binarization with Semi-Structured Pruning for LLMs},
  author = {Xianglong Yan and Tianao Zhang and Zhiteng Li and Haotong Qin and Yulun Zhang},
  journal= {arXiv preprint arXiv:2502.01705},
  year   = {2025}
}