Related papers: SLaB: Sparse-Lowrank-Binary Decomposition for Effi…

SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but the billion-scale parameters pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either…

Computation and Language · Computer Science · 2026-04-07 · Xinhao Huang, You-Liang Huang, Zeyi Wen

Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications

Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as…

Machine Learning · Computer Science · 2025-12-22 · Yang Li, Daniel Agyei Asante, Changsheng Zhao, Ernie Chang, Yangyang Shi, Vikas Chandra

Large Language Model Compression with Global Rank and Sparsity Optimization

Low-rank and sparse composite approximation is a natural idea to compress Large Language Models (LLMs). However, such an idea faces two primary challenges that adversely affect the performance of existing methods. The first challenge…

Machine Learning · Computer Science · 2026-02-27 · Changhai Zhou, Qian Qiao, Yuhua Zhou, Yuxin Wu, Shichao Weng, Weizhong Zhang, Cheng Jin

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that…

Computation and Language · Computer Science · 2024-05-07 · Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, Eldar Kurtic, Kevin Leong, Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz

Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models

Recent large language models (LLMs) employ billions of parameters to enable broad problem-solving capabilities. Such language models also tend to be memory-bound because of the dominance of matrix-vector and matrix-matrix multiplications…

Machine Learning · Computer Science · 2024-10-24 · Chakshu Moar, Faraz Tahmasebi, Michael Pellauer, Hyoukjun Kwon

1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models

Large Language Models (LLMs) have demonstrated remarkable proficiency in language comprehension and generation; however, their widespread adoption is constrained by substantial bandwidth and computational demands. While pruning and low-rank…

Computation and Language · Computer Science · 2025-10-31 · Zeliang Zong, Kai Zhang, Zheyang Li, Wenming Tan, Ye Ren, Yiyan Zhai, Jilin Hu

FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression

Large Language Models (LLMs) have enabled remarkable progress in natural language processing, yet their high computational and memory demands pose challenges for deployment in resource-constrained environments. Although recent low-rank…

Computation and Language · Computer Science · 2026-02-09 · Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li, Zheng Zhang

Adaptive Pruning for Large Language Models with Structural Importance Awareness

The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high…

Computation and Language · Computer Science · 2024-12-20 · Haotian Zheng, Jinke Ren, Yushan Sun, Ruichen Zhang, Wenbo Zhang, Zhen Li, Dusit Niyato, Shuguang Cui, Yatong Han

CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression

Large Language Models (LLMs) present significant deployment challenges due to their immense size and computational requirements. Model compression techniques are essential for making these models practical for resource-constrained…

Machine Learning · Computer Science · 2025-08-27 · Muchammad Daniyal Kautsar, Afra Majida Hariono, Widyawan, Syukron Abu Ishaq Alfarozi, Kuntpong Woraratpanya

Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank…

Computation and Language · Computer Science · 2025-02-25 · Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Kehai Chen, Min Zhang

Sparsity-Aware Low-Rank Representation for Efficient Fine-Tuning of Large Language Models

Adapting large pre-trained language models to downstream tasks often entails fine-tuning millions of parameters or deploying costly dense weight updates, which hinders their use in resource-constrained environments. Low-rank Adaptation…

Machine Learning · Computer Science · 2026-01-29 · Longteng Zhang, Sen Wu, Shuai Hou, Zhengyu Qing, Zhuo Zheng, Danning Ke, Qihong Lin, Qiang Wang, Shaohuai Shi, Xiaowen Chu

TRAWL: Tensor Reduced and Approximated Weights for Large Language Models

Recent research has shown that pruning large-scale language models for inference is an effective approach to improving model efficiency, significantly reducing model weights with minimal impact on performance. Interestingly, pruning can…

Computation and Language · Computer Science · 2025-02-19 · Yiran Luo, Het Patel, Yu Fu, Dawon Ahn, Jia Chen, Yue Dong, Evangelos E. Papalexakis

LLM-Pruner: On the Structural Pruning of Large Language Models

Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges in both the…

Computation and Language · Computer Science · 2023-09-29 · Xinyin Ma, Gongfan Fang, Xinchao Wang

SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression

Large Language Models (LLMs) face a significant bottleneck during autoregressive inference due to the massive memory footprint of the Key-Value (KV) cache. Existing compression techniques like token eviction, quantization, or other low-rank…

Machine Learning · Computer Science · 2025-11-25 · Santhosh G S, Saurav Prakash, Balaraman Ravindran

SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining

Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank…

Machine Learning · Computer Science · 2024-11-05 · Andi Han, Jiaxiang Li, Wei Huang, Mingyi Hong, Akiko Takeda, Pratik Jawanpuria, Bamdev Mishra

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have…

Computation and Language · Computer Science · 2024-12-19 · Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

LOST: Low-rank and Sparse Pre-training for Large Language Models

While large language models (LLMs) have achieved remarkable performance across a wide range of tasks, their massive scale incurs prohibitive computational and memory costs for pre-training from scratch. Recent studies have investigated the…

Machine Learning · Computer Science · 2025-08-05 · Jiaxi Li, Lu Yin, Li Shen, Jinjin Xu, Liwu Xu, Tianjin Huang, Wenwu Wang, Shiwei Liu, Xilu Wang

Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models

Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges,…

Machine Learning · Computer Science · 2023-12-13 · Arnav Chavan, Nahush Lele, Deepak Gupta

SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer…

Machine Learning · Computer Science · 2025-05-07 · Hanyu Hu, Xiaoming Yuan

SparseLLM: Towards Global Pruning for Pre-trained Language Models

The transformative impact of large language models (LLMs) like LLaMA and GPT on natural language processing is countered by their prohibitive computational demands. Pruning has emerged as a pivotal compression strategy, introducing sparsity…

Computation and Language · Computer Science · 2024-11-04 · Guangji Bai, Yijiang Li, Chen Ling, Kibaek Kim, Liang Zhao