Related papers: SLaB: Sparse-Lowrank-Binary Decomposition for Effi…

200 papers

Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but the billion-scale parameters pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either…

Computation and Language · Computer Science · 2026-04-07 · Xinhao Huang, You-Liang Huang, Zeyi Wen

Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as…

Machine Learning · Computer Science · 2025-12-22 · Yang Li, Daniel Agyei Asante, Changsheng Zhao, Ernie Chang, Yangyang Shi, Vikas Chandra

Low-rank and sparse composite approximation is a natural idea to compress Large Language Models (LLMs). However, such an idea faces two primary challenges that adversely affect the performance of existing methods. The first challenge…

Machine Learning · Computer Science · 2026-02-27 · Changhai Zhou, Qian Qiao, Yuhua Zhou, Yuxin Wu, Shichao Weng, Weizhong Zhang, Cheng Jin

Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that…

Recent large language models (LLMs) employ billions of parameters to enable broad problem-solving capabilities. Such language models also tend to be memory-bound because of the dominance of matrix-vector and matrix-matrix multiplications…

Machine Learning · Computer Science · 2024-10-24 · Chakshu Moar, Faraz Tahmasebi, Michael Pellauer, Hyoukjun Kwon

Large Language Models (LLMs) have demonstrated remarkable proficiency in language comprehension and generation; however, their widespread adoption is constrained by substantial bandwidth and computational demands. While pruning and low-rank…

Computation and Language · Computer Science · 2025-10-31 · Zeliang Zong, Kai Zhang, Zheyang Li, Wenming Tan, Ye Ren, Yiyan Zhai, Jilin Hu

Large Language Models (LLMs) have enabled remarkable progress in natural language processing, yet their high computational and memory demands pose challenges for deployment in resource-constrained environments. Although recent low-rank…

Computation and Language · Computer Science · 2026-02-09 · Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li, Zheng Zhang

The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high…

Computation and Language · Computer Science · 2024-12-20 · Haotian Zheng, Jinke Ren, Yushan Sun, Ruichen Zhang, Wenbo Zhang, Zhen Li, Dusit Niyato, Shuguang Cui, Yatong Han

Large Language Models (LLMs) present significant deployment challenges due to their immense size and computational requirements. Model compression techniques are essential for making these models practical for resource-constrained…

In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank…

Computation and Language · Computer Science · 2025-02-25 · Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Kehai Chen, Min Zhang

Adapting large pre-trained language models to downstream tasks often entails fine-tuning millions of parameters or deploying costly dense weight updates, which hinders their use in resource-constrained environments. Low-rank Adaptation…

Machine Learning · Computer Science · 2026-01-29 · Longteng Zhang, Sen Wu, Shuai Hou, Zhengyu Qing, Zhuo Zheng, Danning Ke, Qihong Lin, Qiang Wang, Shaohuai Shi, Xiaowen Chu

Recent research has shown that pruning large-scale language models for inference is an effective approach to improving model efficiency, significantly reducing model weights with minimal impact on performance. Interestingly, pruning can…

Computation and Language · Computer Science · 2025-02-19 · Yiran Luo, Het Patel, Yu Fu, Dawon Ahn, Jia Chen, Yue Dong, Evangelos E. Papalexakis

Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges in both the…

Computation and Language · Computer Science · 2023-09-29 · Xinyin Ma, Gongfan Fang, Xinchao Wang

Large Language Models (LLMs) face a significant bottleneck during autoregressive inference due to the massive memory footprint of the Key-Value (KV) cache. Existing compression techniques like token eviction, quantization, or other low-rank…

Machine Learning · Computer Science · 2025-11-25 · Santhosh G S, Saurav Prakash, Balaraman Ravindran

Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank…

Machine Learning · Computer Science · 2024-11-05 · Andi Han, Jiaxiang Li, Wei Huang, Mingyi Hong, Akiko Takeda, Pratik Jawanpuria, Bamdev Mishra

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have…

Computation and Language · Computer Science · 2024-12-19 · Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

While large language models (LLMs) have achieved remarkable performance across a wide range of tasks, their massive scale incurs prohibitive computational and memory costs for pre-training from scratch. Recent studies have investigated the…

Machine Learning · Computer Science · 2025-08-05 · Jiaxi Li, Lu Yin, Li Shen, Jinjin Xu, Liwu Xu, Tianjin Huang, Wenwu Wang, Shiwei Liu, Xilu Wang

Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges,…

Machine Learning · Computer Science · 2023-12-13 · Arnav Chavan, Nahush Lele, Deepak Gupta

The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer…

Machine Learning · Computer Science · 2025-05-07 · Hanyu Hu, Xiaoming Yuan

The transformative impact of large language models (LLMs) like LLaMA and GPT on natural language processing is countered by their prohibitive computational demands. Pruning has emerged as a pivotal compression strategy, introducing sparsity…

Computation and Language · Computer Science · 2024-11-04 · Guangji Bai, Yijiang Li, Chen Ling, Kibaek Kim, Liang Zhao