Related papers: A general tensor-structured compression scheme for…

200 papers

Modern foundation models such as large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources. We propose a new framework to convert such LLMs/LMMs into a reduced-dimension…

Machine Learning · Computer Science 2025-05-27 Toshiaki Koike-Akino , Xiangyu Chen , Jing Liu , Ye Wang , Pu Wang , Matthew Brand

The development of large language models (LLMs) has expanded to multi-modal systems capable of processing text, images, and speech within a unified framework. Training these models demands significantly larger datasets and computational…

Computation and Language · Computer Science 2025-05-09 Weixin Liang , Lili Yu , Liang Luo , Srinivasan Iyer , Ning Dong , Chunting Zhou , Gargi Ghosh , Mike Lewis , Wen-tau Yih , Luke Zettlemoyer , Xi Victoria Lin

Large language models (LLMs) are both storage-intensive and computation-intensive, posing significant challenges when deployed on resource-constrained hardware. As linear layers are the main resource-consuming parts of LLMs, this paper…

Hardware Architecture · Computer Science 2025-02-03 Sixiao Huang , Tintin Wang , Ang Li , Ao Shen , Kai Li , Keyao Jiang , Mingqiang Huang , Hao Yu

The embedding layers transforming input words into real vectors are the key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous,…

Computation and Language · Computer Science 2020-02-20 Oleksii Hrinchuk , Valentin Khrulkov , Leyla Mirvakhabova , Elena Orlova , Ivan Oseledets

Large language models (LLMs) demonstrate impressive results in natural language processing tasks but require a significant amount of computational and memory resources. Structured matrix representations are a promising way for reducing the…

Computation and Language · Computer Science 2025-06-04 Ekaterina Grishina , Mikhail Gorbunov , Maxim Rakhuba

Large language models deliver strong generative performance but at the cost of massive parameter counts, memory use, and decoding latency. Prior work has shown that pruning and structured sparsity can preserve accuracy under substantial…

Computation and Language · Computer Science 2026-04-17 Andrew Kiruluta

During the training of Large Language Models (LLMs), tensor data is periodically "checkpointed" to persistent storage to allow recovery of work done in the event of failure. The volume of data that must be copied during each checkpoint,…

Machine Learning · Computer Science 2025-05-16 Daniel Waddington , Cornel Constantinescu

The high computational demands of Large Language Models (LLMs) motivate methods that reduce parameter count and accelerate inference. In response, model pruning emerges as an effective strategy, yet current methods typically focus on a…

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems…

Computation and Language · Computer Science 2025-06-03 Yingfeng Luo , Tong Zheng , Yongyu Mu , Bei Li , Qinghong Zhang , Yongqi Gao , Ziqiang Xu , Peinan Feng , Xiaoqian Liu , Tong Xiao , Jingbo Zhu

Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, substantial computational requirements make their deployment challenging on devices…

Machine Learning · Computer Science 2025-05-05 Chi-Heng Lin , Shangqian Gao , James Seale Smith , Abhishek Patel , Shikhar Tuli , Yilin Shen , Hongxia Jin , Yen-Chang Hsu

Training Large Language Models (LLMs) typically involves a two-stage pipeline at the output layer: hidden states are projected into vocabulary logits via a linear transformation (lm_head), followed by cross-entropy loss computation against…

Machine Learning · Computer Science 2025-11-25 Jianbing Dong , Jianbin Chang

Recent work explored the potential of large-scale Transformer-based pre-trained models, especially Pre-trained Language Models (PLMs) in natural language processing. This raises many concerns from various perspectives, e.g., financial costs…

Computation and Language · Computer Science 2022-05-23 Yuxin Ren , Benyou Wang , Lifeng Shang , Xin Jiang , Qun Liu

Large Language Models are growing in size, and we expect them to continue to do so, as larger models train quicker. However, this increase in size will severely impact inference costs. Therefore model compression is important, to retain the…

Machine Learning · Computer Science 2024-04-10 Georgy Tyukin

Mixture-of-Experts large language models (MoE-LLMs) mark a significant step forward for language models; however, they encounter two critical challenges in practice: 1) expert parameters lead to considerable memory consumption and loading…

Machine Learning · Computer Science 2025-02-25 Wei Huang , Yue Liao , Jianhui Liu , Ruifei He , Haoru Tan , Shiming Zhang , Hongsheng Li , Si Liu , Xiaojuan Qi

Large language models (LLMs) achieve remarkable performance through ever-increasing parameter counts, but scaling incurs steep computational costs. To better understand LLM scaling, we study representational differences between LLMs and…

Large Language Models (LLMs) are predominantly deployed as dense transformers, where every parameter in every feed-forward block is activated for every token. While architecturally simple, this is computationally inefficient, since…

Machine Learning · Computer Science 2025-11-27 Ivan Novikov

Large Language Models (LLMs) such as ChatGPT and LLaMA are advancing rapidly in generative Artificial Intelligence (AI), but their immense size poses significant challenges, such as huge training and inference costs, substantial energy…

We propose a method to enhance the performance of Large Language Models (LLMs) by integrating quantum computing and quantum-inspired techniques. Specifically, our approach involves replacing the weight matrices in the Self-Attention and…

Quantum Physics · Physics 2024-10-24 Borja Aizpurua , Saeed S. Jahromi , Sukhbinder Singh , Roman Orus

In recent years, Large Language Models (LLMs) through Transformer structures have dominated many machine learning tasks, especially text processing. However, these models require massive amounts of data for training and induce high resource…

Machine Learning · Computer Science 2025-04-17 Kilian Pfeiffer , Mohamed Aboelenien Ahmed , Ramin Khalili , Jörg Henkel

Mixture-of-Experts (MoE) Large Language Models (LLMs) face a trilemma of load imbalance, parameter redundancy, and communication overhead. We introduce a unified framework based on dynamic expert clustering and structured compression to…

Computation and Language · Computer Science 2026-02-06 Peijun Zhu , Ning Yang , Baoliang Tian , Jiayu Wei , Weihao Zhang , Haijun Zhang , Pin Lv