Related papers: DiJiang: Efficient Large Language Models through C…

200 papers

In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches -- models attaining sub-quadratic attention complexity can utilize a notion of sparsity or a low-rank approximation of inputs to reduce…

Machine Learning · Computer Science 2022-11-09 Uladzislau Yorsh, Alexander Kovalenko

Large language models can be quantized to reduce inference time latency, model size, and energy consumption, thereby delivering a better user experience at lower cost. A challenge exists to deliver quantized models with minimal loss of…

Machine Learning · Computer Science 2025-07-24 Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha

Diffusion Transformers (DiT) have become a leading architecture in image generation. However, the quadratic complexity of attention mechanisms, which are responsible for modeling token-wise relationships, results in significant latency when…

Computer Vision and Pattern Recognition · Computer Science 2024-12-23 Songhua Liu, Zhenxiong Tan, Xinchao Wang

Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved remarkable success in varieties of NLP tasks. However, these models usually consist of hundreds of millions of parameters which brings challenges…

Computation and Language · Computer Science 2020-04-07 Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but also introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators.…

Machine Learning · Computer Science 2025-04-11 Shaoyuan Chen, Wencong Xiao, Yutong Lin, Mingxing Zhang, Yingdi Shan, Jinlei Jiang, Kang Chen, Yongwei Wu

Fine-tuned transformer models have shown superior performances in many natural language tasks. However, the large model size prohibits deploying high-performance transformer models on resource-constrained devices. This paper proposes a…

Computation and Language · Computer Science 2024-10-01 Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang

The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in…

Machine Learning · Computer Science 2024-10-14 Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar

Knowledge distillation offers a transformative pathway to developing powerful, yet efficient, small language models (SLMs) suitable for resource-constrained environments. In this paper, we benchmark the performance and computational cost of…

Computation and Language · Computer Science 2026-02-25 Sachin Gopal Wani, Eric Page, Ajay Dholakia, David Ellison

Purely character-based language models (LMs) have been lagging in quality on large scale datasets, and current state-of-the-art LMs rely on word tokenization. It has been assumed that injecting the prior knowledge of a tokenizer into the…

Computation and Language · Computer Science 2019-08-28 Dokook Choe, Rami Al-Rfou, Mandy Guo, Heeyoung Lee, Noah Constant

Several post-training quantization methods have been applied to large language models (LLMs), and have been shown to perform well down to 8-bits. We find that these methods break down at lower bit precision, and investigate quantization…

Computation and Language · Computer Science 2023-05-30 Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra

This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (not the partial binary or ternary LLM like BitNet b1.58) to match the…

Computation and Language · Computer Science 2024-07-10 Liqun Ma, Mingjie Sun, Zhiqiang Shen

The rapid evolution of Large Language Models (LLMs), epitomized by architectures like GPT-4, has reshaped the landscape of natural language processing. This paper introduces a pioneering approach to address the efficiency concerns…

The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce…

Computation and Language · Computer Science 2024-02-19 Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

Although transformer architectures have achieved state-of-the-art performance across diverse domains, their quadratic computational complexity with respect to sequence length remains a significant bottleneck, particularly for…

Computation and Language · Computer Science 2025-11-05 Zeyu Liu, Souvik Kundu, Lianghao Jiang, Anni Li, Srikanth Ronanki, Sravan Bodapati, Gourav Datta, Peter A. Beerel

This work proposes kernel transform learning. The idea of dictionary learning is well known; it is a synthesis formulation where a basis is learnt along with the coefficients so as to generate or synthesize the data. Transform learning is…

Computer Vision and Pattern Recognition · Computer Science 2020-08-10 Jyoti Maggu, Angshul Majumdar

Deploying large language models (LLMs) in resource-constrained environments is hindered by heavy computational and memory requirements. We present LBLLM, a lightweight binarization framework that achieves effective W(1+1)A4 quantization…

Machine Learning · Computer Science 2026-04-22 Siqing Song, Chuang Wang, Yong Lang, Yi Yang, Xu-Yao Zhang

Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the…

Machine Learning · Computer Science 2019-11-13 Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

Effective pre-training of large language models (LLMs) has been challenging due to the immense resource demands and the complexity of the technical processes involved. This paper presents a detailed technical report on YuLan-Mini, a highly…

Computation and Language · Computer Science 2024-12-25 Yiwen Hu, Huatong Song, Jia Deng, Jiapeng Wang, Jie Chen, Kun Zhou, Yutao Zhu, Jinhao Jiang, Zican Dong, Wayne Xin Zhao, Ji-Rong Wen

Current LLM structured pruning methods typically involve two steps: (1) compression with calibration data and (2) costly continued pretraining on billions of tokens to recover lost performance. This second step is necessary as the first…

Machine Learning · Computer Science 2024-12-31 Yaya Sy, Christophe Cerisara, Irina Illina

Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference due to numerous refinement steps and the inability to use standard KV caching. We introduce CDLM (Consistency Diffusion Language…

Machine Learning · Computer Science 2026-02-23 Minseo Kim, Chenfeng Xu, Coleman Hooper, Harman Singh, Ben Athiwaratkun, Ce Zhang, Kurt Keutzer, Amir Gholami