English
Related papers

Related papers: LPCD: Unified Framework from Layer-Wise to Submodu…

200 papers

A natural and intuitive idea in model quantization is to approximate each component's quantized output to match its original. Motivated by this idea, most layer-wise post-training quantization (PTQ) methods focus on weight approximation at…

Machine Learning · Computer Science 2026-01-28 Li Lin , Xiaojun Wan

Layer-wise PTQ is a promising technique for compressing large language models (LLMs), due to its simplicity and effectiveness without requiring retraining. However, recent progress in this area is saturating, underscoring the need to…

Machine Learning · Computer Science 2026-01-14 Yamato Arai , Yuma Ichikawa

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models~(PLMs). However, most existing quantization methods for PLMs follow quantization-aware training~(QAT) that requires end-to-end…

Computation and Language · Computer Science 2021-10-01 Haoli Bai , Lu Hou , Lifeng Shang , Xin Jiang , Irwin King , Michael R. Lyu

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science 2026-05-18 Dung Anh Hoang , Cuong Pham , Cuong Nguyen , Trung le , Jianfei Cai , Thanh-Toan Do

Due to highly constrained computing power and memory, deploying 3D lidar-based detectors on edge devices equipped in autonomous vehicles and robots poses a crucial challenge. Being a convenient and straightforward model compression…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Sifan Zhou , Liang Li , Xinyu Zhang , Bo Zhang , Shipeng Bai , Miao Sun , Ziyu Zhao , Xiaobo Lu , Xiangxiang Chu

This paper introduces a post-training quantization~(PTQ) method achieving highly efficient Convolutional Neural Network~ (CNN) quantization with high performance. Previous PTQ methods usually reduce compression error via performing…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Chen Lin , Zheyang Li , Bo Peng , Haoji Hu , Wenming Tan , Ye Ren , Shiliang Pu

With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing…

Machine Learning · Statistics 2023-12-04 Kayhan Behdin , Ayan Acharya , Aman Gupta , Qingquan Song , Siyu Zhu , Sathiya Keerthi , Rahul Mazumder

Low-bit post-training quantization (PTQ) is a pivotal technique for deploying Vision-Language Models (VLMs) on resource-constrained devices. However, existing PTQ methods often degrade VLMs' accuracy due to the heterogeneous activation…

Computer Vision and Pattern Recognition · Computer Science 2026-05-20 Yi Zhong , Haotong Qin , Xindong Zhang , Lei Zhang , Guolei Sun

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu , Lin Niu , Zhihang Yuan , Dawei Yang , Xinggang Wang , Wenyu Liu

Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited…

Computation and Language · Computer Science 2026-04-14 Han Liu , Haotian Gao , Xiaotong Zhang , Changya Li , Feng Zhang , Wei Wang , Fenglong Ma , Hong Yu

Latent Diffusion Models (LDMs) capture the dynamic evolution of latent variables over time, blending patterns and multimodality in a generative system. Despite the proficiency of LDM in various applications, such as text-to-image…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Yuewei Yang , Xiaoliang Dai , Jialiang Wang , Peizhao Zhang , Hongbo Zhang

Post-training quantization (PTQ) is a primary approach for deploying large language models without fine-tuning, and the quantized performance is often strongly affected by the calibration in PTQ. By contrast, in vision-language models…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Zhenhao Shang , Haizhao Jing , Guoting Wei , Haokui Zhang , Rong Xiao , Jianqing Gao , Peng Wang

Post-training quantization (PTQ) enables efficient deployment of large language models by mapping pretrained weights to low-bit formats without retraining, typically using a small calibration set to minimize a layer-wise calibration…

Machine Learning · Computer Science 2026-05-12 Seohyeon Cha , Huancheng Chen , Dongjun Kim , Haoran Zhang , Kevin Chan , Gustavo de Veciana , Haris Vikalo

In this paper, we address post-training quantization (PTQ) for large language models (LLMs) from an overlooked perspective: given a pre-trained high-precision LLM, the predominant sequential quantization framework treats different layers…

Artificial Intelligence · Computer Science 2026-03-27 Shigeng Wang , Chao Li , Yangyuxuan Kang , Jiawei Fan , Zhonghong Ou , Anbang Yao

In spite of the great potential of large language models (LLMs) across various tasks, their deployment on resource-constrained devices remains challenging due to their excessive computational and memory demands. Quantization has emerged as…

Machine Learning · Computer Science 2025-02-28 Hao Mark Chen , Fuwen Tan , Alexandros Kouris , Royson Lee , Hongxiang Fan , Stylianos I. Venieris

Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction…

Artificial Intelligence · Computer Science 2026-05-11 Yanlong Zhao , Xiaoyuan Cheng , Huihang Liu , Baihua He , Xinyu Zhang , Harrison Bo Hua Zhu , Wenlong Chen , Li Zeng , Zhuo Sun

Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Yefei He , Luping Liu , Jing Liu , Weijia Wu , Hong Zhou , Bohan Zhuang

Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. It typically sets quantization hyperparameters (e.g., scaling factors) based on current-layer activations. Although this…

Machine Learning · Computer Science 2026-02-04 Zheqi Lv , Zhenxuan Fan , Qi Tian , Wenqiao Zhang , Yueting Zhuang

Diffusionmodels(DMs)havedemonstratedremarkableachievements in synthesizing images of high fidelity and diversity. However, the extensive computational requirements and slow generative speed of diffusion models have limited their widespread…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Jiaojiao Ye , Zhen Wang , Linnan Jiang

Methods based on weight compensation, which iteratively apply quantization and weight compensation to minimize the output error, have recently demonstrated remarkable success in quantizing Large Language Models (LLMs). The representative…

Machine Learning · Computer Science 2026-04-10 Shuaiting Li , Juncan Deng , Kedong Xu , Rongtao Deng , Hong Gu , Minghan Jiang , Haibin Shen , Kejie Huang
‹ Prev 1 2 3 10 Next ›