APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design

Yonghao Tan; Pingcheng Dong; Yongkun Wu; Yu Liu; Xuejiao Liu; Peng Luo; Shih-Yang Liu; Xijie Huang; Dong Zhang; Luhong Liang; Kwang-Ting Cheng

Hardware Architecture | 2025-05-08 | v1 | Artificial Intelligence

Abstract

Model compression and specialized dataflow techniques have significantly advanced DNN accelerators. However, in architectures that use input-stationary or weight-stationary dataflows, frequent access to high-precision partial sums (PSUMs) leads to excessive memory demands. Traditional compression strategies have typically overlooked PSUM quantization, even though PSUMs may account for 69% of power consumption. This study introduces a novel Additive Partial Sum Quantization (APSQ) method that seamlessly integrates PSUM accumulation into the quantization framework. A grouping strategy that combines APSQ with PSUM quantization, enhanced by a reconfigurable architecture, is further proposed. APSQ achieves nearly lossless performance on NLP and CV tasks across BERT, Segformer, and EfficientViT models while compressing PSUMs to INT8, which reduces energy costs by 28-87%. Extended experiments on LLaMA2-7B demonstrate the potential of APSQ for large language models. Code is available at https://github.com/Yonghao-Tan/APSQ.
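
To make the idea of quantizing PSUMs during accumulation concrete, the following is a minimal NumPy sketch. It is not the authors' implementation. It assumes a symmetric INT8 quantizer with a single scale, and a running sum that is dequantized, added to the next tile's contribution, and requantized at every step. The function names, the tile size, and the way the scale is chosen are all hypothetical.

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric uniform quantization to signed 8-bit integers."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def tiled_matmul_quantized_psum(x, w, tile_k, scale):
    """Accumulate x @ w over K-tiles while storing the running PSUM in INT8.

    After each tile, the stored INT8 running sum is dequantized, the new
    tile's contribution is added, and the result is requantized.
    Illustrative assumption only, not the method from the paper.
    """
    m, k = x.shape
    n = w.shape[1]
    acc_q = np.zeros((m, n), dtype=np.int8)  # INT8 PSUM buffer
    for start in range(0, k, tile_k):
        # Contribution of the current K-tile, computed in high precision.
        tile = x[:, start:start + tile_k] @ w[start:start + tile_k, :]
        # Fold the accumulation into the quantization step.
        acc_q = quantize_int8(acc_q.astype(np.float64) * scale + tile, scale)
    return acc_q.astype(np.float64) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
w = rng.standard_normal((64, 8))
reference = x @ w

# Hypothetical scale derived from the final output range; a real system
# would calibrate or learn it, and intermediate sums may still clip.
scale = np.abs(reference).max() / 127.0
approx = tiled_matmul_quantized_psum(x, w, tile_k=16, scale=scale)
max_abs_error = np.abs(approx - reference).max()
print("max abs error:", max_abs_error, "scale:", scale)
```

In this sketch only the INT8 buffer `acc_q` is written back between tiles, which is the memory traffic the abstract targets. Each requantization adds at most half a quantization step of rounding error, provided no intermediate value clips.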

Keywords

quantization, processing-in-memory, optimization

Cite

@article{arxiv.2505.03748,
  title  = {APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design},
  author = {Yonghao Tan and Pingcheng Dong and Yongkun Wu and Yu Liu and Xuejiao Liu and Peng Luo and Shih-Yang Liu and Xijie Huang and Dong Zhang and Luhong Liang and Kwang-Ting Cheng},
  journal= {arXiv preprint arXiv:2505.03748},
  year   = {2025}
}

Comments

62nd ACM/IEEE Design Automation Conference (DAC) 2025
