Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding

Hanling Yi; Feng Lin; Hongbin Li; Peiyang Ning; Xiaotian Yu; Rong Xiao


Computation and Language; Artificial Intelligence; Machine Learning | 2024-05-21 (v3)


Abstract

This research aims to accelerate the inference of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an approach designed for lossless acceleration of LLMs. By integrating semi-autoregressive inference with speculative decoding, SPACE enables autoregressive LLMs to parallelize token generation and verification. This is realized through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to predict multiple tokens simultaneously. Additionally, an auto-correct decoding algorithm allows token sequences to be generated and verified within a single model invocation. In extensive experiments on a range of LLMs, SPACE demonstrates inference speedups of 2.7x to 4.0x on HumanEval-X while maintaining output quality.
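
To make the idea of "lossless" verification concrete, the following is a minimal sketch of the generic greedy draft-and-verify rule used in speculative decoding. It is not the SPACE algorithm from the paper, which produces and checks candidates inside one model call. The names `verify_draft`, `greedy_next`, and `toy_greedy_next` are hypothetical, and the toy model is only a stand-in for an LLM.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    greedy_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Return the tokens accepted in one verification step (greedy setting).

    Draft tokens are kept while they match the model's own greedy choice.
    At the first mismatch, the model's token replaces the draft token and
    the step ends. The result therefore equals plain greedy decoding.
    """
    accepted: List[int] = []
    context = list(prefix)
    # Assumption: a real system scores all draft positions in one batched
    # forward pass; this loop evaluates them one by one for clarity only.
    for token in draft:
        target = greedy_next(context)
        accepted.append(target)
        context.append(target)
        if target != token:
            return accepted
    # Every draft token matched, so the model contributes one extra token.
    accepted.append(greedy_next(context))
    return accepted


def toy_greedy_next(context: Sequence[int]) -> int:
    """Hypothetical stand-in for an LLM: next token is (last + 1) mod 10."""
    return (context[-1] + 1) % 10


if __name__ == "__main__":
    # The draft guesses 3, 4, 9; the toy model agrees with 3 and 4 only.
    print(verify_draft([1, 2], [3, 4, 9], toy_greedy_next))
```

In this sketch a step always yields at least one token, and a fully correct draft of length k yields k + 1 tokens, which is where the speedup over one-token-per-call decoding comes from.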

Keywords

speculative decoding, tokenization, text generation

Cite

@article{arxiv.2402.11809,
  title  = {Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding},
  author = {Hanling Yi and Feng Lin and Hongbin Li and Peiyang Ning and Xiaotian Yu and Rong Xiao},
  journal= {arXiv preprint arXiv:2402.11809},
  year   = {2024}
}

Comments

Accepted by ACL 2024 Findings
