Flexible and Efficient Grammar-Constrained Decoding

Kanghee Park; Timothy Zhou; Loris D'Antoni

Computation and Language; Artificial Intelligence | 2025-07-17 (v2)

Abstract

Large Language Models (LLMs) are often asked to generate structured outputs that obey precise syntactic rules, such as code snippets or formatted data. Grammar-constrained decoding (GCD) can guarantee that LLM outputs match such rules by masking out tokens that will provably lead to outputs that do not belong to a specified context-free grammar (CFG). To guarantee soundness, GCD algorithms have to compute how a given LLM subword tokenizer can align with the tokens used by a given context-free grammar and compute token masks based on this information. Doing so efficiently is challenging, and existing GCD algorithms require tens of minutes to preprocess common grammars. We present a new GCD algorithm, together with an implementation, that offers 17.71x faster offline preprocessing than existing approaches while preserving state-of-the-art efficiency in online mask computation.
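
To make the idea of token masking concrete, here is a minimal toy sketch in Python. It is not the algorithm presented in the paper and involves no offline preprocessing. It assumes a hypothetical seven-token subword vocabulary (`VOCAB`) and a hard-coded grammar of balanced parentheses (S -> ( S ) S | empty), and the helper names `depth_after`, `token_mask`, and `apply_mask` are invented for illustration. Note that a single subword token such as `))` can span several grammar-level symbols, which is the alignment issue the abstract describes.

```python
import math

# Hypothetical subword vocabulary (an assumption for illustration only).
VOCAB = ["(", ")", "()", "((", "))", ")(", "<eos>"]


def depth_after(depth, text):
    """Return the nesting depth after reading `text`, starting from `depth`.

    Returns None if `text` closes a parenthesis that was never opened.
    Assumes `text` contains only '(' and ')' characters.
    """
    for ch in text:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return None
    return depth


def token_mask(prefix):
    """Return one boolean per vocabulary entry: True if the token is allowed.

    A token is allowed when prefix + token can still be extended to a
    balanced string; <eos> is allowed only when the prefix is balanced.
    """
    depth = depth_after(0, prefix)
    if depth is None:
        raise ValueError("prefix cannot be extended to a balanced string")
    mask = []
    for tok in VOCAB:
        if tok == "<eos>":
            mask.append(depth == 0)
        else:
            mask.append(depth_after(depth, tok) is not None)
    return mask


def apply_mask(logits, mask):
    """Set the logits of disallowed tokens to -inf so they cannot be sampled."""
    return [x if ok else -math.inf for x, ok in zip(logits, mask)]


# After the prefix "(", the token "))" and <eos> are masked out.
print(token_mask("("))
```

In this toy setting the mask is recomputed by scanning each token character by character at every decoding step. The sketch only shows what a mask is and how it is applied to the logits.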

Keywords

tokenization, code generation, large language model

Cite

@article{arxiv.2502.05111,
  title  = {Flexible and Efficient Grammar-Constrained Decoding},
  author = {Kanghee Park and Timothy Zhou and Loris D'Antoni},
  journal= {arXiv preprint arXiv:2502.05111},
  year   = {2025}
}
