Related papers: Flexible and Efficient Grammar-Constrained Decodin…

Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning

Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required output format exactly. To address this issue, grammar-constrained…

Computation and Language · Computer Science 2024-01-19 Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West

Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models

Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token…

Computation and Language · Computer Science 2025-06-12 Jui-Ming Yao, Hao-Yuan Chen, Zi-Xian Tang, Bing-Jia Tan, Sheng-Wei Peng, Bing-Cheng Xie, Shun-Feng Su

Grammar-Aligned Decoding

Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what…

Artificial Intelligence · Computer Science 2025-12-15 Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, Loris D'Antoni

Draft-Conditioned Constrained Decoding for Structured Generation in LLMs

Large language models (LLMs) are increasingly used to generate executable outputs, JSON objects, and API calls, where a single syntax error can make the output unusable. Constrained decoding enforces validity token-by-token via masking and…

Computation and Language · Computer Science 2026-03-05 Avinash Reddy, Thayne T. Walker, James S. Ide, Amrit Singh Bedi

Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding proposes to enforce strict formal language constraints during generation. However, as we show in this work, not only do such…

Machine Learning · Computer Science 2024-03-13 Luca Beurer-Kellner, Marc Fischer, Martin Vechev

Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models…

Computation and Language · Computer Science 2024-07-23 Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West

Accelerating Constrained Decoding with Token Space Compression

To guarantee that an LLM's outputs conform to a specified structure, context-free grammar (CFG) decoding engines force the selection of next tokens that produce strings that conform to a given CFG. While current CFG-constrained decoding…

Artificial Intelligence · Computer Science 2026-05-29 Michael Sullivan, Alexander Koller

Large Language Model Meets Constraint Propagation

Large Language Models (LLMs) excel at generating fluent text but struggle to enforce external constraints because they generate tokens sequentially without explicit control mechanisms. GenCP addresses this limitation by combining LLM…

Computation and Language · Computer Science 2025-06-02 Alexandre Bonlarron, Florian Régin, Elisabetta De Maria, Jean-Charles Régin

WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding

Structured decoding enables large language models (LLMs) to generate outputs in formats required by downstream systems, such as HTML or JSON. However, existing methods suffer from efficiency bottlenecks due to grammar compilation, state…

Artificial Intelligence · Computer Science 2025-07-23 Ran Wang, Xiaoxuan Liu, Hao Ren, Gang Chen, Fanchao Qi, Maosong Sun

XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models

The applications of LLM Agents are becoming increasingly complex and diverse, leading to a high demand for structured outputs that can be parsed into code, structured function calls, and embodied agent commands. These developments bring…

Computation and Language · Computer Science 2025-05-13 Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, Tianqi Chen

SynCode: LLM Generation with Grammar Augmentation

LLMs are widely used in complex AI applications. These applications underscore the need for LLM outputs to adhere to a specific format, for their integration with other components in the systems. Typically the format rules e.g., for data…

Machine Learning · Computer Science 2024-11-07 Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, Gagandeep Singh

Attention Meets Reachability: Structural Equivalence and Efficiency in Grammar-Constrained LLM Decoding

We study grammar-constrained decoding (GCD) as a coupling between an autoregressive next-token distribution and a reachability oracle over a pushdown system compiled from a context-free grammar (CFG). We prove an oracle invariance theorem:…

Computation and Language · Computer Science 2026-03-09 Faruk Alpay, Bilge Senturk

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is…

Computation and Language · Computer Science 2025-08-19 Benjamin Lipkin, Benjamin LeBrun, Jacob Hoover Vigly, João Loula, David R. MacIver, Li Du, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Timothy J. O'Donnell, Alexander K. Lew, Tim Vieira

A Universal List Decoding Algorithm with Application to Decoding of Polar Codes

This paper is concerned with a guessing codeword decoding (GCD) of linear block codes. Compared with the guessing noise decoding (GND), which is only efficient for high-rate codes, the GCD is efficient for not only high-rate codes but also…

Information Theory · Computer Science 2024-05-07 Xiangping Zheng, Xiao Ma

Graph-Structured Speculative Decoding

Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness…

Computation and Language · Computer Science 2024-07-24 Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Constrained Decoding of Diffusion LLMs with Context-Free Grammars

Large language models (LLMs) have shown promising performance across diverse domains. Many practical applications of LLMs, such as code completion and structured data extraction, require adherence to syntactic constraints specified by a…

Machine Learning · Computer Science 2025-08-18 Niels Mündler, Jasper Dekoninck, Martin Vechev

Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation

Large language models (LLMs) often experience language confusion, which is the unintended mixing of languages during text generation. Current solutions to this problem either necessitate model retraining or cannot differentiate between…

Computation and Language · Computer Science 2025-10-21 Collin Zhang, Fei Huang, Chenhan Yuan, Junyang Lin

Adaptive Draft-Verification for Efficient Large Language Model Decoding

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires…

Computation and Language · Computer Science 2024-08-20 Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu

Deferred Commitment Decoding for Diffusion Language Models

Diffusion language models (DLMs) have recently emerged as a strong alternative to autoregressive models by enabling parallel text generation. To improve inference efficiency and KV-cache compatibility, prior work commonly adopts block-based…

Computation and Language · Computer Science 2026-01-21 Yingte Shu, Yuchuan Tian, Chao Xu, Yunhe Wang, Hanting Chen

CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference

Large language models face significant computational bottlenecks during inference due to the expensive output layer computation over large vocabularies. We present CSV-Decode, a novel approach that uses geometric upper bounds to construct…

Computation and Language · Computer Science 2025-12-01 Dong Liu, Yanxuan Yu, Ben Lengerich