Related papers: Draft-Conditioned Constrained Decoding for Structu…

Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning

Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required output format exactly. To address this issue, grammar-constrained…

Computation and Language · Computer Science 2024-01-19 Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West

Flexible and Efficient Grammar-Constrained Decoding

Large Language Models (LLMs) are often asked to generate structured outputs that obey precise syntactic rules, such as code snippets or formatted data. Grammar-constrained decoding (GCD) can guarantee that LLM outputs match such rules by…

Computation and Language · Computer Science 2025-07-17 Kanghee Park, Timothy Zhou, Loris D'Antoni

Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models…

Computation and Language · Computer Science 2024-07-23 Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West

Type-Constrained Code Generation with Language Models

Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although…

Machine Learning · Computer Science 2025-05-09 Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen, Dawn Song, Martin Vechev

Constrained Decoding of Diffusion LLMs with Context-Free Grammars

Large language models (LLMs) have shown promising performance across diverse domains. Many practical applications of LLMs, such as code completion and structured data extraction, require adherence to syntactic constraints specified by a…

Machine Learning · Computer Science 2025-08-18 Niels Mündler, Jasper Dekoninck, Martin Vechev

Deferred Commitment Decoding for Diffusion Language Models

Diffusion language models (DLMs) have recently emerged as a strong alternative to autoregressive models by enabling parallel text generation. To improve inference efficiency and KV-cache compatibility, prior work commonly adopts block-based…

Computation and Language · Computer Science 2026-01-21 Yingte Shu, Yuchuan Tian, Chao Xu, Yunhe Wang, Hanting Chen

Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding proposes to enforce strict formal language constraints during generation. However, as we show in this work, not only do such…

Machine Learning · Computer Science 2024-03-13 Luca Beurer-Kellner, Marc Fischer, Martin Vechev

Improving Multi-candidate Speculative Decoding

Speculative Decoding (SD) is a technique to accelerate the inference of Large Language Models (LLMs) by using a lower complexity draft model to propose candidate tokens verified by a larger target model. To further improve efficiency,…

Computation and Language · Computer Science 2024-12-17 Xiaofan Lu, Yixiao Zeng, Feiyang Ma, Zixu Yu, Marco Levorato

Constrained Decoding for Secure Code Generation

Code Large Language Models (Code LLMs) have been increasingly used by developers to boost productivity, but they often generate vulnerable code. Thus, there is an urgent need to ensure that code generated by Code LLMs is correct and secure.…

Cryptography and Security · Computer Science 2024-07-23 Yanjun Fu, Ethan Baker, Yu Ding, Yizheng Chen

Graph-Structured Speculative Decoding

Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness…

Computation and Language · Computer Science 2024-07-24 Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Adaptive Draft-Verification for Efficient Large Language Model Decoding

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires…

Computation and Language · Computer Science 2024-08-20 Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is…

Computation and Language · Computer Science 2025-08-19 Benjamin Lipkin, Benjamin LeBrun, Jacob Hoover Vigly, João Loula, David R. MacIver, Li Du, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Timothy J. O'Donnell, Alexander K. Lew, Tim Vieira

DocCGen: Document-based Controlled Code Generation

Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical…

Software Engineering · Computer Science 2024-07-04 Sameer Pimparkhede, Mehant Kammakomati, Srikanth Tamilselvam, Prince Kumar, Ashok Pon Kumar, Pushpak Bhattacharyya

Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models

Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token…

Computation and Language · Computer Science 2025-06-12 Jui-Ming Yao, Hao-Yuan Chen, Zi-Xian Tang, Bing-Jia Tan, Sheng-Wei Peng, Bing-Cheng Xie, Shun-Feng Su

Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective

Constrained decoding enables Language Models (LMs) to produce samples that provably satisfy hard constraints. However, existing constrained-decoding approaches often distort the underlying model distribution, a limitation that is especially…

Artificial Intelligence · Computer Science 2025-06-09 Emmanuel Anaya Gonzalez, Sairam Vaidya, Kanghee Park, Ruyi Ji, Taylor Berg-Kirkpatrick, Loris D'Antoni

DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting

Large language models (LLMs) exhibit exceptional performance across a wide range of tasks; however, their token-by-token autoregressive generation process significantly hinders inference speed. Speculative decoding presents a promising…

Computation and Language · Computer Science 2025-03-04 Kai Lv, Honglin Guo, Qipeng Guo, Xipeng Qiu

Make Every Draft Count: Hidden State based Speculative Decoding

Speculative decoding has emerged as a pivotal technique to accelerate LLM inference by employing a lightweight draft model to generate candidate tokens that are subsequently verified by the target model in parallel. However, while this…

Computation and Language · Computer Science 2026-02-26 Yuetao Chen, Xuliang Wang, Xinzhou Zheng, Ming Li, Peng Wang, Hong Xu

Constrained Decoding with Speculative Lookaheads

Constrained decoding with lookahead heuristics (CDLH) is a highly effective method for aligning LLM generations to human preferences. However, the extensive lookahead roll-out operations for each generated token make CDLH prohibitively…

Computation and Language · Computer Science 2025-02-12 Nishanth Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah

Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can…

Computation and Language · Computer Science 2026-05-29 Ngoc Trinh Hung Nguyen, Alonso Silva, Laith Zumot, Liubov Tupikina, Armen Aghasaryan, Mehwish Alam

ChopChop: a Programmable Framework for Semantically Constraining the Output of Language Models

Language models (LMs) can generate code but cannot guarantee its correctness, often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding offers a solution by…

Programming Languages · Computer Science 2025-12-03 Shaan Nagy, Timothy Zhou, Nadia Polikarpova, Loris D'Antoni