Related papers: AdapTrack: Constrained Decoding without Distorting…

Type-Constrained Code Generation with Language Models

Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although…

Machine Learning · Computer Science 2025-05-09 Niels Mündler , Jingxuan He , Hao Wang , Koushik Sen , Dawn Song , Martin Vechev

Grammar-Aligned Decoding

Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what…

Artificial Intelligence · Computer Science 2025-12-15 Kanghee Park , Jiayu Wang , Taylor Berg-Kirkpatrick , Nadia Polikarpova , Loris D'Antoni

ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

Prompt Tuning (PT) enables the adaptation of Pre-trained Large Language Models (PLMs) to downstream tasks by optimizing a small amount of soft virtual tokens, which are prepended to the input token embeddings. Recently, Decomposed Prompt…

Computation and Language · Computer Science 2025-12-23 Pengwei Tang , Xiaolin Hu , Yong Liu

Approximately Aligned Decoding

It is common to reject undesired outputs of Large Language Models (LLMs); however, current methods to do so require an excessive amount of computation to re-sample after a rejection, or distort the distribution of outputs by constraining…

Computation and Language · Computer Science 2025-10-09 Daniel Melcer , Sujan Gonugondla , Pramuditha Perera , Haifeng Qian , Wen-Hao Chiang , Yanjun Wang , Nihal Jain , Pranav Garg , Xiaofei Ma , Anoop Deoras

IntentCoding: Amplifying User Intent in Code Generation

Large Language Models (LLMs) have shown strong capabilities in code generation, but their adherence to fine-grained user intent with multiple constraints remains a significant challenge. Our empirical analysis reveals two key observations:…

Software Engineering · Computer Science 2026-02-03 Zheng Fang , Yihong Dong , Lili Mou , Dongming Jin , Zhi Jin , Ge Li

AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability

Speculative decoding is a powerful technique that attempts to circumvent the autoregressive constraint of modern Large Language Models (LLMs). The aim of speculative decoding techniques is to improve the average inference time of a large,…

Computation and Language · Computer Science 2024-10-25 Sudhanshu Agrawal , Wonseok Jeon , Mingu Lee

Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs

Large Language Models (LLMs) have recently made significant advances in code generation through the 'Chain-of-Thought' prompting technique. This technique empowers the model to autonomously devise "solution plans" to tackle intricate…

Software Engineering · Computer Science 2024-03-21 Zhihong Sun , Chen Lyu , Bolun Li , Yao Wan , Hongyu Zhang , Ge Li , Zhi Jin

Draft-Conditioned Constrained Decoding for Structured Generation in LLMs

Large language models (LLMs) are increasingly used to generate executable outputs, JSON objects, and API calls, where a single syntax error can make the output unusable. Constrained decoding enforces validity token-by-token via masking and…

Computation and Language · Computer Science 2026-03-05 Avinash Reddy , Thayne T. Walker , James S. Ide , Amrit Singh Bedi

AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference

Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their increasing parameter sizes significantly slow down inference. Speculative decoding mitigates this issue by leveraging a smaller draft…

Computation and Language · Computer Science 2026-05-27 Kuan-Wei Lu , Ding-Yong Hong , Pangfeng Liu , Jan-Jan Wu

APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation

Large language models (LLMs) often suffer from hallucinations due to error accumulation in autoregressive decoding, where suboptimal early token choices misguide subsequent generation. Although multi-path decoding can improve robustness by…

Computation and Language · Computer Science 2026-05-21 Tianyu Zheng , Hong Wu , Jiaji Zhong

ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning

Conventional continual pretraining (CPT) for large language model (LLM) domain adaptation often suffers from catastrophic forgetting and limited domain capacity. Existing strategies adopt layer expansion, introducing additional trainable…

Machine Learning · Computer Science 2025-10-14 Jinyang Zhang , Yue Fang , Hongxin Ding , Weibin Liao , Muyang Ye , Xu Chu , Junfeng Zhao , Yasha Wang

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is…

Computation and Language · Computer Science 2025-08-19 Benjamin Lipkin , Benjamin LeBrun , Jacob Hoover Vigly , João Loula , David R. MacIver , Li Du , Jason Eisner , Ryan Cotterell , Vikash Mansinghka , Timothy J. O'Donnell , Alexander K. Lew , Tim Vieira

ADaPT: As-Needed Decomposition and Planning with Language Models

Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next…

Artificial Intelligence · Computer Science 2024-04-10 Archiki Prasad , Alexander Koller , Mareike Hartmann , Peter Clark , Ashish Sabharwal , Mohit Bansal , Tushar Khot

Adacc: An Adaptive Framework Unifying Compression and Activation Recomputation for LLM Training

Training large language models (LLMs) is often constrained by GPU memory limitations. To alleviate memory pressure, activation recomputation and data compression have been proposed as two major strategies. However, both approaches have…

Machine Learning · Computer Science 2025-08-11 Ping Chen , Zhuohong Deng , Ping Li , Shuibing He , Hongzi Zhu , Yi Zheng , Zhefeng Wang , Baoxing Huai , Minyi Guo

Constraint-Aware Flow Matching: Decision Aligned End-to-End Training for Constrained Sampling

Deep generative models provide state-of-the-art performance across a wide array of applications, with recent studies showing increasing applicability for science and engineering. Despite a growing corpus of literature focused on the…

Machine Learning · Computer Science 2026-05-14 Jacob K. Christopher , James E. Warner , Ferdinando Fioretto

ChopChop: a Programmable Framework for Semantically Constraining the Output of Language Models

Language models (LMs) can generate code but cannot guarantee its correctness$\unicode{x2014}$often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding offers a solution by…

Programming Languages · Computer Science 2025-12-03 Shaan Nagy , Timothy Zhou , Nadia Polikarpova , Loris D'Antoni

DocPrompting: Generating Code by Retrieving the Docs

Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus,…

Computation and Language · Computer Science 2023-02-21 Shuyan Zhou , Uri Alon , Frank F. Xu , Zhiruo Wang , Zhengbao Jiang , Graham Neubig

AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism

Large language models (LLMs) are increasingly used for long-content generation (e.g., long Chain-of-Thought reasoning) where decoding efficiency becomes a critical bottleneck: Autoregressive decoding is inherently limited by its sequential…

Computation and Language · Computer Science 2025-06-05 Zhepei Wei , Wei-Lin Chen , Xinyu Zhu , Yu Meng

Adaptive Draft-Verification for Efficient Large Language Model Decoding

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires…

Computation and Language · Computer Science 2024-08-20 Xukun Liu , Bowen Lei , Ruqi Zhang , Dongkuan Xu

TreeDiff: AST-Guided Code Generation with Diffusion LLMs

Code generation is increasingly critical for real-world applications. Still, diffusion-based large language models continue to struggle with this demand. Unlike free-form text, code requires syntactic precision; even minor structural…

Computation and Language · Computer Science 2026-01-07 Yiming Zeng , Jinghan Cao , Zexin Li , Yiming Chen , Tao Ren , Zhuochun Li , Dawei Xiang , Xidong Wu , Shangqian Gao , Tingting Yu