Related papers: SynCode: LLM Generation with Grammar Augmentation

Synthetic Data Generation Using Large Language Models: Advances in Text and Code

This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, these models can significantly augment…

Computation and Language · Computer Science 2025-11-21 Mihai Nadas, Laura Diosan, Andreea Tomescu

SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation

Large Language Models (LLMs) often struggle with complex mathematical reasoning, where prose-based generation leads to unverified and arithmetically unsound solutions. Current prompting strategies like Chain of Thought still operate within…

Computation and Language · Computer Science 2026-01-27 Sina Bagheri Nezhad, Yao Li, Ameeta Agrawal

SynthCoder: A Synthetical Strategy to Tune LLMs for Code Completion

Code completion is a prominent application of Large Language Models (LLMs) in software engineering. Due to the near real-time response requirements of this task, base models with small to medium-sized parameters are typically employed,…

Software Engineering · Computer Science 2025-09-18 Dongjun Yu, Xiao Yan, Zhenrui Li, Jipeng Xiao, Haochuan He, Yongda Yu, Hao Zhang, Guoping Rong, Xiaobo Huang

Flexible and Efficient Grammar-Constrained Decoding

Large Language Models (LLMs) are often asked to generate structured outputs that obey precise syntactic rules, such as code snippets or formatted data. Grammar-constrained decoding (GCD) can guarantee that LLM outputs match such rules by…

Computation and Language · Computer Science 2025-07-17 Kanghee Park, Timothy Zhou, Loris D'Antoni

CodecLM: Aligning Language Models with Tailored Synthetic Data

Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor…

Computation and Language · Computer Science 2024-04-10 Zifeng Wang, Chun-Liang Li, Vincent Perot, Long T. Le, Jin Miao, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister

Assessing, Exploiting, and Mitigating Syntactic Robustness Failures in LLM-Based Code Generation

Rapid advances in the field of Large Language Models (LLMs) have made LLM-based code generation an important area for investigation. An LLM-based code generator takes a prompt as input and produces code that implements the requirements…

Software Engineering · Computer Science 2026-05-11 Laboni Sarker, Mara Downing, Achintya Desai, Tevfik Bultan

LogiCode: an LLM-Driven Framework for Logical Anomaly Detection

This paper presents LogiCode, a novel framework that leverages Large Language Models (LLMs) for identifying logical anomalies in industrial settings, moving beyond the traditional focus on structural inconsistencies. By harnessing LLMs for…

Machine Learning · Computer Science 2024-06-10 Yiheng Zhang, Yunkang Cao, Xiaohao Xu, Weiming Shen

Grounding Data Science Code Generation with Input-Output Specifications

Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language (NL) prompts. However, in the real world, NL is often too ambiguous to capture the true intent behind programming problems,…

Machine Learning · Computer Science 2024-03-18 Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri, Alex Polozov

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and…

Machine Learning · Computer Science 2026-04-27 Henrijs Princis, Arindam Sharma, Cristina David

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science 2026-05-22 Wei Ma, Zhihao Lin, Shangqing Liu, Qiang Hu, Ye Liu, Wenhan Wang, Cen Zhang, Liming Nie, Li Li, Yang Liu, Lingxiao Jiang

ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development efforts and enhancing software productivity. The emergence of large language models (LLMs) has…

Software Engineering · Computer Science 2026-01-15 Sicong Liu, Yanxian Huang, Mingwei Liu, Jiachi Chen, Ensheng Shi, Yuchi Ma, Hongyu Zhang, Yin Zhang, Yanlin Wang

CRANE: Reasoning with constrained LLM generation

Code generation, symbolic math reasoning, and other tasks require LLMs to produce outputs that are both syntactically and semantically correct. Constrained LLM generation is a promising direction to enforce adherence to formal grammar, but…

Programming Languages · Computer Science 2025-09-08 Debangshu Banerjee, Tarun Suresh, Shubham Ugare, Sasa Misailovic, Gagandeep Singh

DocCGen: Document-based Controlled Code Generation

Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical…

Software Engineering · Computer Science 2024-07-04 Sameer Pimparkhede, Mehant Kammakomati, Srikanth Tamilselvam, Prince Kumar, Ashok Pon Kumar, Pushpak Bhattacharyya

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code…

Software Engineering · Computer Science 2024-02-06 Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, Tao Gui

Constrained Decoding of Diffusion LLMs with Context-Free Grammars

Large language models (LLMs) have shown promising performance across diverse domains. Many practical applications of LLMs, such as code completion and structured data extraction, require adherence to syntactic constraints specified by a…

Machine Learning · Computer Science 2025-08-18 Niels Mündler, Jasper Dekoninck, Martin Vechev

Case2Code: Scalable Synthetic Data for Code Generation

Large Language Models (LLMs) have shown outstanding breakthroughs in code generation. Recent work improves code LLMs by training on synthetic data generated by some powerful LLMs, which can be challenging to scale due to the dependence on a…

Computation and Language · Computer Science 2025-02-11 Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

Can Code Language Models Learn Clarification-Seeking Behaviors?

Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, a gap remains between their output and the problem-solving strategies of human developers. Unlike humans, who spend substantial time…

Software Engineering · Computer Science 2025-09-29 Jie JW Wu, Manav Chaudhary, Davit Abrahamyan, Arhaan Khaku, Anjiang Wei, Fatemeh H. Fard

Type-Constrained Code Generation with Language Models

Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although…

Machine Learning · Computer Science 2025-05-09 Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen, Dawn Song, Martin Vechev

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages

Node-based programming languages are increasingly popular in media arts coding domains. These languages are designed to be accessible to users with limited coding experience, allowing them to achieve creative output without an extensive…

Software Engineering · Computer Science 2024-09-04 William Zhang, Maria Leon, Ryan Xu, Adrian Cardenas, Amelia Wissink, Hanna Martin, Maya Srikanth, Kaya Dorogi, Christian Valadez, Pedro Perez, Citlalli Grijalva, Corey Zhang, Mark Santolucito