Related papers: CodePAD: Sequence-based Code Generation with Pushd…

Execution-based Code Generation using Deep Reinforcement Learning

The utilization of programming language (PL) models, pre-trained on large-scale code corpora, as a means of automating software engineering processes has demonstrated considerable potential in streamlining various code generation tasks such…

Machine Learning · Computer Science 2023-07-21 Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, Chandan K. Reddy

CodeDPO: Aligning Code Models with Self Generated and Verified Source Code

Code generation models have shown significant potential for programming tasks. However, existing training methods like supervised fine-tuning face key limitations: they do not effectively teach models to prioritize correct over incorrect…

Software Engineering · Computer Science 2025-06-04 Kechi Zhang, Ge Li, Yihong Dong, Jingjing Xu, Jun Zhang, Jing Su, Yongfei Liu, Zhi Jin

Uncovering Weaknesses in Neural Code Generation

Code generation, the task of producing source code from prompts, has seen significant advancements with the advent of pre-trained large language models (PLMs). Despite these achievements, there lacks a comprehensive taxonomy of weaknesses…

Software Engineering · Computer Science 2024-07-18 Xiaoli Lian, Shuaisong Wang, Jieping Ma, Fang Liu, Xin Tan, Li Zhang, Lin Shi, Cuiyun Gao

Generating refactored code accurately using reinforcement learning

Automated source code refactoring, particularly extract method refactoring, is a crucial and frequently employed technique during software development. Despite its importance and frequent use by practitioners, current automated techniques…

Software Engineering · Computer Science 2024-12-25 Indranil Palit, Tushar Sharma

Automatic Code Generation using Pre-Trained Language Models

Recent advancements in natural language processing (GPT-2, BERT) have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly…

Computation and Language · Computer Science 2021-02-23 Luis Perez, Lizi Ottens, Sudharshan Viswanathan

Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points

Code generation models have shown significant potential for automating programming tasks. However, the challenge of generating accurate and reliable code persists due to the highly complex and long-reasoning nature of the task. Even…

Software Engineering · Computer Science 2025-06-04 Kechi Zhang, Ge Li, Jia Li, Yihong Dong, Jia Li, Zhi Jin

Correctness-Guaranteed Code Generation via Constrained Decoding

Language Models (LMs) are increasingly being used for code generation, but ensuring the correctness of generated programs remains a significant challenge. Although imperfect code may be acceptable during software development with human…

Programming Languages · Computer Science 2025-08-25 Lingxiao Li, Salar Rahili, Yiwei Zhao

Code Generation from Natural Language with Less Prior and More Monolingual Data

Training datasets for semantic parsing are typically small due to the higher expertise required for annotation than most other NLP tasks. As a result, models for this application usually need additional prior knowledge to be built into the…

Computation and Language · Computer Science 2021-06-11 Sajad Norouzi, Keyi Tang, Yanshuai Cao

Code Difference Guided Adversarial Example Generation for Deep Code Models

Adversarial examples are important to test and enhance the robustness of deep code models. As source code is discrete and has to strictly stick to complex grammar and semantics constraints, the adversarial example generation techniques in…

Cryptography and Security · Computer Science 2023-08-22 Zhao Tian, Junjie Chen, Zhi Jin

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical…

Machine Learning · Computer Science 2022-11-04 Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C. H. Hoi

CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information

Automatic code generation is to generate the program code according to the given natural language description. The current mainstream approach uses neural networks to encode natural language descriptions, and output abstract syntax trees…

Software Engineering · Computer Science 2022-02-16 Maosheng Zhong, Gen Liu, Hongwei Li, Jiangling Kuang, Jinshan Zeng, Mingwen Wang

Set Block Decoding is a Language Model Inference Accelerator

Autoregressive next token prediction language models offer powerful capabilities but face significant challenges in practical deployment due to the high computational and memory costs of inference, particularly during the decoding stage. We…

Machine Learning · Computer Science 2025-09-05 Itai Gat, Heli Ben-Hamu, Marton Havasi, Daniel Haziza, Jeremy Reizenstein, Gabriel Synnaeve, David Lopez-Paz, Brian Karrer, Yaron Lipman

CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP

We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP. Our approach is based on prompting off-the-shelf…

Computation and Language · Computer Science 2024-04-02 Chandra Kiran Reddy Evuru, Sreyan Ghosh, Sonal Kumar, Ramaneswaran S, Utkarsh Tyagi, Dinesh Manocha

Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets

The increasing adoption of large language models (LLMs) for code-related tasks has raised concerns about the security of their training datasets. One critical threat is dead code poisoning, where syntactically valid but functionally…

Computation and Language · Computer Science 2025-03-03 Chi-Chien Tsai, Chia-Mu Yu, Ying-Dar Lin, Yu-Sung Wu, Wei-Bin Lee

Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation

Visual programming languages (VPLs) allow users to create programs through graphical interfaces, which results in easier accessibility and their widespread usage in various domains. To further enhance this accessibility, recent research has…

Computation and Language · Computer Science 2025-05-26 Deokhyung Kang, Jeonghun Cho, Yejin Jeon, Sunbin Jang, Minsub Lee, Jawoon Cho, Gary Geunbae Lee

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is…

Computation and Language · Computer Science 2025-08-19 Benjamin Lipkin, Benjamin LeBrun, Jacob Hoover Vigly, João Loula, David R. MacIver, Li Du, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Timothy J. O'Donnell, Alexander K. Lew, Tim Vieira

CodeGRU: Context-aware Deep Learning with Gated Recurrent Unit for Source Code Modeling

Recently deep learning based Natural Language Processing (NLP) models have shown great potential in the modeling of source code. However, a major limitation of these approaches is that they take source code as simple tokens of text and…

Neural and Evolutionary Computing · Computer Science 2020-07-15 Yasir Hussain, Zhiqiu Huang, Yu Zhou, Senzhang Wang

Understanding Defects in Generated Codes by Language Models

This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation,…

Software Engineering · Computer Science 2024-08-27 Ali Mohammadi Esfahani, Nafiseh Kahani, Samuel A. Ajila

Keeping Notes: Conditional Natural Language Generation with a Scratchpad Mechanism

We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation…

Computation and Language · Computer Science 2019-06-14 Ryan Y. Benmalek, Madian Khabsa, Suma Desu, Claire Cardie, Michele Banko

A Novel Approach for Rapid Development Based on ChatGPT and Prompt Engineering

Code generation stands as a powerful technique in modern software development, improving development efficiency, reducing errors, and fostering standardization and consistency. Recently, ChatGPT has exhibited immense potential in automatic…

Software Engineering · Computer Science 2023-12-22 Youjia Li, Jianjun Shi, Zheng Zhang