Related papers: COCO: Testing Code Generation Systems via Concreti…

ReCode: Robustness Evaluation of Code Generation Models

Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in…

Machine Learning · Computer Science 2022-12-21 Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang

CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing

Large Language Models have revolutionized code generation ability by converting natural language descriptions into executable code. However, generating complex code within real-world scenarios remains challenging due to intricate…

Software Engineering · Computer Science 2024-10-15 Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang

CodeFort: Robust Training for Code Generation Models

Code generation models are not robust to small perturbations, which often lead to incorrect generations and significantly degrade the performance of these models. Although improving the robustness of code generation models is crucial to…

Software Engineering · Computer Science 2024-10-30 Yuhao Zhang, Shiqi Wang, Haifeng Qian, Zijian Wang, Mingyue Shang, Linbo Liu, Sanjay Krishna Gouda, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions

Large Language Models (LLMs) have showcased remarkable capabilities in following human instructions. However, recent studies have raised concerns about the robustness of LLMs when prompted with instructions combining textual adversarial…

Computation and Language · Computer Science 2024-02-27 Yuansen Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Qi Zhang, Xuanjing Huang

CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

Recent advancements in Unified Multimodal Models (UMMs) have significantly advanced text-to-image (T2I) generation, particularly through the integration of Chain-of-Thought (CoT) reasoning. However, existing CoT-based T2I methods largely…

Artificial Intelligence · Computer Science 2026-03-10 Haodong Li, Chunmei Qing, Huanyu Zhang, Dongzhi Jiang, Yihang Zou, Hongbo Peng, Dingming Li, Yuhong Dai, ZePeng Lin, Juanxi Tian, Yi Zhou, Siqi Dai, Jingwei Wu

On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot

Software engineering research has always been concerned with the improvement of code completion approaches, which suggest the next tokens a developer will likely type while coding. The release of GitHub Copilot constitutes a big step…

Software Engineering · Computer Science 2023-02-02 Antonio Mastropaolo, Luca Pascarella, Emanuela Guglielmi, Matteo Ciniselli, Simone Scalabrino, Rocco Oliveto, Gabriele Bavota

On the Reliability and Explainability of Language Models for Program Generation

Recent studies have adopted pre-trained language models, such as CodeT5 and CodeGPT, for automated program generation tasks like code generation, repair, and translation. Numerous language model-based approaches have been proposed and…

Software Engineering · Computer Science 2024-01-09 Yue Liu, Chakkrit Tantithamthavorn, Yonghui Liu, Li Li

Evaluating perturbation robustness of generative systems that use COBOL code inputs

Systems incorporating large language models (LLMs) as a component are known to be sensitive (i.e., non-robust) to minor input variations that do not change the meaning of the input; such sensitivity may reduce the system's usefulness. Here,…

Software Engineering · Computer Science 2026-01-19 Samuel Ackerman, Wesam Ibraheem, Orna Raz, Marcel Zalmanovici

Structured Unit Testable Templated Code for Efficient Code Review Process

Modern software development teams are distributed across onsite and off-shore locations. Each team has developers with varying experience levels and English communication skills. In such a diverse development environment it is important to…

Software Engineering · Computer Science 2016-10-19 Amol Patwardhan

Assessing, Exploiting, and Mitigating Syntactic Robustness Failures in LLM-Based Code Generation

Rapid advances in the field of Large Language Models (LLMs) have made LLM-based code generation an important area for investigation. An LLM-based code generator takes a prompt as input and produces code that implements the requirements…

Software Engineering · Computer Science 2026-05-11 Laboni Sarker, Mara Downing, Achintya Desai, Tevfik Bultan

Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding

As the code completion task moves from function-level to repository-level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as…

Software Engineering · Computer Science 2025-12-05 Xinkui Zhao, Rongkai Liu, Yifan Zhang, Chen Zhi, Lufei Zhang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin

Prompting Techniques for Secure Code Generation: A Systematic Investigation

Large Language Models (LLMs) are gaining momentum in software development with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce…

Software Engineering · Computer Science 2025-02-27 Catherine Tony, Nicolás E. Díaz Ferreyra, Markus Mutas, Salem Dhiff, Riccardo Scandariato

Towards Better Correctness and Efficiency in Code Generation

While code large language models have demonstrated remarkable progress in code generation, the generated code often exhibits poor runtime efficiency, limiting its practical application in performance-sensitive scenarios. To address this…

Software Engineering · Computer Science 2025-08-29 Yunlong Feng, Yang Xu, Xiao Xu, Binyuan Hui, Junyang Lin

On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex

Semantic parsing is a technique aimed at constructing a structured representation of the meaning of a natural-language question. Recent advancements in few-shot language models trained on code have demonstrated superior performance in…

Computation and Language · Computer Science 2023-03-10 Terry Yue Zhuo, Zhuang Li, Yujin Huang, Fatemeh Shiri, Weiqing Wang, Gholamreza Haffari, Yuan-Fang Li

Assessing the Security of GitHub Copilot Generated Code -- A Targeted Replication Study

AI-powered code generation models have been developing rapidly, allowing developers to expedite code generation and thus improve their productivity. These models are trained on large corpora of code (primarily sourced from public…

Software Engineering · Computer Science 2023-11-21 Vahid Majdinasab, Michael Joshua Bishop, Shawn Rasheed, Arghavan Moradidakhel, Amjed Tahir, Foutse Khomh

CodeDSI: Differentiable Code Search

Reimplementing solutions to previously solved software engineering problems is not only inefficient but also introduces inadequate and error-prone code. Many existing methods achieve impressive performance on this issue by using…

Software Engineering · Computer Science 2022-10-04 Usama Nadeem, Noah Ziems, Shaoen Wu

A Preliminary Study on the Robustness of Code Generation by Large Language Models

Robustness is a critical factor for reliable code generation by large language models, yet most evaluations focus on correctness and overlook key issues such as missing input validation and inadequate error handling. In this work, we…

Software Engineering · Computer Science 2025-09-24 Zike Li, Mingwei Liu, Anji Li, Kaifeng He, Yanlin Wang, Xin Peng, Zibin Zheng

A Multi-Language Perspective on the Robustness of LLM Code Generation

Large language models have gained significant traction and popularity in recent times, extending their usage to code-generation tasks. While this field has garnered considerable attention, the exploration of testing and evaluating the…

Software Engineering · Computer Science 2026-05-05 Fazle Rabbi, Zishuo Ding, Jinqiu Yang

Understanding Chain-of-Thought Effectiveness in Code Generation: An Empirical and Information-Theoretic Analysis

Large language models (LLMs) achieve strong performance on code generation, but the mechanisms by which Chain-of-Thought (CoT) prompting helps remain unclear. We present a systematic empirical and information-theoretic study of CoT…

Software Engineering · Computer Science 2025-12-11 Naizhu Jin, Zhong Li, Guang Yang, Tian Zhang, Qingkai Zeng

Understanding Defects in Generated Codes by Language Models

This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation,…

Software Engineering · Computer Science 2024-08-27 Ali Mohammadi Esfahani, Nafiseh Kahani, Samuel A. Ajila