Code Roulette: How Prompt Variability Affects LLM Code Generation

Software Engineering 2026-03-19 v2 Machine Learning

Abstract

Code generation is one of the most active application areas for Large Language Models (LLMs). While LLMs lower the barriers to writing code and accelerate the development process, the overall quality of generated programs depends on the quality of the prompts they are given. Specifically, the functionality and quality of generated code can be sensitive to the user's background and familiarity with software development. It is therefore important to quantify an LLM's sensitivity to variations in the input. To this end, we propose an evaluation pipeline for LLM code generation that focuses on measuring sensitivity to prompt augmentations. The pipeline is completely agnostic to specific programming tasks and LLMs, and is thus widely applicable. We provide extensive experimental evidence illustrating the utility of our method and share our code for the benefit of the community.
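
As a rough illustration of what measuring sensitivity to prompt augmentations could look like, the sketch below perturbs a prompt at several noise levels, generates code for each variant, and compares it with the code generated from the unperturbed prompt. It is not the authors' implementation: the function names, the letter-swap augmentation, the character-level similarity score, and the `toy_generate` stand-in for an LLM call are all assumptions made for this example.

```python
import random
from difflib import SequenceMatcher
from statistics import mean
from typing import Callable, Dict, Sequence


def augment_prompt(prompt: str, rate: float, rng: random.Random) -> str:
    """Hypothetical augmentation: swap adjacent letters with probability `rate`."""
    chars = list(prompt)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each letter moves at most once
        else:
            i += 1
    return "".join(chars)


def code_similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1] (assumed metric, character-level)."""
    return SequenceMatcher(None, a, b).ratio()


def sensitivity(
    prompt: str,
    generate: Callable[[str], str],
    rates: Sequence[float],
    samples: int = 5,
    seed: int = 0,
) -> Dict[float, float]:
    """Mean similarity between baseline code and code from augmented prompts."""
    rng = random.Random(seed)
    baseline = generate(prompt)
    result = {}
    for rate in rates:
        scores = [
            code_similarity(baseline, generate(augment_prompt(prompt, rate, rng)))
            for _ in range(samples)
        ]
        result[rate] = mean(scores)
    return result


def toy_generate(prompt: str) -> str:
    """Placeholder for an LLM call: deterministic and model-free, demo only."""
    return f"def solve():\n    # {prompt}\n    pass\n"


if __name__ == "__main__":
    for rate, score in sensitivity("sort a list", toy_generate, [0.0, 0.5, 1.0]).items():
        print(f"augmentation rate {rate:.1f}: mean similarity {score:.3f}")
```

Because `generate` is passed in as a plain callable and the task is just a prompt string, the same loop can be pointed at any model or task, which mirrors the task- and model-agnostic framing of the abstract. In practice the similarity function would be swapped for a measure better suited to code than character matching.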

Cite

@article{arxiv.2506.10204,
  title  = {Code Roulette: How Prompt Variability Affects LLM Code Generation},
  author = {Andrei Paleyes and Radzim Sendyka and Diana Robinson and Christian Cabrera and Neil D. Lawrence},
  journal= {arXiv preprint arXiv:2506.10204},
  year   = {2026}
}

Comments

Extended version of the paper accepted to LLM4Code @ ICSE 2026