Programming Language Confusion: When Code LLMs Can't Keep their Languages Straight

Software Engineering 2026-02-03 v2

Abstract

Large Language Models (LLMs) have achieved state-of-the-art performance across software engineering tasks, from code generation to translation. However, we identify and systematically evaluate a critical failure mode: Programming Language Confusion (PLC) -- the generation of code in unintended languages despite explicit instructions. Through evaluation of 10 popular LLMs across six multilingual datasets (LiveCodeBench, BabelCode variants, HumanEval-XL, and McEval), we demonstrate that PLC is pervasive, with some specialized models exhibiting the highest confusion rates. Our analysis reveals that PLC is not random noise but reflects systematic patterns: models consistently generate syntactically valid code even when it deviates from language specifications. This behavior produces distinct language migration patterns, most notably a strong default to Python and systematic shifts between syntactically similar language pairs (e.g., C#/Java). These migrations reflect statistical preferences learned from training data rather than goal-directed reasoning. We demonstrate that explicit language keywords provide the most effective mitigation, while natural language instructions have limited influence on model behavior. Furthermore, model quantization -- though essential for practical deployment -- significantly amplifies PLC and degrades syntactic stability in complex tasks. Our findings underscore that language fidelity should be treated as a core evaluation dimension for code LLMs. We advocate for standardized benchmarks and prompt formats with explicit language constraints to enable more reliable assessment and foster the development of robust, multilingual code generation systems.

Cite

@article{arxiv.2503.13620,
  title  = {Programming Language Confusion: When Code LLMs Can't Keep their Languages Straight},
  author = {Micheline Bénédicte Moumoula and Serge Lionel Nikiema and Abdoul Kader Kabore and Jacques Klein and Tegawendé F. Bissyande},
  journal= {arXiv preprint arXiv:2503.13620},
  year   = {2026}
}

Comments

Accepted for publication at SANER 2026