CodeSCM: Causal Analysis for Multi-Modal Code Generation

Computation and Language 2025-02-10 v1

Abstract

In this paper, we propose CodeSCM, a Structural Causal Model (SCM) for analyzing multi-modal code generation using large language models (LLMs). By applying interventions to CodeSCM, we measure the causal effects of different prompt modalities, such as natural language, code, and input-output examples, on the model. CodeSCM introduces latent mediator variables to separate the code and natural language semantics of a multi-modal code generation prompt. Using the principles of Causal Mediation Analysis on these mediators, we quantify direct effects representing the model's spurious leanings. We find that, in addition to natural language instructions, input-output examples significantly influence code generation.
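
To make the mediation terminology concrete, the following is a minimal sketch of how total, direct, and indirect effects are separated in a generic three-variable SCM (X -> M -> Y plus X -> Y). It is not the CodeSCM graph from the paper: the functions `mediator` and `outcome`, the linear form, and the coefficients are all hypothetical and chosen only for illustration. Here X stands in for a prompt modality being intervened on, M for a latent semantic mediator, and Y for some score of the generated code.

```python
# Toy structural causal model: X -> M -> Y and X -> Y.
# Assumption: every function and coefficient below is hypothetical;
# none of them is taken from the CodeSCM paper.

def mediator(x: float) -> float:
    """Latent mediator M as a deterministic function of the modality X."""
    return 0.8 * x


def outcome(x: float, m: float) -> float:
    """Outcome Y, influenced by X directly and through the mediator M."""
    return 0.1 * x + 0.5 * m


def total_effect(x1: float, x0: float) -> float:
    """Change in Y under do(X=x1) versus do(X=x0), with M responding to X."""
    return outcome(x1, mediator(x1)) - outcome(x0, mediator(x0))


def natural_direct_effect(x1: float, x0: float) -> float:
    """Change in Y when X moves from x0 to x1 while M is held at M(x0)."""
    m0 = mediator(x0)
    return outcome(x1, m0) - outcome(x0, m0)


def natural_indirect_effect(x1: float, x0: float) -> float:
    """Change in Y when X stays at x0 but M moves from M(x0) to M(x1)."""
    return outcome(x0, mediator(x1)) - outcome(x0, mediator(x0))


if __name__ == "__main__":
    te = total_effect(1.0, 0.0)
    nde = natural_direct_effect(1.0, 0.0)
    nie = natural_indirect_effect(1.0, 0.0)
    print(f"total={te:.3f} direct={nde:.3f} indirect={nie:.3f}")
```

In this linear toy model the total effect decomposes exactly into the direct and indirect parts. The direct term is the portion of the effect that bypasses the mediator, which is the kind of quantity the abstract associates with spurious leanings.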

Cite

@article{arxiv.2502.05150,
  title   = {CodeSCM: Causal Analysis for Multi-Modal Code Generation},
  author  = {Mukur Gupta and Noopur Bhatt and Suman Jana},
  journal = {arXiv preprint arXiv:2502.05150},
  year    = {2025}
}

Comments

Accepted to NAACL 2025