How Does Naming Affect LLMs on Code Analysis Tasks?

Zhilong Wang; Lan Zhang; Chen Cao; Nanqing Luo; Xinzhi Luo; Peng Liu

doi:10.4236/jsea.2024.1711044


Cryptography and Security; Artificial Intelligence; 2024-11-14 (v5)

Authors: Zhilong Wang, Lan Zhang, Chen Cao, Nanqing Luo, Xinzhi Luo, Peng Liu


Abstract

Large Language Models (LLMs), such as GPT and BERT, were proposed for natural language processing (NLP) and have shown promising results as general-purpose language models. A growing number of industry professionals and researchers are adopting LLMs for program analysis tasks. However, programming languages differ from natural languages in one significant way: a programmer is free to assign any name to the variables, methods, and functions in a program, whereas a natural language writer has no such freedom. Intuitively, the quality of naming in a program affects the performance of LLMs on program analysis tasks. This paper investigates how naming affects LLMs on code analysis tasks. Specifically, we create a set of datasets in which the code contains nonsense or misleading names for variables, methods, and functions, respectively. We then use well-trained models (CodeBERT) to perform code analysis tasks on these datasets. The experimental results show that naming has a significant impact on the performance of LLM-based code analysis, indicating that LLM-based code representation learning relies heavily on well-defined names in code. Additionally, we conduct a case study on some special code analysis tasks using GPT, which provides further insights.
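To make the idea of "code containing nonsense names" concrete, the following is a minimal illustrative sketch, not the authors' dataset pipeline. It assumes Python source and Python 3.9+ (for `ast.unparse`); the helper names `collect_defined_names`, `NameAnonymizer`, and `anonymize_names`, and the `id_N` placeholder scheme, are hypothetical. It renames only identifiers defined in the snippet itself (functions, parameters, assigned variables) and leaves builtins and attribute names untouched.

```python
import ast


def collect_defined_names(tree):
    """Collect identifiers introduced by the snippet itself."""
    defined = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            defined.add(node.name)            # function names
        elif isinstance(node, ast.arg):
            defined.add(node.arg)             # parameter names
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)              # assigned variables
    return defined


class NameAnonymizer(ast.NodeTransformer):
    """Replace user-defined identifiers with meaningless placeholders."""

    def __init__(self, defined):
        # Sorting makes the mapping deterministic (assumed naming scheme).
        self.mapping = {name: f"id_{i}" for i, name in enumerate(sorted(defined))}

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)              # also rename parameters and body
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def anonymize_names(source):
    """Return source with user-defined names replaced by id_0, id_1, ..."""
    tree = ast.parse(source)
    renamer = NameAnonymizer(collect_defined_names(tree))
    return ast.unparse(renamer.visit(tree))


if __name__ == "__main__":
    original = (
        "def add_numbers(first, second):\n"
        "    total = first + second\n"
        "    return total\n"
    )
    print(anonymize_names(original))
```

The transformed snippet behaves the same as the original but carries no semantic hints in its identifiers, which is the kind of input the abstract describes feeding to a model. This sketch does not handle keyword arguments at call sites, classes, or imports, and a "misleading names" variant would swap in plausible but wrong names rather than placeholders.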

Keywords

large language model; code generation; language modeling

Cite

@article{arxiv.2307.12488,
  title  = {How Does Naming Affect LLMs on Code Analysis Tasks?},
  author = {Zhilong Wang and Lan Zhang and Chen Cao and Nanqing Luo and Xinzhi Luo and Peng Liu},
  journal= {arXiv preprint arXiv:2307.12488},
  year   = {2024}
}

Comments

3 tables, 8 figures
