Code to Comment Translation: A Comparative Study on Model Effectiveness & Errors

Software Engineering | 2021-06-17 | v1 | Artificial Intelligence | Computation and Language | Machine Learning

Abstract

Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions. Most evaluations of such models are conducted using automatic reference-based metrics. However, given the relatively large semantic gap between programming languages and natural language, we argue that this line of research would benefit from a qualitative investigation into the various error modes of current state-of-the-art models. Therefore, in this work, we perform both a quantitative and a qualitative comparison of three recently proposed source code summarization models. In our quantitative evaluation, we compare the models using the smoothed BLEU-4, METEOR, and ROUGE-L machine translation metrics. In our qualitative evaluation, we manually open-code the most common errors the models make relative to ground-truth captions. Our investigation reveals new insights into the relationship between metric-based performance and model prediction errors, grounded in an empirically derived error taxonomy that can be used to drive future research efforts.
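
To make the idea of a reference-based metric concrete, the following is a minimal sketch of a sentence-level ROUGE-L score, which rewards the longest common subsequence (LCS) of tokens shared by a predicted comment and its ground-truth caption. It is an illustration only, not the evaluation script used in the study: the whitespace tokenization, the lowercasing, the default `beta=1.0`, and the function names `lcs_length` and `rouge_l` are all assumptions made here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based F-measure between a predicted and a reference comment.

    Assumptions (not taken from the paper): tokens are lowercased and
    split on whitespace, and beta=1.0 weights precision and recall equally.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    predicted = "returns the sum of two numbers"
    ground_truth = "returns the sum of two integers"
    print(round(rouge_l(predicted, ground_truth), 3))
```

A score like this reflects only token overlap with the reference. A prediction can score well while describing the wrong behavior, which is the kind of gap a manual error analysis is meant to expose.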

Cite

@article{arxiv.2106.08415,
  title   = {Code to Comment Translation: A Comparative Study on Model Effectiveness \& Errors},
  author  = {Junayed Mahmud and Fahim Faisal and Raihan Islam Arnob and Antonios Anastasopoulos and Kevin Moran},
  journal = {arXiv preprint arXiv:2106.08415},
  year   = {2021}
}

Comments

Accepted to the 2021 NLP4Prog Workshop, co-located with the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021).