AixBench: A Code Generation Benchmark Dataset

Software Engineering 2022-07-22 v2

Abstract

We present a benchmark dataset for evaluating the method-level code generation task. The benchmark contains a dataset of 175 samples for automated evaluation and a dataset of 161 samples for manual evaluation. We also present a new metric for automatically evaluating the correctness of the generated code, and a set of criteria for manually evaluating the overall quality of the generated code.

Cite

@article{arxiv.2206.13179,
  title  = {AixBench: A Code Generation Benchmark Dataset},
  author = {Yiyang Hao and Ge Li and Yongqiang Liu and Xiaowei Miao and He Zong and Siyuan Jiang and Yang Liu and He Wei},
  journal= {arXiv preprint arXiv:2206.13179},
  year   = {2022}
}