Related papers: CodeBERTScore: Evaluating Code Generation with Pre…

BERTScore: Evaluating Text Generation with BERT

We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However,…

Computation and Language · Computer Science 2020-02-25 Tianyi Zhang , Varsha Kishore , Felix Wu , Kilian Q. Weinberger , Yoav Artzi

On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation

The task of code generation from natural language (NL2Code) has become extremely popular, especially with the advent of Large Language Models (LLMs). However, efforts to quantify and track this progress have suffered due to a lack of…

Software Engineering · Computer Science 2024-05-06 Atharva Naik

CodeBERT-nt: code naturalness via CodeBERT

Much of software-engineering research relies on the naturalness of code, the fact that code, in small code snippets, is repetitive and can be predicted using statistical language models like n-gram. Although powerful, training such models…

Software Engineering · Computer Science 2022-08-15 Ahmed Khanfir , Matthieu Jimenez , Mike Papadakis , Yves Le Traon

BARTScore: Evaluating Generated Text as Text Generation

A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate,…

Computation and Language · Computer Science 2021-10-28 Weizhe Yuan , Graham Neubig , Pengfei Liu

Toward Human-Like Evaluation for Natural Language Generation with Error Analysis

The state-of-the-art language model-based automatic metrics, e.g. BARTScore, benefiting from large-scale contextualized pre-training, have been successfully used in a wide range of natural language generation (NLG) tasks, including machine…

Computation and Language · Computer Science 2022-12-21 Qingyu Lu , Liang Ding , Liping Xie , Kanjian Zhang , Derek F. Wong , Dacheng Tao

Learning and Evaluating Contextual Embedding of Source Code

Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come…

Software Engineering · Computer Science 2020-08-19 Aditya Kanade , Petros Maniatis , Gogul Balakrishnan , Kensen Shi

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that are highly correlated with human subjective judgments due to their cost efficiency. This paper…

Sound · Computer Science 2024-09-04 Takaaki Saeki , Soumi Maiti , Shinnosuke Takamichi , Shinji Watanabe , Hiroshi Saruwatari

CodeT: Code Generation with Generated Tests

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select…

Computation and Language · Computer Science 2022-11-24 Bei Chen , Fengji Zhang , Anh Nguyen , Daoguang Zan , Zeqi Lin , Jian-Guang Lou , Weizhu Chen

Layer or Representation Space: What makes BERT-based Evaluation Metrics Robust?

The evaluation of recent embedding-based evaluation metrics for text generation is primarily based on measuring their correlation with human evaluations on standard benchmarks. However, these benchmarks are mostly from similar domains to…

Computation and Language · Computer Science 2022-09-08 Doan Nam Long Vu , Nafise Sadat Moosavi , Steffen Eger

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang , Sirong Chen , Cuiyun Gao , Zhenhao Li , Xing Hu , Kui Liu , Xin Xia

CodeScore: Evaluating Code Generation by Learning Code Execution

A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) suffer from…

Software Engineering · Computer Science 2024-09-06 Yihong Dong , Jiazheng Ding , Xue Jiang , Ge Li , Zhuo Li , Zhi Jin

CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information

Automatic code generation is to generate the program code according to the given natural language description. The current mainstream approach uses neural networks to encode natural language descriptions, and output abstract syntax trees…

Software Engineering · Computer Science 2022-02-16 Maosheng Zhong , Gen Liu , Hongwei Li , Jiangling Kuang , Jinshan Zeng , Mingwen Wang

IFEvalCode: Controlled Code Generation

Code large language models (Code LLMs) have made significant progress in code generation by translating natural language descriptions into functional code; however, real-world applications often demand stricter adherence to detailed…

Computation and Language · Computer Science 2025-08-04 Jian Yang , Wei Zhang , Shukai Liu , Linzheng Chai , Yingshui Tan , Jiaheng Liu , Ge Zhang , Wangchunshu Zhou , Guanglin Niu , Zhoujun Li , Binyuan Hui , Junyang Lin

ICE-Score: Instructing Large Language Models to Evaluate Code

Recent advancements in the field of natural language generation have facilitated the use of large language models to assess the quality of generated text. Although these models have shown promising results in tasks such as machine…

Artificial Intelligence · Computer Science 2024-01-23 Terry Yue Zhuo

Toward Code Generation: A Survey and Lessons from Semantic Parsing

With the growth of natural language processing techniques and demand for improved software engineering efficiency, there is an emerging interest in translating intention from human languages to programming languages. In this survey paper,…

Software Engineering · Computer Science 2021-05-20 Celine Lee , Justin Gottschlich , Dan Roth

DATScore: Evaluating Translation with Data Augmented Translations

The rapid development of large pretrained language models has revolutionized not only the field of Natural Language Generation (NLG) but also its evaluation. Inspired by the recent work of BARTScore: a metric leveraging the BART language…

Computation and Language · Computer Science 2022-10-14 Moussa Kamal Eddine , Guokan Shang , Michalis Vazirgiannis

Automatic Code Generation using Pre-Trained Language Models

Recent advancements in natural language processing \cite{gpt2} \cite{BERT} have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly…

Computation and Language · Computer Science 2021-02-23 Luis Perez , Lizi Ottens , Sudharshan Viswanathan

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Code retrieval is allowing software engineers to search codes through a natural language query, which relies on both natural language processing and software engineering techniques. There have been several attempts on code retrieval from…

Software Engineering · Computer Science 2021-10-19 Mehdi Bahrami , N. C. Shrikanth , Yuji Mizobuchi , Lei Liu , Masahiro Fukuyori , Wei-Peng Chen , Kazuki Munakata

CoditT5: Pretraining for Source Code and Natural Language Editing

Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel…

Software Engineering · Computer Science 2022-09-15 Jiyang Zhang , Sheena Panthaplackel , Pengyu Nie , Junyi Jessy Li , Milos Gligoric

Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models

Automated evaluation of open domain natural language generation (NLG) models remains a challenge and widely used metrics such as BLEU and Perplexity can be misleading in some cases. In our paper, we propose to evaluate natural language…

Computation and Language · Computer Science 2020-02-13 Wangchunshu Zhou , Ke Xu