Related papers: BARTScore: Evaluating Generated Text as Text Gener…

200 papers

The rapid development of large pretrained language models has revolutionized not only the field of Natural Language Generation (NLG) but also its evaluation. Inspired by the recent work of BARTScore: a metric leveraging the BART language…

Computation and Language · Computer Science 2022-10-14 Moussa Kamal Eddine , Guokan Shang , Michalis Vazirgiannis

The state-of-the-art language model-based automatic metrics, e.g. BARTScore, benefiting from large-scale contextualized pre-training, have been successfully used in a wide range of natural language generation (NLG) tasks, including machine…

Computation and Language · Computer Science 2022-12-21 Qingyu Lu , Liang Ding , Liping Xie , Kanjian Zhang , Derek F. Wong , Dacheng Tao

We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However,…

Computation and Language · Computer Science 2020-02-25 Tianyi Zhang , Varsha Kishore , Felix Wu , Kilian Q. Weinberger , Yoav Artzi

Automatically evaluating the quality of language generation is critical. Although recent learned metrics show high correlation with human judgement, these metrics cannot explain their verdict or associate the scores with defects in…

Computation and Language · Computer Science 2023-10-30 Wenda Xu , Danqing Wang , Liangming Pan , Zhenqiao Song , Markus Freitag , William Yang Wang , Lei Li

Fast and reliable evaluation metrics are key to R&D progress. While traditional natural language generation metrics are fast, they are not very reliable. Conversely, new metrics based on large pretrained language models are much more…

Computation and Language · Computer Science 2021-10-19 Moussa Kamal Eddine , Guokan Shang , Antoine J.-P. Tixier , Michalis Vazirgiannis

Since the rise of neural natural-language-to-code models (NL->Code) that can generate long expressions and statements rather than a single next-token, one of the major problems has been reliably evaluating their generated output. In this…

Software Engineering · Computer Science 2023-11-01 Shuyan Zhou , Uri Alon , Sumit Agarwal , Graham Neubig

A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate…

Computation and Language · Computer Science 2019-09-27 Wei Zhao , Maxime Peyrard , Fei Liu , Yang Gao , Christian M. Meyer , Steffen Eger

Generative Artificial Intelligence (AI) has enabled the development of sophisticated models that are capable of producing high-caliber text, images, and other outputs through the utilization of large pre-trained models. Nevertheless,…

Computation and Language · Computer Science 2023-02-14 Jinlan Fu , See-Kiong Ng , Zhengbao Jiang , Pengfei Liu

A new metric BaryScore to evaluate text generation based on deep contextualized embeddings (e.g., BERT, Roberta, ELMo) is introduced. This metric is motivated by a new framework relying on optimal transport tools, i.e., Wasserstein…

Computation and Language · Computer Science 2021-09-10 Pierre Colombo , Guillaume Staerman , Chloe Clavel , Pablo Piantanida

While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that are highly correlated with human subjective judgments due to their cost efficiency. This paper…

Text summarizing is a critical Natural Language Processing (NLP) task with applications ranging from information retrieval to content generation. Large Language Models (LLMs) have shown remarkable promise in generating fluent abstractive…

Computation and Language · Computer Science 2025-03-03 Colleen Gilhuly , Haleh Shahzad

Automatic evaluation for open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show a low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric:…

Computation and Language · Computer Science 2020-08-20 Jing Gu , Qingyang Wu , Zhou Yu

Automatic evaluation of generated textual content presents an ongoing challenge within the field of NLP. Given the impressive capabilities of modern language models (LMs) across diverse NLP tasks, there is a growing trend to employ these…

Computation and Language · Computer Science 2024-06-10 Yiqi Liu , Nafise Sadat Moosavi , Chenghua Lin

Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics (such as BERTScore or MoverScore) are based on black-box language models such as BERT or XLM-R. They often achieve strong correlations with human…

Computation and Language · Computer Science 2022-03-22 Christoph Leiter , Piyawat Lertvittayakumjorn , Marina Fomicheva , Wei Zhao , Yang Gao , Steffen Eger

Text generation is the automated process of producing written or spoken language using computational methods. It involves generating coherent and contextually relevant text based on predefined rules or learned patterns. However, challenges…

Computation and Language · Computer Science 2025-01-30 Rahimanuddin Shaik , Katikela Sreeharsha Kishore

Natural language processing (NLP) systems are increasingly trained to generate open-ended text rather than classifying between responses. This makes research on evaluation metrics for generated language -- functions that score system output…

Computation and Language · Computer Science 2021-10-19 Thomas Scialom , Felix Hill

Existing metrics for assessing question generation not only require costly human reference but also fail to take into account the input context of generation, rendering the lack of deep understanding of the relevance between the generated…

Computation and Language · Computer Science 2022-05-02 Xiaoqiang Wang , Bang Liu , Siliang Tang , Lingfei Wu

Automated evaluation of text generation systems has recently seen increasing attention, particularly checking whether generated text stays truthful to input sources. Existing methods frequently rely on an evaluation using task-specific…

Computation and Language · Computer Science 2023-05-23 Jing Fan , Dennis Aumiller , Michael Gertz

Lexically constrained text generation aims to control the generated text by incorporating some pre-specified keywords into the output. Previous work injects lexical constraints into the output by controlling the decoding process or refining…

Computation and Language · Computer Science 2021-09-28 Xingwei He

Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly…
