Related papers: WIDAR -- Weighted Input Document Augmented ROUGE

200 papers

Evaluation of summarization tasks is extremely crucial to determining the quality of machine generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While…

Information Retrieval · Computer Science 2018-03-07 Kavita Ganesan

Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL makes it possible to consider complex, possibly non-differentiable, metrics that globally assess…

Computation and Language · Computer Science 2019-09-05 Thomas Scialom, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

ROUGE is a widely adopted, automatic evaluation measure for text summarization. While it has been shown to correlate well with human judgements, it is biased towards surface lexical similarities. This makes it unsuitable for the evaluation…

Computation and Language · Computer Science 2015-08-26 Jun-Ping Ng, Viktoria Abrecht

A desirable property of a reference-based evaluation metric that measures the content quality of a summary is that it should estimate how much information that summary has in common with a reference. Traditional text overlap based metrics…

Computation and Language · Computer Science 2021-07-28 Daniel Deutsch, Tania Bedrax-Weiss, Dan Roth

Several code summarization techniques have been proposed in the literature to automatically document a code snippet or a function. Ideally, software developers should be involved in assessing the quality of the generated summaries. However,…

Software Engineering · Computer Science 2023-12-27 Antonio Mastropaolo, Matteo Ciniselli, Massimiliano Di Penta, Gabriele Bavota

This paper proposes NOrmed Index of Retention (NOIR), a quantitative objective metric for evaluating summarization quality of arbitrary texts that relies on both the retention of semantic meaning and the summary length compression. This…

Computation and Language · Computer Science 2025-07-11 Andrew D. Foland

Evaluation of text summarization approaches has been mostly based on metrics that measure similarities of system generated summaries with a set of human written gold-standard summaries. The most widely used metric in summarization…

Computation and Language · Computer Science 2016-04-05 Arman Cohan, Nazli Goharian

Reference-based metrics such as ROUGE or BERTScore evaluate the content quality of a summary by comparing the summary to a reference. Ideally, this comparison should measure the summary's information quality by calculating how much…

Computation and Language · Computer Science 2020-10-26 Daniel Deutsch, Dan Roth

Our task is to generate an effective summary for a given document with specific realtime requirements. We use the softplus function to enhance keyword rankings to favor important sentences, based on which we present a number of…

Information Retrieval · Computer Science 2017-10-03 Liqun Shao, Hao Zhang, Ming Jia, Jie Wang

Evaluating multi-document summarization (MDS) quality is difficult. This is especially true in the case of MDS for biomedical literature reviews, where models must synthesize contradicting evidence reported across different documents. Prior…

Computation and Language · Computer Science 2023-05-24 Lucy Lu Wang, Yulia Otmakhova, Jay DeYoung, Thinh Hung Truong, Bailey E. Kuehl, Erin Bransom, Byron C. Wallace

Evaluating automatically-generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries by leveraging…

Computation and Language · Computer Science 2023-05-23 David Uthus, Jianmo Ni

Automated evaluation metrics as a stand-in for manual evaluation are an essential part of the development of text-generation tasks such as text summarization. However, while the field has progressed, our standard metrics have not -- for…

Computation and Language · Computer Science 2020-10-15 Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu, Graham Neubig

Canonical automatic summary evaluation metrics, such as ROUGE, focus on lexical similarity which cannot well capture semantics nor linguistic quality and require a reference summary which is costly to obtain. Recently, there have been a…

Computation and Language · Computer Science 2022-05-06 Forrest Sheng Bao, Hebi Li, Ge Luo, Minghui Qiu, Yinfei Yang, Youbiao He, Cen Chen

Many Artificial Intelligence tasks cannot be evaluated with a single quality criterion and some sort of weighted combination is needed to provide system rankings. A problem of weighted combination measures is that slight changes in the…

Artificial Intelligence · Computer Science 2014-01-21 Enrique Amigó, Julio Gonzalo, Javier Artiles, Felisa Verdejo

Text summarization refers to the process that generates a shorter form of text from the source document preserving salient information. Many existing works for text summarization are generally evaluated by using recall-oriented understudy…

Computation and Language · Computer Science 2020-11-03 Dongyub Lee, Myeongcheol Shin, Taesun Whang, Seungwoo Cho, Byeongil Ko, Daniel Lee, Eunggyun Kim, Jaechoon Jo

ROUGE is one of the first and most widely used evaluation metrics for text summarization. However, its assessment merely relies on surface similarities between peer and model summaries. Consequently, ROUGE is unable to fairly evaluate…

Computation and Language · Computer Science 2017-10-23 Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong, Fang Chen

Due to the exponential growth of information and the need for efficient information consumption the task of summarization has gained paramount importance. Evaluating summarization accurately and objectively presents significant challenges,…

Computation and Language · Computer Science 2024-12-31 Dong Yuan, Eti Rastogi, Fen Zhao, Sagar Goyal, Gautam Naik, Sree Prasanna Rajagopal

Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely…

Computation and Language · Computer Science 2021-04-12 Thomas Scialom, Paul-Alexis Dray, Patrick Gallinari, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano, Alex Wang

The ROUGE metric is commonly used to evaluate extractive summarization tasks, but it has been criticized for its lack of semantic awareness and its ignorance of the ranking quality of the extractive summarizer. Previous research has…

Computation and Language · Computer Science 2024-07-30 Mousumi Akter, Santu Karmaker

Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, show a high correlation with human annotations, and ideally be…

Computation and Language · Computer Science 2024-10-16 Théo Gigant, Camille Guinaudeau, Marc Decombas, Frédéric Dufaux