Related papers: WIDAR -- Weighted Input Document Augmented ROUGE

ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks

Evaluation of summarization tasks is extremely crucial to determining the quality of machine generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While…

Information Retrieval · Computer Science 2018-03-07 Kavita Ganesan

Answers Unite! Unsupervised Metrics for Reinforced Summarization Models

Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL enables to consider complex, possibly non-differentiable, metrics that globally assess…

Computation and Language · Computer Science 2019-09-05 Thomas Scialom, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

Better Summarization Evaluation with Word Embeddings for ROUGE

ROUGE is a widely adopted, automatic evaluation measure for text summarization. While it has been shown to correlate well with human judgements, it is biased towards surface lexical similarities. This makes it unsuitable for the evaluation…

Computation and Language · Computer Science 2015-08-26 Jun-Ping Ng, Viktoria Abrecht

Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary

A desirable property of a reference-based evaluation metric that measures the content quality of a summary is that it should estimate how much information that summary has in common with a reference. Traditional text overlap based metrics…

Computation and Language · Computer Science 2021-07-28 Daniel Deutsch, Tania Bedrax-Weiss, Dan Roth

Evaluating Code Summarization Techniques: A New Metric and an Empirical Characterization

Several code summarization techniques have been proposed in the literature to automatically document a code snippet or a function. Ideally, software developers should be involved in assessing the quality of the generated summaries. However,…

Software Engineering · Computer Science 2023-12-27 Antonio Mastropaolo, Matteo Ciniselli, Massimiliano Di Penta, Gabriele Bavota

An Automated Length-Aware Quality Metric for Summarization

This paper proposes NOrmed Index of Retention (NOIR), a quantitative objective metric for evaluating summarization quality of arbitrary texts that relies on both the retention of semantic meaning and the summary length compression. This…

Computation and Language · Computer Science 2025-07-11 Andrew D. Foland

Revisiting Summarization Evaluation for Scientific Articles

Evaluation of text summarization approaches have been mostly based on metrics that measure similarities of system generated summaries with a set of human written gold-standard summaries. The most widely used metric in summarization…

Computation and Language · Computer Science 2016-04-05 Arman Cohan, Nazli Goharian

Understanding the Extent to which Summarization Evaluation Metrics Measure the Information Quality of Summaries

Reference-based metrics such as ROUGE or BERTScore evaluate the content quality of a summary by comparing the summary to a reference. Ideally, this comparison should measure the summary's information quality by calculating how much…

Computation and Language · Computer Science 2020-10-26 Daniel Deutsch, Dan Roth

Efficient and Effective Single-Document Summarizations and A Word-Embedding Measurement of Quality

Our task is to generate an effective summary for a given document with specific realtime requirements. We use the softplus function to enhance keyword rankings to favor important sentences, based on which we present a number of…

Information Retrieval · Computer Science 2017-10-03 Liqun Shao, Hao Zhang, Ming Jia, Jie Wang

Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations

Evaluating multi-document summarization (MDS) quality is difficult. This is especially true in the case of MDS for biomedical literature reviews, where models must synthesize contradicting evidence reported across different documents. Prior…

Computation and Language · Computer Science 2023-05-24 Lucy Lu Wang, Yulia Otmakhova, Jay DeYoung, Thinh Hung Truong, Bailey E. Kuehl, Erin Bransom, Byron C. Wallace

RISE: Leveraging Retrieval Techniques for Summarization Evaluation

Evaluating automatically-generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries by leveraging…

Computation and Language · Computer Science 2023-05-23 David Uthus, Jianmo Ni

Re-evaluating Evaluation in Text Summarization

Automated evaluation metrics as a stand-in for manual evaluation are an essential part of the development of text-generation tasks such as text summarization. However, while the field has progressed, our standard metrics have not -- for…

Computation and Language · Computer Science 2020-10-15 Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu, Graham Neubig

SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling

Canonical automatic summary evaluation metrics, such as ROUGE, focus on lexical similarity which cannot well capture semantics nor linguistic quality and require a reference summary which is costly to obtain. Recently, there have been a…

Computation and Language · Computer Science 2022-05-06 Forrest Sheng Bao, Hebi Li, Ge Luo, Minghui Qiu, Yinfei Yang, Youbiao He, Cen Chen

Combining Evaluation Metrics via the Unanimous Improvement Ratio and its Application to Clustering Tasks

Many Artificial Intelligence tasks cannot be evaluated with a single quality criterion and some sort of weighted combination is needed to provide system rankings. A problem of weighted combination measures is that slight changes in the…

Artificial Intelligence · Computer Science 2014-01-21 Enrique Amigó, Julio Gonzalo, Javier Artiles, Felisa Verdejo

Reference and Document Aware Semantic Evaluation Methods for Korean Language Summarization

Text summarization refers to the process that generates a shorter form of text from the source document preserving salient information. Many existing works for text summarization are generally evaluated by using recall-oriented understudy…

Computation and Language · Computer Science 2020-11-03 Dongyub Lee, Myeongcheol Shin, Taesun Whang, Seungwoo Cho, Byeongil Ko, Daniel Lee, Eunggyun Kim, Jaechoon Jo

A Semantically Motivated Approach to Compute ROUGE Scores

ROUGE is one of the first and most widely used evaluation metrics for text summarization. However, its assessment merely relies on surface similarities between peer and model summaries. Consequently, ROUGE is unable to fairly evaluate…

Computation and Language · Computer Science 2017-10-23 Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong, Fang Chen

Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM

Due to the exponential growth of information and the need for efficient information consumption the task of summarization has gained paramount importance. Evaluating summarization accurately and objectively presents significant challenges,…

Computation and Language · Computer Science 2024-12-31 Dong Yuan, Eti Rastogi, Fen Zhao, Sagar Goyal, Gautam Naik, Sree Prasanna Rajagopal

QuestEval: Summarization Asks for Fact-based Evaluation

Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely…

Computation and Language · Computer Science 2021-04-12 Thomas Scialom, Paul-Alexis Dray, Patrick Gallinari, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano, Alex Wang

Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization

The ROUGE metric is commonly used to evaluate extractive summarization task, but it has been criticized for its lack of semantic awareness and its ignorance about the ranking quality of the extractive summarizer. Previous research has…

Computation and Language · Computer Science 2024-07-30 Mousumi Akter, Santu Karmaker

Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics

Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, show a high correlation with human annotations, and ideally be…

Computation and Language · Computer Science 2024-10-16 Théo Gigant, Camille Guinaudeau, Marc Decombas, Frédéric Dufaux