Related papers: Mark-Evaluate: Assessing Language Generation using…

200 papers

Mark-and-Recapture is a methodology from Population Biology to estimate the number of a species without counting every individual. This is done by multiple samplings of the species using traps and discounting the instances that were caught…

Digital Libraries · Computer Science 2015-03-24 Chuan Wen Loe, Henrik Jeldtoft Jensen

Automatic evaluation of language generation systems is a well-studied problem in Natural Language Processing. While novel metrics are proposed every year, a few popular metrics remain as the de facto metrics to evaluate tasks such as image…

Computation and Language · Computer Science 2020-10-27 Ozan Caglayan, Pranava Madhyastha, Lucia Specia

Capture-recapture methods aim to estimate the size of a closed population on the basis of multiple incomplete enumerations of individuals. In many applications, the individual probability of being recorded is heterogeneous in the…

Methodology · Statistics 2016-06-08 James E. Johndrow, Kristian Lum, Daniel Manrique-Vallier

Population size estimation based on capture-recapture experiment under triple record system is an interesting problem in various fields including epidemiology, population studies, etc. In many real life scenarios, there exists inherent…

Methodology · Statistics 2022-01-04 Kiranmoy Chatterjee, Prajamitra Bhuyan

Response diversity has become an important criterion for evaluating the quality of open-domain dialogue generation models. However, current evaluation metrics for response diversity often fail to capture the semantic diversity of generated…

Computation and Language · Computer Science 2022-10-25 Seungju Han, Beomsu Kim, Buru Chang

Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can…

Computation and Language · Computer Science 2018-10-03 Jekaterina Novikova, Ondřej Dušek, Verena Rieser

Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we have conducted a survey…

Automatic evaluation for open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show a low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric:…

Computation and Language · Computer Science 2020-08-20 Jing Gu, Qingyang Wu, Zhou Yu

Evaluating Natural Language Generation (NLG) systems is a challenging task. Firstly, the metric should ensure that the generated hypothesis reflects the reference's semantics. Secondly, it should consider the grammatical quality of the…

Computation and Language · Computer Science 2022-03-18 Md Rashad Al Hasan Rony, Liubov Kovriguina, Debanjan Chaudhuri, Ricardo Usbeck, Jens Lehmann

Human ratings are one of the most prevalent methods to evaluate the performance of natural language processing algorithms. Similarly, it is common to measure the quality of sentences generated by a natural language generation model using…

Computation and Language · Computer Science 2021-04-13 Jakob Nyberg, Ramesh Manuvinakurike, Maike Paetzel-Prüsmann

How can we measure whether a natural language generation system produces both high quality and diverse outputs? Human evaluation captures quality but not diversity, as it does not catch models that simply plagiarize from the training set.…

Computation and Language · Computer Science 2019-04-08 Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang

Population size estimation based on two sample capture-recapture type experiment is an interesting problem in various fields including epidemiology, public health, population studies, etc. The Lincoln-Petersen estimate is popularly used…

Methodology · Statistics 2019-01-21 Kiranmoy Chatterjee, Prajamitra Bhuyan

We propose a modern method to estimate population size based on capture-recapture designs of K samples. The observed data is formulated as a sample of n i.i.d. K-dimensional vectors of binary indicators, where the k-th component of each…

Text generation is an important Natural Language Processing task with various applications. Although several metrics have already been introduced to evaluate the text generation methods, each of them has its own shortcomings. The most…

Machine Learning · Computer Science 2019-05-22 Ehsan Montahaei, Danial Alihosseini, Mahdieh Soleymani Baghshah

Natural language processing (NLP) systems are increasingly trained to generate open-ended text rather than classifying between responses. This makes research on evaluation metrics for generated language -- functions that score system output…

Computation and Language · Computer Science 2021-10-19 Thomas Scialom, Felix Hill

The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic…

Computation and Language · Computer Science 2021-05-19 Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao

Large language models can now directly generate answers to many factual questions without referencing external sources. Unfortunately, relatively little attention has been paid to methods for evaluating the quality and correctness of these…

Information Retrieval · Computer Science 2024-01-11 Negar Arabzadeh, Amin Bigdeli, Charles L. A. Clarke

Although current state-of-the-art language models have achieved impressive results in numerous natural language processing tasks, they still have not solved the problem of producing repetitive, dull and sometimes inconsistent text in…

Computation and Language · Computer Science 2021-08-10 An Nguyen

Collecting human judgements is currently the most reliable evaluation method for natural language generation systems. Automatic metrics have reported flaws when applied to measure quality aspects of generated text and have been shown to…

Computation and Language · Computer Science 2022-04-29 Thórhildur Thorleiksdóttir, Cedric Renggli, Nora Hollenstein, Ce Zhang

Recent advances in language modeling have demonstrated significant improvements in zero-shot capabilities, including in-context learning, instruction following, and machine translation for extremely under-resourced languages (Tanzer et al.,…

Computation and Language · Computer Science 2024-12-30 Albert Kornilov, Tatiana Shavrina