Related papers: Designing generalisation evaluation function throu…

200 papers

Many real world problems can be defined as optimisation problems in which the aim is to maximise an objective function. The quality of obtained solution is directly linked to the pertinence of the used objective function. However, designing…

Machine Learning · Computer Science · 2012-04-24 · Patrick Taillandier, Julien Gaffuri

Automatically evaluating text-based, non-task-oriented dialogue systems (i.e., 'chatbots') remains an open problem. Previous approaches have suffered challenges ranging from poor correlation with human judgment to poor generalization and…

Computation and Language · Computer Science · 2021-04-14 · Ian Berlot-Attwell, Frank Rudzicz

Automatic dialogue evaluation plays a crucial role in open-domain dialogue research. Previous works train neural networks with limited annotation for conducting automatic dialogue evaluation, which would naturally affect the evaluation…

Computation and Language · Computer Science · 2019-12-11 · Lu Li, Zhongheng He, Xiangyang Zhou, Dianhai Yu

Evaluation of open-domain dialogue systems is highly challenging and development of better techniques is highlighted time and again as desperately needed. Despite substantial efforts to carry out reliable live evaluation of systems in…

Computation and Language · Computer Science · 2022-03-14 · Tianbo Ji, Yvette Graham, Gareth J. F. Jones, Chenyang Lyu, Qun Liu

Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly…

What makes large language models (LLMs) impressive is also what makes them hard to evaluate: their diversity of uses. To evaluate these models, we must understand the purposes they will be used for. We consider a setting where these…

Computation and Language · Computer Science · 2024-06-04 · Keyon Vafa, Ashesh Rambachan, Sendhil Mullainathan

Recent advances in AI -- including generative approaches -- have resulted in technology that can support humans in scientific discovery and forming decisions, but may also disrupt democracies and target individuals. The responsible use of…

The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation. Existing metrics have been shown to correlate poorly with human judgement, particularly in…

Computation and Language · Computer Science · 2019-09-10 · Prakhar Gupta, Shikib Mehri, Tiancheng Zhao, Amy Pavel, Maxine Eskenazi, Jeffrey P. Bigham

Evaluating the quality of a dialogue system is an understudied problem. The recent evolution of evaluation method motivated this survey, in which an explicit and comprehensive analysis of the existing methods is sought. We are first to…

Computation and Language · Computer Science · 2021-08-04 · Xinmeng Li, Wansen Wu, Long Qin, Quanjun Yin

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response…

Computation and Language · Computer Science · 2018-01-18 · Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau

In designing an intelligent system that must be able to explain its reasoning to a human user, or to provide generalizations that the human user finds reasonable, it may be useful to take into consideration psychological data on what types…

Artificial Intelligence · Computer Science · 2013-04-15 · James E. Corter, Mark A. Gluck

In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part during the development process. Often, dialogue systems are evaluated by means of human evaluations and…

Computation and Language · Computer Science · 2020-06-29 · Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, Mark Cieliebak

Automatic evaluation of various text quality criteria produced by data-driven intelligent methods is very common and useful because it is cheap, fast, and usually yields repeatable results. In this paper, we present an attempt to automate…

Computation and Language · Computer Science · 2020-06-08 · Erion Çano, Ondřej Bojar

We present "AutoJudge", an automated evaluation method for conversational dialogue systems. The method works by first generating dialogues based on self-talk, i.e. dialogue systems talking to itself. Then, it uses human ratings on these…

Artificial Intelligence · Computer Science · 2020-06-26 · Jan Deriu, Mark Cieliebak

Automatic dialogue response evaluator has been proposed as an alternative to automated metrics and human evaluation. However, existing automatic evaluators achieve only moderate correlation with human judgement and they are not robust. In…

Computation and Language · Computer Science · 2020-04-27 · Tianyu Zhao, Divesh Lala, Tatsuya Kawahara

Complex, multi-task problems have proven to be difficult to solve efficiently in a sparse-reward reinforcement learning setting. In order to be sample efficient, multi-task learning requires reuse and sharing of low-level policies. To…

Machine Learning · Computer Science · 2021-09-28 · Valerie Chen, Abhinav Gupta, Kenneth Marino

Though generative dialogue modeling is widely seen as a language modeling task, the task demands an agent to have a complex natural language understanding of its input text to carry a meaningful interaction with a user. The automatic…

Computation and Language · Computer Science · 2020-08-25 · Prasanna Parthasarathi, Joelle Pineau, Sarath Chandar

A major challenge in the field of Text Generation is evaluation: Human evaluations are cost-intensive, and automated metrics often display considerable disagreement with human judgments. In this paper, we propose a statistical model of Text…

Computation and Language · Computer Science · 2023-06-07 · Jan Deriu, Pius von Däniken, Don Tuggener, Mark Cieliebak

Effective summarisation evaluation metrics enable researchers and practitioners to compare different summarisation systems efficiently. Estimating the effectiveness of an automatic evaluation metric, termed meta-evaluation, is a critically…

Computation and Language · Computer Science · 2024-10-01 · Xiang Dai, Sarvnaz Karimi, Biaoyan Fang

Although pre-trained language models encode generic knowledge beneficial for planning and control, they may fail to generate appropriate control policies for domain-specific tasks. Existing fine-tuning methods use human feedback to address…

Artificial Intelligence · Computer Science · 2024-04-02 · Yunhao Yang, Neel P. Bhatt, Tyler Ingebrand, William Ward, Steven Carr, Zhangyang Wang, Ufuk Topcu