Related papers: FlowEval: A Consensus-Based Dialogue Evaluation Fr…

200 papers

A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation metrics should reflect the dynamics of such interaction. Existing automatic metrics are focused very much on the turn-level quality, while ignoring…

Computation and Language · Computer Science 2021-06-08 Chen Zhang, Yiming Chen, Luis Fernando D'Haro, Yan Zhang, Thomas Friedrichs, Grandee Lee, Haizhou Li

Building a reliable and automated evaluation metric is a necessary but challenging problem for open-domain dialogue systems. Recent studies proposed evaluation metrics that assess generated responses by considering their relevance to…

Computation and Language · Computer Science 2024-07-19 ChaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo

Recent dialogue coherence models use the coherence features designed for monologue texts, e.g. nominal entities, to represent utterances and then explicitly augment them with dialogue-relevant features, e.g., dialogue act labels. It…

Computation and Language · Computer Science 2020-06-04 Mohsen Mesgar, Sebastian Bücker, Iryna Gurevych

While large language models (LLMs) and coding agents are often applied to user interface (UI) development, developers find it difficult to reliably assess their proficiency in visual and interaction design. Existing evaluations either rely…

Multiagent Systems · Computer Science 2026-05-07 Jason Wu, Priyan Vaithilingam, Eldon Schoop, Jeffrey Nichols, Titus Barik

Nowadays, open-domain dialogue models can generate acceptable responses according to the historical context based on the large-scale pre-trained language models. However, they generally concatenate the dialogue history directly as the model…

Computation and Language · Computer Science 2021-06-07 Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, Jie Zhou

Recent model-based reference-free metrics for open-domain dialogue evaluation exhibit promising correlations with human judgment. However, they either perform turn-level evaluation or look at a single dialogue quality dimension. One would…

Computation and Language · Computer Science 2022-11-01 Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li

Conversational machine comprehension requires deep understanding of the dialogue flow, and the prior work proposed FlowQA to implicitly model the context representations in reasoning for better understanding. This paper proposes to…

Computation and Language · Computer Science 2020-01-20 Yi-Ting Yeh, Yun-Nung Chen

Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them…

Computation and Language · Computer Science 2020-05-05 Koustuv Sinha, Prasanna Parthasarathi, Jasmine Wang, Ryan Lowe, William L. Hamilton, Joelle Pineau

This paper introduces a novel Self-supervised Fine-grained Dialogue Evaluation framework (SelF-Eval). The core idea is to model the correlation between turn quality and the entire dialogue quality. We first propose a novel automatic data…

Computation and Language · Computer Science 2022-09-19 Longxuan Ma, Ziyu Zhuang, Weinan Zhang, Mingda Li, Ting Liu

Existing dialogue quality evaluation systems can return a score for a given system turn from a particular viewpoint, e.g., engagingness. However, to improve dialogue systems by locating exactly where in a system turn potential problems lie,…

Computation and Language · Computer Science 2023-10-03 Rikiya Takehi, Akihisa Watanabe, Tetsuya Sakai

Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks. Interest has been growing in…

Computation and Language · Computer Science 2022-08-01 Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han

We present an automated evaluation method to measure fluidity in conversational dialogue systems. The method combines various state-of-the-art natural language tools into a classifier, and human ratings on these dialogues to train an…

Computation and Language · Computer Science 2019-10-28 Keith Vella, Massimo Poesio, Michael Sigamani, Cihan Dogan, Aimore Dutra, Dimitrios Dimakopoulos, Alfredo Gemma, Ella Walters

Dialogue assessment plays a critical role in the development of open-domain dialogue systems. Existing work is incapable of providing an end-to-end and human-epistemic assessment dataset; it provides only sub-metrics like coherence…

Computation and Language · Computer Science 2023-10-26 Yukun Zhao, Lingyong Yan, Weiwei Sun, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, Dawei Yin

DataFlow has been emerging as a new paradigm for building task-oriented chatbots due to its expressive semantic representations of the dialogue tasks. Despite the availability of a large dataset SMCalFlow and a simplified syntax, the…

Computation and Language · Computer Science 2022-12-19 Han He, Song Feng, Daniele Bonadiman, Yi Zhang, Saab Mansour

Commonsense reasoning is omnipresent in human communications and thus is an important feature for open-domain dialogue systems. However, evaluating commonsense in dialogue systems is still an open challenge. We take the first step by…

Computation and Language · Computer Science 2023-11-06 Sarik Ghazarian, Yijia Shao, Rujun Han, Aram Galstyan, Nanyun Peng

Automatic evaluation is beneficial for open-domain dialog system development. However, standard word-overlap metrics (BLEU, ROUGE) do not correlate well with human judgements of open-domain dialog systems. In this work we propose to use the…

Computation and Language · Computer Science 2022-02-18 Sarik Ghazarian, Behnam Hedayatnia, Alexandros Papangelis, Yang Liu, Dilek Hakkani-Tur

The ability to model and automatically detect dialogue acts is an important step toward understanding spontaneous speech and instant messages. However, it has been difficult to infer a dialogue act from a surface utterance because it highly…

Computation and Language · Computer Science 2018-06-05 AbdelRahim Elmadany, Sherif Abdou, Mervat Gheith

Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human annotated data, which is cumbersome to collect. In this work, we propose to…

Computation and Language · Computer Science 2022-03-29 Sarik Ghazarian, Behnam Hedayatnia, Alexandros Papangelis, Yang Liu, Dilek Hakkani-Tur

Large Language Models (LLMs) have demonstrated remarkable capabilities in orchestrating tools for reasoning tasks. However, existing methods rely on a step-wise paradigm that lacks a global perspective, which causes error accumulation over…

Artificial Intelligence · Computer Science 2026-05-11 Tairan Huang, Siyu Shang, Qiang Chen, Xiu Su, Yi Chen

Evaluating the quality of a dialogue system is an understudied problem. The recent evolution of evaluation methods motivated this survey, in which an explicit and comprehensive analysis of the existing methods is sought. We are first to…

Computation and Language · Computer Science 2021-08-04 Xinmeng Li, Wansen Wu, Long Qin, Quanjun Yin