Related papers: Learning Evaluation Models from Large Language Mod…

200 papers

Personalized text generation presents a specialized mechanism for delivering content that is specific to a user's personal context. While the research progress in this area has been rapid, evaluation still presents a challenge. Traditional…

Computation and Language · Computer Science 2023-10-19 Yaqing Wang , Jiepu Jiang , Mingyang Zhang , Cheng Li , Yi Liang , Qiaozhu Mei , Michael Bendersky

Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time…

Machine Learning · Computer Science 2016-05-10 Marc'Aurelio Ranzato , Sumit Chopra , Michael Auli , Wojciech Zaremba

Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good quality outputs in their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality…

Computation and Language · Computer Science 2024-11-05 Jason Cai , Hang Su , Monica Sunkara , Igor Shalyminov , Saab Mansour

Is it possible to train a general metric for evaluating text generation quality without human annotated ratings? Existing learned metrics either perform unsatisfactorily across text generation tasks or require human ratings for training on…

Computation and Language · Computer Science 2023-07-10 Wenda Xu , Xian Qian , Mingxuan Wang , Lei Li , William Yang Wang

Is it possible to build a general and automatic natural language generation (NLG) evaluation metric? Existing learned metrics either perform unsatisfactorily or are restricted to tasks where large human rating data is already available. We…

Computation and Language · Computer Science 2022-10-27 Wenda Xu , Yilin Tuan , Yujie Lu , Michael Saxon , Lei Li , William Yang Wang

Text generation has made significant advances in the last few years. Yet, evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate poorly with human judgments. We propose BLEURT, a learned…

Computation and Language · Computer Science 2020-05-22 Thibault Sellam , Dipanjan Das , Ankur P. Parikh

The quality of meeting summaries generated by natural language generation (NLG) systems is hard to measure automatically. Established metrics such as ROUGE and BERTScore have a relatively low correlation with human judgments and fail to…

Computation and Language · Computer Science 2025-02-19 Frederic Kirstein , Terry Ruas , Bela Gipp

As a fundamental task in natural language processing, Chinese Grammatical Error Correction (CGEC) has gradually received widespread attention and become a research hotspot. However, one obvious deficiency for the existing CGEC evaluation…

Computation and Language · Computer Science 2022-05-03 Nankai Lin , Xiaotian Lin , Ziyu Yang , Shengyi Jiang

Response diversity has become an important criterion for evaluating the quality of open-domain dialogue generation models. However, current evaluation metrics for response diversity often fail to capture the semantic diversity of generated…

Computation and Language · Computer Science 2022-10-25 Seungju Han , Beomsu Kim , Buru Chang

Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely…

Computation and Language · Computer Science 2023-12-18 Jie Ren , Yao Zhao , Tu Vu , Peter J. Liu , Balaji Lakshminarayanan

A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) suffer from…

Software Engineering · Computer Science 2024-09-06 Yihong Dong , Jiazheng Ding , Xue Jiang , Ge Li , Zhuo Li , Zhi Jin

Recent advancements in the field of natural language generation have facilitated the use of large language models to assess the quality of generated text. Although these models have shown promising results in tasks such as machine…

Artificial Intelligence · Computer Science 2024-01-23 Terry Yue Zhuo

Grammar competency estimation is essential for assessing linguistic proficiency in both written and spoken language; however, the spoken modality presents additional challenges due to its spontaneous, unstructured, and disfluent nature.…

Computation and Language · Computer Science 2025-11-18 Sourya Dipta Das , Shubham Kumar , Kuldeep Yadav

Existing Large Language Models (LLMs) generate text through unidirectional autoregressive decoding methods to respond to various user queries. These methods tend to consider token selection in a simple sequential manner, making it easy to…

Computation and Language · Computer Science 2024-05-28 Ziqin Luo , Haixia Han , Haokun Zhao , Guochao Jiang , Chengyu Du , Tingyun Li , Jiaqing Liang , Deqing Yang , Yanghua Xiao

As world knowledge advances and new task schemas emerge, Continual Learning (CL) becomes essential for keeping Large Language Models (LLMs) current and addressing their shortcomings. This process typically involves continual instruction…

Machine Learning · Computer Science 2024-12-17 Haokun Zhao , Haixia Han , Jie Shi , Chengyu Du , Jiaqing Liang , Yanghua Xiao

Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation. To train such evaluators, the standard approach is to collect a large amount of human…

Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their…

Machine Learning · Computer Science 2024-11-05 David Farr , Nico Manzonelli , Iain Cruickshank , Jevin West

Widely used learned metrics for machine translation evaluation, such as COMET and BLEURT, estimate the quality of a translation hypothesis by providing a single sentence-level score. As such, they offer little insight into translation…

Computation and Language · Computer Science 2023-10-17 Nuno M. Guerreiro , Ricardo Rei , Daan van Stigt , Luisa Coheur , Pierre Colombo , André F. T. Martins

Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and have achieved state-of-the-art performance in practically every task in this field. However, the prevalent approach used in text generation,…

Computation and Language · Computer Science 2024-08-12 Nicolo Micheletti , Samuel Belkadi , Lifeng Han , Goran Nenadic

Large Language Models (LLMs) have spurred interest in automatic evaluation methods for summarization, offering a faster, more cost-effective alternative to human evaluation. However, existing methods often fall short when applied to complex…

Computation and Language · Computer Science 2024-09-18 Ziwei Gong , Lin Ai , Harshsaiprasad Deshpande , Alexander Johnson , Emmy Phung , Zehui Wu , Ahmad Emami , Julia Hirschberg