Related papers: Exploring Automatic Evaluation Methods based on a …

LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods

Recent studies have applied large language models (LLMs) to machine translation quality estimation (MTQE) by prompting models to assign numeric scores. Nonetheless, these direct scoring methods tend to show low segment-level correlation…

Computation and Language · Computer Science 2025-05-23 Hyang Cui

Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

Large Language Models (LLMs) have excelled at language understanding and generating human-level text. However, even with supervised training and human alignment, these LLMs are susceptible to adversarial attacks where malicious users can…

Computation and Language · Computer Science 2024-08-08 Shachi H Kumar, Saurav Sahay, Sahisnu Mazumder, Eda Okur, Ramesh Manuvinakurike, Nicole Beckage, Hsuan Su, Hung-yi Lee, Lama Nachman

SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation

Existing Large Language Models (LLMs) generate text through unidirectional autoregressive decoding methods to respond to various user queries. These methods tend to consider token selection in a simple sequential manner, making it easy to…

Computation and Language · Computer Science 2024-05-28 Ziqin Luo, Haixia Han, Haokun Zhao, Guochao Jiang, Chengyu Du, Tingyun Li, Jiaqing Liang, Deqing Yang, Yanghua Xiao

Exploration of Masked and Causal Language Modelling for Text Generation

Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and have achieved state-of-the-art performance in practically every task in this field. However, the prevalent approach used in text generation,…

Computation and Language · Computer Science 2024-08-12 Nicolo Micheletti, Samuel Belkadi, Lifeng Han, Goran Nenadic

GEM: Empowering LLM for both Embedding Generation and Language Understanding

Large decoder-only language models (LLMs) have achieved remarkable success in generation and reasoning tasks, where they generate text responses given instructions. However, many applications, e.g., retrieval augmented generation (RAG),…

Computation and Language · Computer Science 2025-06-06 Caojin Zhang, Qiang Zhang, Ke Li, Sai Vidyaranya Nuthalapati, Benyu Zhang, Jason Liu, Serena Li, Lizhu Zhang, Xiangjun Fan

Language Model Evaluation in Open-ended Text Generation

Although current state-of-the-art language models have achieved impressive results in numerous natural language processing tasks, still they could not solve the problem of producing repetitive, dull and sometimes inconsistent text in…

Computation and Language · Computer Science 2021-08-10 An Nguyen

Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation

Decoding strategies for generative large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Guided by specific hyperparameters, these strategies aim to transform the raw probability distributions…

Computation and Language · Computer Science 2024-12-17 Esteban Garces Arias, Meimingwei Li, Christian Heumann, Matthias Aßenmacher

Machine-generated text detection prevents language model collapse

As Large Language Models (LLMs) become increasingly prevalent, their generated outputs are proliferating across the web, risking a future where machine-generated content dilutes human-authored text. Since online data is the primary resource…

Computation and Language · Computer Science 2025-09-23 George Drayson, Emine Yilmaz, Vasileios Lampos

A Thorough Examination of Decoding Methods in the Era of LLMs

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current…

Computation and Language · Computer Science 2024-10-10 Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam

LLM-based NLG Evaluation: Current Status and Challenges

Evaluating natural language generation (NLG) is a vital but challenging problem in natural language processing. Traditional evaluation metrics mainly capturing content (e.g. n-gram) overlap between system outputs and references are far from…

Computation and Language · Computer Science 2025-05-15 Mingqi Gao, Xinyu Hu, Jie Ruan, Xiao Pu, Xiaojun Wan

Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification

Large language models have become extremely popular recently due to their ability to achieve strong performance on a variety of tasks, such as text generation and rewriting, but their size and computation cost make them difficult to access,…

Computation and Language · Computer Science 2026-01-08 Anthony Lamelas

Automatic Detection of Generated Text is Easiest when Humans are Fooled

Recent advancements in neural language modelling make it possible to rapidly generate vast amounts of human-sounding text. The capabilities of humans and automatic discriminators to detect machine-generated text have been a large source of…

Computation and Language · Computer Science 2020-05-11 Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, Douglas Eck

A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations

Recent advances in deep learning have significantly enhanced generative AI capabilities across text, images, and audio. However, automatically evaluating the quality of these generated outputs presents ongoing challenges. Although numerous…

Computation and Language · Computer Science 2025-06-13 Tian Lan, Yang-Hao Zhou, Zi-Ao Ma, Fanshu Sun, Rui-Qing Sun, Junyu Luo, Rong-Cheng Tu, Heyan Huang, Chen Xu, Zhijing Wu, Xian-Ling Mao

Automated Evaluation of Personalized Text Generation using Large Language Models

Personalized text generation presents a specialized mechanism for delivering content that is specific to a user's personal context. While the research progress in this area has been rapid, evaluation still presents a challenge. Traditional…

Computation and Language · Computer Science 2023-10-19 Yaqing Wang, Jiepu Jiang, Mingyang Zhang, Cheng Li, Yi Liang, Qiaozhu Mei, Michael Bendersky

Evaluation of Text Generation: A Survey

The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic…

Computation and Language · Computer Science 2021-05-19 Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao

Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models

Accurately evaluating machine-translated text remains a long-standing challenge, particularly for long documents. Recent work has shown that large language models (LLMs) can serve as reliable and interpretable sentence-level translation…

Computation and Language · Computer Science 2025-10-06 Tobias Domhan, Dawei Zhu

Entailed Opinion Matters: Improving the Fact-Checking Performance of Language Models by Relying on their Entailment Ability

Automated fact-checking has been a challenging task for the research community. Prior work has explored various strategies, such as end-to-end training, retrieval-augmented generation, and prompt engineering, to build robust fact-checking…

Computation and Language · Computer Science 2026-02-23 Gaurav Kumar, Ayush Garg, Debajyoti Mazumder, Aditya Kishore, Babu kumar, Jasabanta Patro

Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?

This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations. To…

Computation and Language · Computer Science 2024-10-10 Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

Neural Text Generation: A Practical Guide

Deep learning methods have recently achieved great empirical success on machine translation, dialogue response generation, summarization, and other text generation tasks. At a high level, the technique has been to train end-to-end neural…

Computation and Language · Computer Science 2017-11-28 Ziang Xie

Self-Evaluation Improves Selective Generation in Large Language Models

Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely…

Computation and Language · Computer Science 2023-12-18 Jie Ren, Yao Zhao, Tu Vu, Peter J. Liu, Balaji Lakshminarayanan