Related papers: Direct Judgement Preference Optimization

200 papers

Learning from preference feedback is a common practice for aligning large language models (LLMs) with human values. Conventionally, preference data is learned and encoded into a scalar reward model that connects a value head with an LLM to…

Computation and Language · Computer Science 2025-09-03 Ziyi Ye, Xiangsheng Li, Qiuchi Li, Qingyao Ai, Yujia Zhou, Wei Shen, Dong Yan, Yiqun Liu

LLM-as-a-Judge refers to the automatic modeling of preferences for responses generated by Large Language Models (LLMs), which is of significant importance for both LLM evaluation and reward modeling. Although generative LLMs have made…

Computation and Language · Computer Science 2026-01-13 Hui Huang, Yancheng He, Hongli Zhou, Rui Zhang, Wei Liu, Weixun Wang, Jiaheng Liu, Wenbo Su

LLM-as-a-Judge leverages the generative and reasoning capabilities of large language models (LLMs) to evaluate LLM responses across diverse scenarios, providing accurate preference signals. This approach plays a vital role in aligning LLMs…

Computation and Language · Computer Science 2025-09-09 Jiachen Yu, Shaoning Sun, Xiaohui Hu, Jiaxu Yan, Kaidong Yu, Xuelong Li

Large language models (LLMs) are being widely applied across various fields, but as tasks become more complex, evaluating their responses is increasingly challenging. Compared to human evaluators, the use of LLMs to support performance…

Artificial Intelligence · Computer Science 2025-04-25 Yuran Li, Jama Hussein Mohamud, Chongren Sun, Di Wu, Benoit Boulet

Large language model (LLM)-based judges are widely adopted for automated evaluation and reward modeling, yet their judgments are often affected by judgment biases. Accurately evaluating these biases is essential for ensuring the reliability…

Computation and Language · Computer Science 2026-03-10 Hongli Zhou, Hui Huang, Rui Zhang, Kehai Chen, Bing Xu, Conghui Zhu, Tiejun Zhao, Muyun Yang

Large Language Models (LLMs) as automatic evaluators, commonly referred to as LLM-as-a-Judge, have also attracted growing attention. This approach plays a vital role in aligning LLMs with human judgments, providing accurate and reliable…

Computation and Language · Computer Science 2026-04-22 Shuliang Liu, Zhipeng Xu, Zhenghao Liu, Yukun Yan, Minghe Yu, Yu Gu, Chong Chen, Huiyuan Xie, Ge Yu

Large language models (LLMs) can serve as judges that offer rapid and reliable assessments of other LLM outputs. However, models may systematically assign overly favorable ratings to their own outputs, a phenomenon known as self-bias, which…

Computation and Language · Computer Science 2025-08-12 Evangelia Spiliopoulou, Riccardo Fogliato, Hanna Burnsky, Tamer Soliman, Jie Ma, Graham Horwood, Miguel Ballesteros

Large Language Models (LLMs) have demonstrated remarkable progress through preference-based fine-tuning, which critically depends on the quality of the underlying training data. While human feedback is essential for improving data quality,…

Artificial Intelligence · Computer Science 2025-10-31 Derin Cayir, Renjie Tao, Rashi Rungta, Kai Sun, Sean Chen, Haidar Khan, Minseok Kim, Julia Reinspach, Yue Liu

Large language models (LLMs) are increasingly used as automatic evaluators in applications such as benchmarking, reward modeling, and self-refinement. Prior work highlights a potential self-preference bias where LLMs favor their own…

Computation and Language · Computer Science 2025-12-16 Wei-Lin Chen, Zhepei Wei, Xinyu Zhu, Shi Feng, Yu Meng

Recent advancements in Large Language Models (LLMs) have been remarkable, with new models consistently surpassing their predecessors. These advancements are underpinned by extensive research on various training mechanisms. Among these,…

Computation and Language · Computer Science 2024-12-12 Hansle Gwon, Imjin Ahn, Young-Hak Kim, Sanghyun Park, Tae Joon Jun

Effective training of language models (LMs) for mathematical reasoning tasks demands high-quality supervised fine-tuning data. Besides obtaining annotations from human experts, a common alternative is sampling from larger and more powerful…

Computation and Language · Computer Science 2024-07-26 Tianduo Wang, Shichen Li, Wei Lu

The capabilities of Large Language Models (LLMs) are routinely evaluated by other LLMs trained to predict human preferences. This framework, known as LLM-as-a-judge, is highly scalable and relatively low cost. However, it is also vulnerable…

Computation and Language · Computer Science 2026-02-03 Lisa Alazraki, Tan Yi-Chern, Jon Ander Campos, Maximilian Mozes, Marek Rei, Max Bartolo

The rapid development of Large Language Models (LLMs) has substantially expanded the range of tasks they can address. In the field of Natural Language Processing (NLP), researchers have shifted their focus from conventional NLP tasks (e.g.,…

Computation and Language · Computer Science 2023-12-08 Junlong Li, Shichao Sun, Weizhe Yuan, Run-Ze Fan, Hai Zhao, Pengfei Liu

Large Language Models (LLMs) are often used as automated judges to evaluate text, but their effectiveness can be hindered by various unintentional biases. We propose using linear classifying probes, trained by leveraging differences between…

Computation and Language · Computer Science 2025-03-25 Sharan Maiya, Yinhong Liu, Ramit Debnath, Anna Korhonen

Recent studies have shown that large language models' (LLMs) mathematical problem-solving capabilities can be enhanced by integrating external tools, such as code interpreters, and employing multi-turn Chain-of-Thought (CoT) reasoning.…

Evaluating Large Language Models (LLMs) in open-ended scenarios is challenging because existing benchmarks and metrics cannot measure them comprehensively. To address this problem, we propose to fine-tune LLMs as scalable judges (JudgeLM)…

Computation and Language · Computer Science 2025-03-04 Lianghui Zhu, Xinggang Wang, Xinlong Wang

Automated evaluation leveraging large language models (LLMs), commonly referred to as LLM evaluators or LLM-as-a-judge, has been widely used in measuring the performance of dialogue systems. However, the self-preference bias in LLMs has…

Computation and Language · Computer Science 2025-06-24 Koki Wataoka, Tsubasa Takahashi, Ryokan Ri

Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often…

Machine Learning · Computer Science 2025-05-13 Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang

Automatic evaluation by large language models (LLMs) is a prominent topic today; however, judgment and evaluation tasks are often subjective and influenced by various factors, making adaptation challenging. While many studies demonstrate…

Computation and Language · Computer Science 2024-12-11 Javad Seraj, Mohammad Mahdi Mohajeri, Mohammad Javad Dousti, Majid Nili Ahmadabadi

LLM-as-Judge frameworks are increasingly popular for AI evaluation, yet research findings on the relationship between models' generation and judgment abilities remain inconsistent. We investigate this relationship through systematic…

Computation and Language · Computer Science 2025-09-25 Wei-Hsiang Lin, Sheng-Lun Wei, Hen-Hsen Huang, Hsin-Hsi Chen