Related papers: Preference Alignment Improves Language Model-Based…

MPO: Multidimensional Preference Optimization for Language Model-based Text-to-Speech

In recent years, text-to-speech (TTS) has seen impressive advancements through large-scale language models, achieving human-level speech quality. Integrating human feedback has proven effective for enhancing robustness in these systems.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-03 Kangxiang Xia, Xinfa Zhu, Jixun Yao, Lei Xie

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Large language models (LLMs) demonstrate impressive performance but lack the flexibility to adapt to human preferences quickly without retraining. In this work, we introduce Test-time Preference Optimization (TPO), a framework that aligns…

Computation and Language · Computer Science 2025-01-23 Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng

Tangent Space Fine-Tuning for Directional Preference Alignment in Large Language Models

Our goal is to enable large language models (LLMs) to balance multiple human preference dimensions, such as helpfulness, safety, and verbosity, through principled and controllable alignment. Existing preference optimization methods,…

Machine Learning · Computer Science 2026-02-03 Mete Erdogan

Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-to-Speech

Aligning text-to-speech (TTS) system outputs with human feedback through preference optimization has been shown to effectively improve the robustness and naturalness of language model-based TTS models. Current approaches primarily require…

Computation and Language · Computer Science 2026-04-28 Rikuto Kotoge, Yuichi Sasaki

Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective

Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…

Artificial Intelligence · Computer Science 2024-10-23 Pietro Bernardelle, Gianluca Demartini

Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization

Developing high-quality text-to-speech (TTS) systems for low-resource languages is challenging due to the scarcity of paired text and speech data. In contrast, automatic speech recognition (ASR) models for such languages are often more…

Artificial Intelligence · Computer Science 2025-09-29 Shehzeen Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Roy Fejgin, Ryan Langman, Mikyas Desta, Leili Tavabi, Jason Li

Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced…

Computation and Language · Computer Science 2024-07-19 Janghwan Lee, Seongmin Park, Sukjin Hong, Minsoo Kim, Du-Seong Chang, Jungwook Choi

TSO: Self-Training with Scaled Preference Optimization

Enhancing the conformity of large language models (LLMs) to human preferences remains an ongoing research challenge. Recently, offline approaches such as Direct Preference Optimization (DPO) have gained prominence as attractive options due…

Machine Learning · Computer Science 2024-09-05 Kaihui Chen, Hao Yi, Qingyang Li, Tianyu Qi, Yulan Hu, Fuzheng Zhang, Yong Liu

Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood…

Artificial Intelligence · Computer Science 2025-05-27 Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Effective training of language models (LMs) for mathematical reasoning tasks demands high-quality supervised fine-tuning data. Besides obtaining annotations from human experts, a common alternative is sampling from larger and more powerful…

Computation and Language · Computer Science 2024-07-26 Tianduo Wang, Shichen Li, Wei Lu

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech

Integrating human feedback to align text-to-speech (TTS) system outputs with human preferences has proven to be an effective approach for enhancing the robustness of language model-based TTS systems. Current approaches primarily focus on…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-29 Jixun Yao, Yuguang Yang, Yu Pan, Yuan Feng, Ziqian Ning, Jianhao Ye, Hongbin Zhou, Lei Xie

TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees

In the domain of complex reasoning tasks, such as mathematical reasoning, recent advancements have proposed the use of Direct Preference Optimization (DPO) to suppress output of dispreferred responses, thereby enhancing the long-chain…

Computation and Language · Computer Science 2025-10-27 Weibin Liao, Xu Chu, Yasha Wang

Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities

Automatic text simplification (ATS) aims to enhance language accessibility for various target groups, particularly persons with intellectual disabilities. Recent advancements in generative AI, especially large language models (LLMs), have…

Computation and Language · Computer Science 2025-07-03 Yingqiang Gao, Kaede Johnson, David Froehlich, Luisa Carrer, Sarah Ebling

Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization

Direct Preference Optimization (DPO) and its variants have become the de facto standards for aligning large language models (LLMs) with human preferences or specific goals. However, DPO requires high-quality preference data and suffers from…

Machine Learning · Computer Science 2024-11-12 Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi

Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics. Researchers have therefore utilized neural metrics…

Computation and Language · Computer Science 2025-11-21 Hippolyte Gisserot-Boukhlef, Ricardo Rei, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo, Nuno M. Guerreiro

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often…

Machine Learning · Computer Science 2025-05-13 Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment

Modern zero-shot text-to-speech (TTS) systems, despite using extensive pre-training, often struggle in challenging scenarios such as tongue twisters, repeated words, code-switching, and cross-lingual synthesis, leading to intelligibility…

Sound · Computer Science 2025-06-09 Xueyao Zhang, Yuancheng Wang, Chaoren Wang, Ziniu Li, Zhuo Chen, Zhizheng Wu

Aligning CodeLLMs with Direct Preference Optimization

The last year has witnessed the rapid progress of large language models (LLMs) across diverse domains. Among them, CodeLLMs have garnered particular attention because they can not only assist in completing various programming tasks but also…

Artificial Intelligence · Computer Science 2024-10-25 Yibo Miao, Bofei Gao, Shanghaoran Quan, Junyang Lin, Daoguang Zan, Jiaheng Liu, Jian Yang, Tianyu Liu, Zhijie Deng

TODO: Enhancing LLM Alignment with Ternary Preferences

Aligning large language models (LLMs) with human intent is critical for enhancing their performance across a variety of tasks. Standard alignment techniques, such as Direct Preference Optimization (DPO), often rely on the binary…

Computation and Language · Computer Science 2025-04-01 Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts

In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It works by using pairs of preferences…

Computation and Language · Computer Science 2024-05-29 Yueqin Yin, Zhendong Wang, Yi Gu, Hai Huang, Weizhu Chen, Mingyuan Zhou