Related papers: Calibrated Multi-Preference Optimization for Align…

Towards Self-Improvement of Diffusion Models via Group Preference Optimization

Aligning text-to-image (T2I) diffusion models with Direct Preference Optimization (DPO) has shown notable improvements in generation quality. However, applying DPO to T2I faces two challenges: the sensitivity of DPO to preference pairs and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-19 Renjie Chen, Wenfeng Lin, Yichen Zhang, Jiangchuan Wei, Boyuan Liu, Chao Feng, Jiao Ran, Mingyu Guo

Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization

Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, Mingyuan Zhou

Rethinking Direct Preference Optimization in Diffusion Models

Aligning text-to-image (T2I) diffusion models with human preferences has emerged as a critical research challenge. While recent advances in this area have extended preference optimization techniques from large language models (LLMs) to the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-25 Junyong Kang, Seohyun Lim, Kyungjune Baek, Hyunjung Shim

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Modern preference alignment methods, such as DPO, rely on divergence regularization to a reference model for training stability, but this creates a fundamental problem we call "reference mismatch." In this paper, we investigate the negative…

Computer Vision and Pattern Recognition · Computer Science 2025-12-04 Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong

Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences

Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data. Although substantial resources are expended in collecting and labeling datasets, a critical aspect is…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Yunhong Lu, Qichao Wang, Hengyuan Cao, Xiaoyin Xu, Min Zhang

Preference-Based Alignment of Discrete Diffusion Models

Diffusion models have achieved state-of-the-art performance across multiple domains, with recent advancements extending their applicability to discrete data. However, aligning discrete diffusion models with task-specific preferences remains…

Machine Learning · Computer Science 2025-04-10 Umberto Borso, Davide Paglieri, Jude Wells, Tim Rocktäschel

InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment

Without using explicit reward, direct preference optimization (DPO) employs paired human preference data to fine-tune generative models, a method that has garnered considerable attention in large language models (LLMs). However, exploration…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Yunhong Lu, Qichao Wang, Hengyuan Cao, Xierui Wang, Xiaoyin Xu, Min Zhang

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

Aligning text-to-image (T2I) diffusion models with preference has been gaining increasing research attention. While prior works exist on directly optimizing T2I by preference data, these methods are developed under the bandit assumption of a…

Computer Vision and Pattern Recognition · Computer Science 2024-05-14 Shentao Yang, Tianqi Chen, Mingyuan Zhou

Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization

Text-to-image (T2I) diffusion models have made remarkable strides in generating and editing high-fidelity images from text. Yet, these models remain fundamentally generic, failing to adapt to the nuanced aesthetic preferences of individual…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Connor Dunlop, Matthew Zheng, Kavana Venkatesh, Pinar Yanardag

Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking

Direct preference optimization (DPO) has shown success in aligning diffusion models with human preference. Previous approaches typically assume a consistent preference label between final generations and noisy samples at intermediate steps,…

Machine Learning · Computer Science 2025-02-05 Jie Ren, Yuhang Zhang, Dongrui Liu, Xiaopeng Zhang, Qi Tian

BalancedDPO: Adaptive Multi-Metric Alignment

Diffusion models have achieved remarkable progress in text-to-image generation, yet aligning them with human preference remains challenging due to the presence of multiple, sometimes conflicting, evaluation metrics (e.g., semantic…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, Vaneet Aggarwal

Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models

Text-to-image diffusion models deliver high-quality images, yet aligning them with human preferences remains challenging. We revisit diffusion-based Direct Preference Optimization (DPO) for these models and identify a critical pathology:…

Computer Vision and Pattern Recognition · Computer Science 2025-12-03 Minghao Fu, Guo-Hua Wang, Tianyu Cui, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang

Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally…

Machine Learning · Computer Science 2026-05-07 Jiaming Hu, Jiamu Bai, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis

Towards Better Optimization For Listwise Preference in Diffusion Models

Reinforcement learning from human feedback (RLHF) has proven effective for aligning text-to-image (T2I) diffusion models with human preferences. Although Direct Preference Optimization (DPO) is widely adopted for its computational…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Jiamu Bai, Xin Yu, Meilong Xu, Weitao Lu, Xin Pan, Kiwan Maeng, Daniel Kifer, Jian Wang, Yu Wang

DeDPO: Debiased Direct Preference Optimization for Diffusion Models

Direct Preference Optimization (DPO) has emerged as a predominant alignment method for diffusion models, facilitating off-policy training without explicit reward modeling. However, its reliance on large-scale, high-quality human preference…

Computer Vision and Pattern Recognition · Computer Science 2026-02-09 Khiem Pham, Quang Nguyen, Tung Nguyen, Jingsen Zhu, Michele Santacatterina, Dimitris Metaxas, Ramin Zabih

Aligning Diffusion Models by Optimizing Human Utility

We present Diffusion-KTO, a novel approach for aligning text-to-image diffusion models by formulating the alignment objective as the maximization of expected human utility. Since this objective applies to each generation independently,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka

Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes

Post-training alignment of diffusion models relies on simplified signals, such as scalar rewards or binary preferences. This limits alignment with complex human expertise, which is hierarchical and fine-grained. To address this, we first…

Computer Vision and Pattern Recognition · Computer Science 2026-01-09 Chenye Meng, Zejian Li, Zhongni Liu, Yize Li, Changle Xie, Kaixin Jia, Ling Yang, Huanghuang Deng, Shiying Ding, Shengyuan Zhang, Jiayi Li, Lingyun Sun

PC-Diffusion: Aligning Diffusion Models with Human Preferences via Preference Classifier

Diffusion models have achieved remarkable success in conditional image generation, yet their outputs often remain misaligned with human preferences. To address this, recent work has applied Direct Preference Optimization (DPO) to diffusion…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Shaomeng Wang, He Wang, Xiaolu Wei, Longquan Dai, Jinhui Tang

Dual Caption Preference Optimization for Diffusion Models

Recent advancements in human preference optimization, originally developed for Large Language Models (LLMs), have shown significant potential in improving text-to-image diffusion models. These methods aim to learn the distribution of…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral

Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs

Recent advances in diffusion-based text-to-image (T2I) models have led to remarkable success in generating high-quality images from textual prompts. However, ensuring accurate alignment between the text and the generated image remains a…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Jia Jun Cheng Xian, Muchen Li, Haotian Yang, Xin Tao, Pengfei Wan, Leonid Sigal, Renjie Liao