Related papers: DiffPO: Diffusion-styled Preference Optimization f…

Diffusion Model Alignment Using Direct Preference Optimization

Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has…

Computer Vision and Pattern Recognition · Computer Science 2023-11-23 Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

Divergence Minimization Preference Optimization for Diffusion Model Alignment

Diffusion models have achieved remarkable success in generating realistic and versatile images from text prompts. Inspired by the recent advancements of language models, there is an increasing interest in further improving the models by…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Binxu Li, Minkai Xu, Jiaqi Han, Meihua Dang, Stefano Ermon

Rethinking Direct Preference Optimization in Diffusion Models

Aligning text-to-image (T2I) diffusion models with human preferences has emerged as a critical research challenge. While recent advances in this area have extended preference optimization techniques from large language models (LLMs) to the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-25 Junyong Kang, Seohyun Lim, Kyungjune Baek, Hyunjung Shim

InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment

Without using explicit reward, direct preference optimization (DPO) employs paired human preference data to fine-tune generative models, a method that has garnered considerable attention in large language models (LLMs). However, exploration…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Yunhong Lu, Qichao Wang, Hengyuan Cao, Xierui Wang, Xiaoyin Xu, Min Zhang

Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization

Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, Mingyuan Zhou

Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking

Direct preference optimization (DPO) has shown success in aligning diffusion models with human preference. Previous approaches typically assume a consistent preference label between final generations and noisy samples at intermediate steps,…

Machine Learning · Computer Science 2025-02-05 Jie Ren, Yuhang Zhang, Dongrui Liu, Xiaopeng Zhang, Qi Tian

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Large language models (LLMs) demonstrate impressive performance but lack the flexibility to adapt to human preferences quickly without retraining. In this work, we introduce Test-time Preference Optimization (TPO), a framework that aligns…

Computation and Language · Computer Science 2025-01-23 Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng

Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed…

Artificial Intelligence · Computer Science 2026-03-13 Min Cheng, Fatemeh Doudi, Dileep Kalathil, Mohammad Ghavamzadeh, Panganamala R. Kumar

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science 2024-10-21 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Tao Zhang, Cheng Da, Kun Ding, Huan Yang, Kun Jin, Yan Li, Tingting Gao, Di Zhang, Shiming Xiang, Chunhong Pan

Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers

Efficiently aligning large-scale video diffusion models with human intent requires a scalable and trajectory-aware pathway that bridges the inherent discrepancy between training noise distributions and practical inference trajectories.…

Computer Vision and Pattern Recognition · Computer Science 2026-05-11 Jingyuan Zhu, Biaolong Chen, Le Zhang, Aixi Zhang, Hao Jiang, Pipei Huang

Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective

Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…

Artificial Intelligence · Computer Science 2024-10-23 Pietro Bernardelle, Gianluca Demartini

DeDPO: Debiased Direct Preference Optimization for Diffusion Models

Direct Preference Optimization (DPO) has emerged as a predominant alignment method for diffusion models, facilitating off-policy training without explicit reward modeling. However, its reliance on large-scale, high-quality human preference…

Computer Vision and Pattern Recognition · Computer Science 2026-02-09 Khiem Pham, Quang Nguyen, Tung Nguyen, Jingsen Zhu, Michele Santacatterina, Dimitris Metaxas, Ramin Zabih

Aligning Diffusion Models with Noise-Conditioned Perception

Recent advancements in human preference optimization, initially developed for Language Models (LMs), have shown promise for text-to-image Diffusion Models, enhancing prompt alignment, visual appeal, and user preference. Unlike LMs,…

Computer Vision and Pattern Recognition · Computer Science 2025-12-03 Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov

Prompt Optimization Via Diffusion Language Models

We propose a diffusion-based framework for prompt optimization that leverages Diffusion Language Models (DLMs) to iteratively refine system prompts through masked denoising. By conditioning on interaction traces, including user queries,…

Computation and Language · Computer Science 2026-02-24 Shiyu Wang, Haolin Chen, Liangwei Yang, Jielin Qiu, Rithesh Murthy, Ming Zhu, Zixiang Chen, Silvio Savarese, Caiming Xiong, Shelby Heinecke, Huan Wang

Enhancing Reasoning for Diffusion LLMs via Distribution Matching Policy Optimization

Diffusion large language models (dLLMs) are promising alternatives to autoregressive large language models (AR-LLMs), as they potentially allow higher inference throughput. Reinforcement learning (RL) is a crucial component for dLLMs to…

Machine Learning · Computer Science 2026-02-24 Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, Yongxin Chen

Training Diffusion Models with Reinforcement Learning

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives…

Machine Learning · Computer Science 2024-01-08 Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners

This work addresses the challenge of personalizing trajectories generated in automated decision-making systems by introducing a resource-efficient approach that enables rapid adaptation to individual users' preferences. Our method leverages…

Machine Learning · Computer Science 2025-03-25 Wen Zheng Terence Ng, Jianda Chen, Yuan Xu, Tianwei Zhang

Rethinking Preference Alignment for Diffusion Models with Classifier-Free Guidance

Aligning large-scale text-to-image diffusion models with nuanced human preferences remains challenging. While direct preference optimization (DPO) is simple and effective, large-scale finetuning often shows a generalization gap. We take…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Zhou Jiang, Yandong Wen, Zhen Liu

Aligning Diffusion Language Models via Unpaired Preference Optimization

Diffusion language models (dLLMs) are an emerging alternative to autoregressive (AR) generators, but aligning them to human preferences is challenging because sequence log-likelihoods are intractable and pairwise preference data are costly…

Machine Learning · Computer Science 2025-11-13 Vaibhav Jindal, Hejian Sang, Chun-Mao Lai, Yanning Chen, Zhipeng Wang