Related papers: Listwise Preference Diffusion Optimization for Use…

200 papers

Reinforcement learning from human feedback (RLHF) has proven effective for aligning text-to-image (T2I) diffusion models with human preferences. Although Direct Preference Optimization (DPO) is widely adopted for its computational…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Jiamu Bai, Xin Yu, Meilong Xu, Weitao Lu, Xin Pan, Kiwan Maeng, Daniel Kifer, Jian Wang, Yu Wang

Diffusion models have achieved remarkable success in conditional image generation, yet their outputs often remain misaligned with human preferences. To address this, recent work has applied Direct Preference Optimization (DPO) to diffusion…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Shaomeng Wang, He Wang, Xiaolu Wei, Longquan Dai, Jinhui Tang

Diffusion-based models are recognized for their effectiveness in using real-world driving data to generate realistic and diverse traffic scenarios. These models employ guided sampling to incorporate specific traffic preferences and enhance…

Machine Learning · Computer Science 2025-02-19 Seungjun Yu, Kisung Kim, Daejung Kim, Haewook Han, Jinhan Lee

Aligning text-to-image (T2I) diffusion models with human preferences has emerged as a critical research challenge. While recent advances in this area have extended preference optimization techniques from large language models (LLMs) to the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-25 Junyong Kang, Seohyun Lim, Kyungjune Baek, Hyunjung Shim

Recent alignment methods based on Direct Preference Optimization (DPO) reformulate preference learning as supervised optimization over pairwise comparisons, offering improved efficiency and stability over reinforcement learning from human…

Machine Learning · Computer Science 2026-01-22 Yuhui Sun, Xiyao Wang, Zixi Li, YiTian Ding, Tianyang Ling, Jialuo Chen, Tianyi Yu, Zhenlong Yuan, Jinman Zhao

Recommender systems predict personalized item rankings based on user preference distributions derived from historical behavior data. Recently, diffusion models (DMs) have gained attention in recommendation for their ability to model complex…

Information Retrieval · Computer Science 2025-04-22 Shuo Liu, An Zhang, Guoqing Hu, Hong Qian, Tat-seng Chua

Diffusion models have achieved state-of-the-art performance across multiple domains, with recent advancements extending their applicability to discrete data. However, aligning discrete diffusion models with task-specific preferences remains…

Machine Learning · Computer Science 2025-04-10 Umberto Borso, Davide Paglieri, Jude Wells, Tim Rocktäschel

RLHF techniques like DPO can significantly improve the generation quality of text-to-image diffusion models. However, these methods optimize for a single reward that aligns model generation with population-level preferences, neglecting the…

Machine Learning · Computer Science 2025-01-14 Meihua Dang, Anikait Singh, Linqi Zhou, Stefano Ermon, Jiaming Song

Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data. Although substantial resources are expended in collecting and labeling datasets, a critical aspect is…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Yunhong Lu, Qichao Wang, Hengyuan Cao, Xiaoyin Xu, Min Zhang

Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has…

Computer Vision and Pattern Recognition · Computer Science 2023-11-23 Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time. Many existing recommendation algorithms -- including both shallow and deep ones -- often model such…

Information Retrieval · Computer Science 2022-04-05 Chao Chen, Dongsheng Li, Junchi Yan, Xiaokang Yang

Preference learning has garnered extensive attention as an effective technique for aligning diffusion models with human preferences in visual generation. However, existing alignment approaches such as Diffusion-DPO suffer from two…

Machine Learning · Computer Science 2026-05-19 Xiaomeng Yang, Mengping Yang, Junyan Wang, Zhijian Zhou, Zhiyu Tan, Hao Li

In this paper, we study the problem of procedure planning in instructional videos, which aims to make a plan (i.e. a sequence of actions) given the current visual observation and the desired goal. Previous works cast this as a sequence…

Computer Vision and Pattern Recognition · Computer Science 2025-01-23 Hanlin Wang, Yilu Wu, Sheng Guo, Limin Wang

Efficiently aligning large-scale video diffusion models with human intent requires a scalable and trajectory-aware pathway that bridges the inherent discrepancy between training noise distributions and practical inference trajectories.…

Computer Vision and Pattern Recognition · Computer Science 2026-05-11 Jingyuan Zhu, Biaolong Chen, Le Zhang, Aixi Zhang, Hao Jiang, Pipei Huang

Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, Mingyuan Zhou

Offline multi-objective optimization aims to identify Pareto-optimal solutions given a dataset of designs and their objective values. In this work, we propose a preference-guided diffusion model that generates Pareto-optimal designs by…

Machine Learning · Computer Science 2025-12-19 Yashas Annadani, Syrine Belakaria, Stefano Ermon, Stefan Bauer, Barbara E Engelhardt

Direct preference optimization (DPO) has shown success in aligning diffusion models with human preference. Previous approaches typically assume a consistent preference label between final generations and noisy samples at intermediate steps,…

Machine Learning · Computer Science 2025-02-05 Jie Ren, Yuhang Zhang, Dongrui Liu, Xiaopeng Zhang, Qi Tian

Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng, Hong-Han Shuai

Preferences within a group of people are not uniform but follow a distribution. While existing alignment methods like Direct Preference Optimization (DPO) attempt to steer models to reflect human preferences, they struggle to capture the…

Computation and Language · Computer Science 2025-05-14 Binwei Yao, Zefan Cai, Yun-Shiuan Chuang, Shanglin Yang, Ming Jiang, Diyi Yang, Junjie Hu

Diffusion Models have revolutionized the field of human motion generation by offering exceptional generation quality and fine-grained controllability through natural language conditioning. Their inherent stochasticity, that is the ability…

Computer Vision and Pattern Recognition · Computer Science 2024-05-08 Massimiliano Pappa, Luca Collorone, Giovanni Ficarra, Indro Spinelli, Fabio Galasso