
Related papers: HyperAlign: Hypernetwork for Efficient Test-Time A…

200 papers

Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function,…

Machine Learning · Statistics 2026-02-03 Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng

Text-to-motion generation, which synthesizes 3D human motions from text inputs, holds immense potential for applications in gaming, film, and robotics. Recently, diffusion-based methods have been shown to generate more diversity and…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Wanjiang Weng, Xiaofeng Tan, Junbo Wang, Guo-Sen Xie, Pan Zhou, Hongsong Wang

The new paradigm of test-time scaling has yielded remarkable breakthroughs in Large Language Models (LLMs) (e.g. reasoning models) and in generative vision models, allowing models to allocate additional computation during inference to…

Machine Learning · Computer Science 2025-08-14 Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, Zeynep Akata

Diffusion models have become a central paradigm for image and multimodal generation, yet their deployment raises persistent questions about alignment, safety, preference satisfaction, and robustness to misuse. This survey reviews recent…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Preeti Lamba, Kiran Ravish, Ankita Kushwaha, Pawan Kumar

Diffusion models have shown promising generative capabilities across diverse domains, yet aligning their outputs with desired reward functions remains a challenge, particularly in cases where reward functions are non-differentiable. Some…

Machine Learning · Computer Science 2025-06-04 Xiner Li, Masatoshi Uehara, Xingyu Su, Gabriele Scalia, Tommaso Biancalani, Aviv Regev, Sergey Levine, Shuiwang Ji

Diffusion alignment aims to optimize diffusion models for the downstream objective. While existing methods based on reinforcement learning or direct backpropagation achieve considerable success in maximizing rewards, they often suffer from…

Machine Learning · Computer Science 2026-03-09 Jaewoo Lee, Minsu Kim, Sanghyeok Choi, Inhyuck Song, Sujin Yun, Hyeongyu Kang, Woocheol Shin, Taeyoung Yun, Kiyoung Om, Jinkyoo Park

Recent advances in diffusion models have led to impressive image generation capabilities, but aligning these models with human preferences remains challenging. Reward-based fine-tuning using models trained on human feedback improves…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Dmitrii Sorokin, Maksim Nakhodnov, Andrey Kuznetsov, Aibek Alanov

This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities,…

Artificial Intelligence · Computer Science 2025-01-22 Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, Tommaso Biancalani

Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed…

Artificial Intelligence · Computer Science 2026-03-13 Min Cheng, Fatemeh Doudi, Dileep Kalathil, Mohammad Ghavamzadeh, Panganamala R. Kumar

Language model alignment is a critical step in training modern generative language models. Alignment aims to improve the win rate of a sample from the aligned model against the base model. Today, we are increasingly using inference-time…

Diffusion models have become the de-facto approach for generating visual data, which are trained to match the distribution of the training dataset. In addition, we also want to control generation to fulfill desired properties such as…

Machine Learning · Computer Science 2024-12-30 Dinghuai Zhang, Yizhe Zhang, Jiatao Gu, Ruixiang Zhang, Josh Susskind, Navdeep Jaitly, Shuangfei Zhai

Faithful text rendering remains a persistent weakness of large text-to-image generative models, as it requires both semantic instruction following and fine-grained glyph-level structure. Prior methods often improve this ability through…

Computer Vision and Pattern Recognition · Computer Science 2026-05-20 Mingxuan Cui, Jingpu Yang, Fengxian Ji, Qian Jiang, Zhecheng Shi, Jiaming Wang, Zirui Song, Fajri Koto, Xiuying Chen

Diffusion models have become prevalent in generative modeling due to their ability to sample from complex distributions. To improve the quality of generated samples and their compliance with user requirements, two commonly used methods are:…

Machine Learning · Computer Science 2025-12-01 Shervin Khalafi, Ignacio Hounie, Dongsheng Ding, Alejandro Ribeiro

With the rapid development of text-to-image generation technology, accurately assessing the alignment between generated images and text prompts has become a critical challenge. Existing methods rely on Euclidean space metrics, neglecting…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Wenzhi Chen, Bo Hu, Leida Li, Lihuo He, Wen Lu, Xinbo Gao

This paper focuses on the alignment of flow matching models with human preferences. A promising way is fine-tuning by directly backpropagating reward gradients through the differentiable generation process of flow matching. However,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Zhanhao Liang, Tao Yang, Jie Wu, Chengjian Feng, Liang Zheng

Diffusion models are state-of-the-art generative models, yet their samples often fail to satisfy application objectives such as safety constraints or domain-specific validity. Existing techniques for alignment require gradients, internal…

Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable reward. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient…

Artificial Intelligence · Computer Science 2025-09-12 Xiangwei Shen, Zhimin Li, Zhantao Yang, Shiyi Zhang, Yingfang Zhang, Donghao Li, Chunyu Wang, Qinglin Lu, Yansong Tang

Denoising-based generative models, particularly diffusion and flow matching algorithms, have achieved remarkable success. However, aligning their output distributions with complex downstream objectives, such as human preferences,…

Machine Learning · Computer Science 2025-08-29 Luozhijie Jin, Zijie Qiu, Jie Liu, Zijie Diao, Lifeng Qiao, Ning Ding, Alex Lamb, Xipeng Qiu

The remarkable progress in text-to-video diffusion models enables the generation of photorealistic videos, although the content of these generated videos often includes unnatural movement or deformation, reverse playback, and motionless…

Computer Vision and Pattern Recognition · Computer Science 2025-10-08 Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta

The remarkable success of diffusion models in text-to-image generation has sparked growing interest in expanding their capabilities to a variety of multi-modal tasks, including image understanding, manipulation, and perception. These tasks…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Xinyang Song, Libin Wang, Weining Wang, Shaozhen Liu, Dandan Zheng, Jingdong Chen, Qi Li, Zhenan Sun