Related papers: Implicit and Explicit Language Guidance for Diffus…

Unleashing Text-to-Image Diffusion Models for Visual Perception

Diffusion models (DMs) have become the new trend of generative models and have demonstrated a powerful ability of conditional synthesis. Among those, text-to-image diffusion models pre-trained on large-scale image-text pairs are highly…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-06 · Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-15 · Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

In light of the remarkable success of in-context learning in large language models, its potential extension to the vision domain, particularly with visual foundation models like Stable Diffusion, has sparked considerable interest. Existing…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-05 · Tianqi Chen, Yongfei Liu, Zhendong Wang, Jianbo Yuan, Quanzeng You, Hongxia Yang, Mingyuan Zhou

Harnessing Diffusion Models for Visual Perception with Meta Prompts

The issue of generative pretraining for vision models has persisted as a long-standing conundrum. At present, the text-to-image (T2I) diffusion model demonstrates remarkable proficiency in generating high-definition images matching textual…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-25 · Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang

In-Context Learning Unlocked for Diffusion Models

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and a text guidance, our…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-20 · Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models

Recently large-scale language-image models (e.g., text-guided diffusion models) have considerably improved the image generation capabilities to generate photorealistic images in various domains. Based on this success, current image editing…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-09 · Wenkai Dong, Song Xue, Xiaoyue Duan, Shumin Han

Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance

We present a simple but effective training-free approach for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our goal is to generate an image that aligns with the target task while preserving the…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-23 · Hyunsoo Lee, Minsoo Kang, Bohyung Han

InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models

As one of the most successful generative models, diffusion models have demonstrated remarkable efficacy in synthesizing high-quality images. These models learn the underlying high-dimensional data distribution in an unsupervised manner.…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-12 · Min Hou, Yueying Wu, Chang Xu, Yu-Hao Huang, Chenxi Bai, Le Wu, Jiang Bian

Image Editing As Programs with Diffusion Models

While diffusion models have achieved remarkable success in text-to-image generation, they encounter significant challenges with instruction-driven image editing. Our research highlights a key challenge: these models particularly struggle…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-05 · Yujia Hu, Songhua Liu, Zhenxiong Tan, Xingyi Yang, Xinchao Wang

DDP: Diffusion Model for Dense Visual Prediction

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-16 · Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization

Text-to-image diffusion models have emerged as powerful tools for high-quality image generation and editing. Many existing approaches rely on text prompts as editing guidance. However, these methods are constrained by the need for manual…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-21 · Yuanyuan Chang, Yinghua Yao, Tao Qin, Mengmeng Wang, Ivor Tsang, Guang Dai

Language-Depth Navigated Thermal and Visible Image Fusion

Depth-guided multimodal fusion combines depth information from visible and infrared images, significantly enhancing the performance of 3D reconstruction and robotics applications. Existing thermal-visible image fusion mainly focuses on…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Jinchang Zhang, Zijun Li, Guoyu Lu

Contextualized Diffusion Models for Text-Guided Image and Video Generation

Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-05 · Ling Yang, Zhilong Zhang, Zhaochen Yu, Jingwei Liu, Minkai Xu, Stefano Ermon, Bin Cui

Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation

Text-to-image diffusion models sometimes depict blended concepts in the generated images. One promising use case of this effect would be the nonword-to-image generation task which attempts to generate images intuitively imaginable from a…

Multimedia · Computer Science · 2024-11-07 · Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide

Semantic Guidance Tuning for Text-To-Image Diffusion Models

Recent advancements in Text-to-Image (T2I) diffusion models have demonstrated impressive success in generating high-quality images with zero-shot generalization capabilities. Yet, current models struggle to closely adhere to prompt…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-31 · Hyun Kang, Dohae Lee, Myungjin Shin, In-Kwon Lee

EruDiff: Refactoring Knowledge in Diffusion Models for Advanced Text-to-Image Synthesis

Text-to-image diffusion models have achieved remarkable fidelity in synthesizing images from explicit text prompts, yet exhibit a critical deficiency in processing implicit prompts that require deep-level world knowledge, ranging from…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Xiefan Guo, Xinzhu Ma, Haoxiang Ma, Zihao Zhou, Di Huang

ImitDiff: Transferring Foundation-Model Priors for Distraction Robust Visuomotor Policy

Visuomotor imitation learning policies enable robots to efficiently acquire manipulation skills from visual demonstrations. However, as scene complexity and visual distractions increase, policies that perform well in simple settings often…

Artificial Intelligence · Computer Science · 2025-11-11 · Yuhang Dong, Haizhou Ge, Yupei Zeng, Jiangning Zhang, Beiwen Tian, Hongrui Zhu, Yufei Jia, Ruixiang Wang, Zhucun Xue, Guyue Zhou, Longhua Ma, Guanzhong Tian

Textual and Visual Prompt Fusion for Image Editing via Step-Wise Alignment

The use of denoising diffusion models is becoming increasingly popular in the field of image editing. However, current approaches often rely on either image-guided methods, which provide a visual reference but lack control over semantic…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-07 · Zhanbo Feng, Zenan Ling, Xinyu Lu, Ci Gong, Feng Zhou, Wugedele Bao, Jie Li, Fan Yang, Robert C. Qiu

InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions

Recent works have explored text-guided image editing using diffusion models and generated edited images based on text prompts. However, the models struggle to accurately locate the regions to be edited and faithfully perform precise edits.…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-30 · Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka

PDPP: Projected Diffusion for Procedure Planning in Instructional Videos

In this paper, we study the problem of procedure planning in instructional videos, which aims to make a plan (i.e. a sequence of actions) given the current visual observation and the desired goal. Previous works cast this as a sequence…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-23 · Hanlin Wang, Yilu Wu, Sheng Guo, Limin Wang