Related papers: Implicit and Explicit Language Guidance for Diffus…

200 papers

Diffusion models (DMs) have become the new trend of generative models and have demonstrated a powerful ability of conditional synthesis. Among those, text-to-image diffusion models pre-trained on large-scale image-text pairs are highly…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-06 · Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu

Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-15 · Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

In light of the remarkable success of in-context learning in large language models, its potential extension to the vision domain, particularly with visual foundation models like Stable Diffusion, has sparked considerable interest. Existing…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-05 · Tianqi Chen, Yongfei Liu, Zhendong Wang, Jianbo Yuan, Quanzeng You, Hongxia Yang, Mingyuan Zhou

The issue of generative pretraining for vision models has persisted as a long-standing conundrum. At present, the text-to-image (T2I) diffusion model demonstrates remarkable proficiency in generating high-definition images matching textual…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-25 · Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and a text guidance, our…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-20 · Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou

Recently large-scale language-image models (e.g., text-guided diffusion models) have considerably improved the image generation capabilities to generate photorealistic images in various domains. Based on this success, current image editing…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-09 · Wenkai Dong, Song Xue, Xiaoyue Duan, Shumin Han

We present a simple but effective training-free approach for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our goal is to generate an image that aligns with the target task while preserving the…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-23 · Hyunsoo Lee, Minsoo Kang, Bohyung Han

As one of the most successful generative models, diffusion models have demonstrated remarkable efficacy in synthesizing high-quality images. These models learn the underlying high-dimensional data distribution in an unsupervised manner.…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-12 · Min Hou, Yueying Wu, Chang Xu, Yu-Hao Huang, Chenxi Bai, Le Wu, Jiang Bian

While diffusion models have achieved remarkable success in text-to-image generation, they encounter significant challenges with instruction-driven image editing. Our research highlights a key challenge: these models particularly struggle…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-05 · Yujia Hu, Songhua Liu, Zhenxiong Tan, Xingyi Yang, Xinchao Wang

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-16 · Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

Text-to-image diffusion models have emerged as powerful tools for high-quality image generation and editing. Many existing approaches rely on text prompts as editing guidance. However, these methods are constrained by the need for manual…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-21 · Yuanyuan Chang, Yinghua Yao, Tao Qin, Mengmeng Wang, Ivor Tsang, Guang Dai

Depth-guided multimodal fusion combines depth information from visible and infrared images, significantly enhancing the performance of 3D reconstruction and robotics applications. Existing thermal-visible image fusion mainly focuses on…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Jinchang Zhang, Zijun Li, Guoyu Lu

Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-05 · Ling Yang, Zhilong Zhang, Zhaochen Yu, Jingwei Liu, Minkai Xu, Stefano Ermon, Bin Cui

Text-to-image diffusion models sometimes depict blended concepts in the generated images. One promising use case of this effect would be the nonword-to-image generation task which attempts to generate images intuitively imaginable from a…

Multimedia · Computer Science · 2024-11-07 · Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide

Recent advancements in Text-to-Image (T2I) diffusion models have demonstrated impressive success in generating high-quality images with zero-shot generalization capabilities. Yet, current models struggle to closely adhere to prompt…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-31 · Hyun Kang, Dohae Lee, Myungjin Shin, In-Kwon Lee

Text-to-image diffusion models have achieved remarkable fidelity in synthesizing images from explicit text prompts, yet exhibit a critical deficiency in processing implicit prompts that require deep-level world knowledge, ranging from…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Xiefan Guo, Xinzhu Ma, Haoxiang Ma, Zihao Zhou, Di Huang

Visuomotor imitation learning policies enable robots to efficiently acquire manipulation skills from visual demonstrations. However, as scene complexity and visual distractions increase, policies that perform well in simple settings often…

Artificial Intelligence · Computer Science · 2025-11-11 · Yuhang Dong, Haizhou Ge, Yupei Zeng, Jiangning Zhang, Beiwen Tian, Hongrui Zhu, Yufei Jia, Ruixiang Wang, Zhucun Xue, Guyue Zhou, Longhua Ma, Guanzhong Tian

The use of denoising diffusion models is becoming increasingly popular in the field of image editing. However, current approaches often rely on either image-guided methods, which provide a visual reference but lack control over semantic…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-07 · Zhanbo Feng, Zenan Ling, Xinyu Lu, Ci Gong, Feng Zhou, Wugedele Bao, Jie Li, Fan Yang, Robert C. Qiu

Recent works have explored text-guided image editing using diffusion models and generated edited images based on text prompts. However, the models struggle to accurately locate the regions to be edited and faithfully perform precise edits.…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-30 · Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka

In this paper, we study the problem of procedure planning in instructional videos, which aims to make a plan (i.e. a sequence of actions) given the current visual observation and the desired goal. Previous works cast this as a sequence…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-23 · Hanlin Wang, Yilu Wu, Sheng Guo, Limin Wang