Related papers: Diffusion Explainer: Visual Explanation for Text-t…

Interactive Visual Learning for Stable Diffusion

Diffusion-based generative models' impressive ability to create convincing images has garnered global attention. However, their complex internal structures and operations often pose challenges for non-experts to grasp. We introduce…

Human-Computer Interaction · Computer Science 2024-04-26 Seongmin Lee, Benjamin Hoover, Hendrik Strobelt, Zijie J. Wang, ShengYun Peng, Austin Wright, Kevin Li, Haekyu Park, Haoyang Yang, Polo Chau

The CLIP Model is Secretly an Image-to-Prompt Converter

The Stable Diffusion model is a prominent text-to-image generation model that relies on a text prompt as its input, which is encoded using the Contrastive Language-Image Pre-Training (CLIP). However, text prompts have limitations when it…

Computer Vision and Pattern Recognition · Computer Science 2024-02-16 Yuxuan Ding, Chunna Tian, Haoxuan Ding, Lingqiao Liu

The Hidden Language of Diffusion Models

Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual prompt. However, the internal representations learned by these models remain an enigma. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2023-10-06 Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, Lior Wolf

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

TextDiffuser: Diffusion Models as Text Painters

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Tutorial on Diffusion Models for Imaging and Vision

The astonishing growth of generative tools in recent years has empowered many exciting applications in text-to-image generation and text-to-video generation. The underlying principle behind these generative tools is the concept of…

Machine Learning · Computer Science 2025-01-09 Stanley H. Chan

Memory Triggers: Unveiling Memorization in Text-To-Image Generative Models through Word-Level Duplication

Diffusion-based models, such as the Stable Diffusion model, have revolutionized text-to-image synthesis with their ability to produce high-quality, high-resolution images. These advancements have prompted significant progress in image…

Cryptography and Security · Computer Science 2023-12-07 Ali Naseh, Jaechul Roh, Amir Houmansadr

DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models

With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model…

Computer Vision and Pattern Recognition · Computer Science 2023-07-07 Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, Duen Horng Chau

Evaluating a Synthetic Image Dataset Generated with Stable Diffusion

We generate synthetic images with the "Stable Diffusion" image generation model using the Wordnet taxonomy and the definitions of concepts it contains. This synthetic image database can be used as training data for data augmentation in…

Computer Vision and Pattern Recognition · Computer Science 2022-11-07 Andreas Stöckl

Boosting GUI Prototyping with Diffusion Models

GUI (graphical user interface) prototyping is a widely-used technique in requirements engineering for gathering and refining requirements, reducing development risks and increasing stakeholder engagement. However, GUI prototyping can be a…

Software Engineering · Computer Science 2023-10-05 Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray

Reverse Stable Diffusion: What prompt was used to generate this image?

Text-to-image diffusion models have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah

Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models

We present DiffExplainer, a novel framework that, leveraging language-vision models, enables multimodal global explainability. DiffExplainer employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Matteo Pennisi, Giovanni Bellitto, Simone Palazzo, Mubarak Shah, Concetto Spampinato

Explaining generative diffusion models via visual analysis for interpretable decision-making process

Diffusion models have demonstrated remarkable performance in generation tasks. Nevertheless, explaining the diffusion process remains challenging due to it being a sequence of denoising noisy images that are difficult for experts to…

Computer Vision and Pattern Recognition · Computer Science 2024-02-19 Ji-Hoon Park, Yeong-Joon Ju, Seong-Whan Lee

Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering

Text-to-image synthesis by diffusion models has recently shown remarkable performance in generating high-quality images. Although they perform well on simple texts, the models may get confused when faced with complex texts that contain…

Computer Vision and Pattern Recognition · Computer Science 2024-01-15 Chang Yu, Junran Peng, Xiangyu Zhu, Zhaoxiang Zhang, Qi Tian, Zhen Lei

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

Diffusion-based methods can generate realistic images and videos, but they struggle to edit existing objects in a video while preserving their appearance over time. This prevents diffusion models from being applied to natural video editing…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

Diffusion Explorer: Interactive Exploration of Diffusion Models

Diffusion models have been central to the development of recent image, video, and even text generation systems. They possess striking geometric properties that can be faithfully portrayed in low-dimensional settings. However, existing…

Machine Learning · Computer Science 2025-07-08 Alec Helbling, Duen Horng Chau

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Text-to-image diffusion models (T2I) use a latent representation of a text prompt to guide the image generation process. However, the process by which the encoder produces the text representation is unknown. We propose the Diffusion Lens, a…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Michael Toker, Hadas Orgad, Mor Ventura, Dana Arad, Yonatan Belinkov

Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion

Recent progress in text-to-image (TTI) systems, such as StableDiffusion, Imagen, and DALL-E 2, has made it possible to create realistic images with simple text prompts. It is tempting to use these systems to eliminate the manual task of…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 David Marwood, Shumeet Baluja, Yair Alon

TextCraftor: Your Text Encoder Can be Image Quality Controller

Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation, enabling significant advancements in areas like image editing and video synthesis. Despite their formidable…

Computer Vision and Pattern Recognition · Computer Science 2024-03-29 Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren

Detecting Images Generated by Diffusers

This paper explores the task of detecting images generated by text-to-image diffusion models. To evaluate this, we consider images generated from captions in the MSCOCO and Wikimedia datasets using two state-of-the-art models: Stable…

Computer Vision and Pattern Recognition · Computer Science 2023-04-24 Davide Alessandro Coccomini, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato