Related papers: LogoSticker: Inserting Logos into Diffusion Models…

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e.,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-28 Shengqu Cai, Eric Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein

LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control

Recent advances in text-to-image generation have been remarkable, but generating multilingual design logos that harmoniously integrate visual and textual elements remains a challenging task. Existing methods often distort character geometry…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Mingyu Kang, Hyein Seo, Yuna Jeong, Junhyeong Park, Yong Suk Choi

Calligrapher: Freestyle Text Image Customization

We introduce Calligrapher, a novel diffusion-based framework that innovatively integrates advanced text customization with artistic typography for digital calligraphy and design applications. Addressing the challenges of precise style…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Yue Ma, Qingyan Bai, Hao Ouyang, Ka Leong Cheng, Qiuyu Wang, Hongyu Liu, Zichen Liu, Haofan Wang, Jingye Chen, Yujun Shen, Qifeng Chen

Identity Encoder for Personalized Diffusion

Many applications can benefit from personalized image generation models, including image enhancement and video conferencing, to name a few. Existing works achieved personalization by fine-tuning one model for each person. While being…

Computer Vision and Pattern Recognition · Computer Science 2023-04-18 Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

Recently, large-scale diffusion models, e.g., Stable diffusion and DallE2, have shown remarkable results on image synthesis. On the other hand, large-scale cross-modal pre-trained models (e.g., CLIP, ALIGN, and FILIP) are competent for…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Runhui Huang, Jianhua Han, Guansong Lu, Xiaodan Liang, Yihan Zeng, Wei Zhang, Hang Xu

Beyond Inserting: Learning Identity Embedding for Semantic-Fidelity Personalized Diffusion Generation

Advanced diffusion-based Text-to-Image (T2I) models, such as the Stable Diffusion Model, have made significant progress in generating diverse and high-quality images using text prompts alone. However, when non-famous users require…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Yang Li, Songlin Yang, Wei Wang, Jing Dong

DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models

Recent text-to-image personalization methods have shown great promise in teaching a diffusion model user-specified concepts given a few images for reusing the acquired concepts in a novel context. With massive efforts being dedicated to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-31 Zhengyang Yu, Zhaoyuan Yang, Jing Zhang

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

The pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion

Recent progress in text-to-image (TTI) systems, such as StableDiffusion, Imagen, and DALL-E 2, has made it possible to create realistic images with simple text prompts. It is tempting to use these systems to eliminate the manual task of…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 David Marwood, Shumeet Baluja, Yair Alon

Implementing and Experimenting with Diffusion Models for Text-to-Image Generation

Taking advantage of the many recent advances in deep learning, text-to-image generative models currently have the merit of attracting the general public attention. Two of these models, DALL-E 2 and Imagen, have demonstrated that highly…

Computer Vision and Pattern Recognition · Computer Science 2022-09-23 Robin Zbinden

GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?

Large-scale graphs with node attributes are increasingly common in various real-world applications. Creating synthetic, attribute-rich graphs that mirror real-world examples is crucial, especially for sharing graph data for analysis and…

Machine Learning · Computer Science 2024-10-17 Mufei Li, Eleonora Kreačić, Vamsi K. Potluru, Pan Li

Measuring the Success of Diffusion Models at Imitating Human Artists

Modern diffusion models have set the state-of-the-art in AI image generation. Their success is due, in part, to training on Internet-scale data which often includes copyrighted work. This prompts questions about the extent to which these…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Stephen Casper, Zifan Guo, Shreya Mogulothu, Zachary Marinov, Chinmay Deshpande, Rui-Jie Yew, Zheng Dai, Dylan Hadfield-Menell

Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling. While various techniques address these computational challenges, a less-explored issue is designing an…

Computer Vision and Pattern Recognition · Computer Science 2025-09-16 Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou

ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models

Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

Diffusion Autoencoders are Scalable Image Tokenizers

Tokenizing images into compact visual representations is a key step in learning efficient and high-quality image generative models. We present a simple diffusion tokenizer (DiTo) that learns compact visual representations for image…

Computer Vision and Pattern Recognition · Computer Science 2025-01-31 Yinbo Chen, Rohit Girdhar, Xiaolong Wang, Sai Saketh Rambhatla, Ishan Misra

Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers

We introduce Diff-Tracker, a novel approach for the challenging unsupervised visual tracking task leveraging the pre-trained text-to-image diffusion model. Our main idea is to leverage the rich knowledge encapsulated within the pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Zhengbo Zhang, Li Xu, Duo Peng, Hossein Rahmani, Jun Liu

InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture

Recent advancements in image synthesis are fueled by the advent of large-scale diffusion models. Yet, integrating realistic object visualizations seamlessly into new or existing backgrounds without extensive training remains a challenge.…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Phillip Mueller, Jannik Wiese, Ioan Craciun, Lars Mikelsons

Inserting Anybody in Diffusion Models via Celeb Basis

Exquisite demand exists for customizing the pretrained large text-to-image model, e.g., Stable Diffusion, to generate innovative concepts, such as the users themselves. However, the newly-added concept from previous customization…

Computer Vision and Pattern Recognition · Computer Science 2023-06-02 Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng

TexSliders: Diffusion-Based Texture Editing in CLIP Space

Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion…

Graphics · Computer Science 2024-05-02 Julia Guerrero-Viu, Milos Hasan, Arthur Roullier, Midhun Harikumar, Yiwei Hu, Paul Guerrero, Diego Gutierrez, Belen Masia, Valentin Deschaintre

DreamReader: An Interpretability Toolkit for Text-to-Image Models

Despite the rapid adoption of text-to-image (T2I) diffusion models, causal and representation-level analysis remains fragmented and largely limited to isolated probing techniques. To address this gap, we introduce DreamReader: a unified…

Machine Learning · Computer Science 2026-03-17 Nirmalendu Prakash, Narmeen Oozeer, Michael Lan, Luka Samkharadze, Phillip Howard, Roy Ka-Wei Lee, Dhruv Nathawani, Shivam Raval, Amirali Abdullah