Related papers: SwiftDiffusion: Efficient Diffusion Model Serving …

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse contents. Despite this advancement, latent space smoothness within diffusion models remains…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Jiayi Guo , Xingqian Xu , Yifan Pu , Zanlin Ni , Chaofei Wang , Manushree Vasu , Shiji Song , Gao Huang , Humphrey Shi

GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads

Diffusion models have emerged as the prevailing approach for text-to-image (T2I) and text-to-video (T2V) generation, yet production platforms must increasingly serve both modalities on shared GPU clusters while meeting stringent latency…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-10 Fanjiang Ye , Zhangke Li , Xinrui Zhong , Ethan Ma , Russell Chen , Kaijian Wang , Jingwei Zuo , Desen Sun , Ye Cao , Triston Cao , Myungjin Lee , Arvind Krishnamurthy , Yuke Wang

Enhancing Text-to-Image Generation via End-Edge Collaborative Hybrid Super-Resolution

Artificial Intelligence-Generated Content (AIGC) has made significant strides, with high-resolution text-to-image (T2I) generation becoming increasingly critical for improving users' Quality of Experience (QoE). Although…

Computer Vision and Pattern Recognition · Computer Science 2026-01-22 Chongbin Yi , Yuxin Liang , Ziqi Zhou , Peng Yang

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

In this paper, we aim to enhance the performance of SwiftBrush, a prominent one-step text-to-image diffusion model, to be competitive with its multi-step Stable Diffusion counterpart. Initially, we explore the quality-diversity trade-off…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Trung Dao , Thuan Hoang Nguyen , Thanh Le , Duc Vu , Khoi Nguyen , Cuong Pham , Anh Tran

KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis

As text-to-image (T2I) synthesis models increase in size, they demand higher inference costs due to the need for more expensive GPUs with larger memory, which makes it challenging to reproduce these models in addition to the restricted…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Youngwan Lee , Kwanyong Park , Yoorhim Cho , Yong-Ju Lee , Sung Ju Hwang

ProxT2I: Efficient Reward-Guided Text-to-Image Generation via Proximal Diffusion

Diffusion models have emerged as a dominant paradigm for generative modeling across a wide range of domains, including prompt-conditional generation. The vast majority of samplers, however, rely on forward discretization of the reverse…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Zhenghan Fang , Jian Zheng , Qiaozi Gao , Xiaofeng Gao , Jeremias Sulam

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

Text-to-image (T2I) models are well known for their ability to produce highly realistic images, while multimodal large language models (MLLMs) are renowned for their proficiency in understanding and integrating multiple modalities. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Jian Ma , Qirong Peng , Xu Guo , Chen Chen , Haonan Lu , Zhenyu Yang

Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation

Text-to-image (T2I) generative diffusion models have demonstrated outstanding performance in synthesizing diverse, high-quality visuals from text captions. Several layout-to-image models have been developed to control the generation process…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Ahmad Süleyman , Göksel Biricik

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we observe that attribution-binding and compositional…

Computer Vision and Pattern Recognition · Computer Science 2023-03-02 Weixi Feng , Xuehai He , Tsu-Jui Fu , Varun Jampani , Arjun Akula , Pradyumna Narayana , Sugato Basu , Xin Eric Wang , William Yang Wang

A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies

The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Despite various attempts at sampler optimization, model distillation, and network quantification, these…

Computer Vision and Pattern Recognition · Computer Science 2024-06-18 Jinchao Zhu , Yuxuan Wang , Siyuan Pan , Pengfei Wan , Di Zhang , Gao Huang

MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models

Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation as well as spatially conditioned image generation. For most applications, we can train the model end-to-end with paired…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Nithin Gopalakrishnan Nair , Jeya Maria Jose Valanarasu , Vishal M Patel

PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), seriously hindering the fundamental innovation for the AIGC community while increasing CO2 emissions. This paper introduces…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Junsong Chen , Jincheng Yu , Chongjian Ge , Lewei Yao , Enze Xie , Yue Wu , Zhongdao Wang , James Kwok , Ping Luo , Huchuan Lu , Zhenguo Li

DiffI2I: Efficient Diffusion Model for Image-to-Image Translation

The Diffusion Model (DM) has emerged as the SOTA approach for image synthesis. However, the existing DM cannot perform well on some image-to-image translation (I2I) tasks. Different from image synthesis, some I2I tasks, such as…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Bin Xia , Yulun Zhang , Shiyin Wang , Yitong Wang , Xinglong Wu , Yapeng Tian , Wenming Yang , Radu Timofte , Luc Van Gool

A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization

The Stable Diffusion Model (SDM) is a popular and efficient text-to-image (t2i) generation and image-to-image (i2i) generation model. Although there have been some attempts to reduce sampling steps, model distillation, and network…

Computer Vision and Pattern Recognition · Computer Science 2024-03-06 Jinchao Zhu , Yuxuan Wang , Xiaobing Tu , Siyuan Pan , Pengfei Wan , Gao Huang

STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation

In layout-to-image (L2I) synthesis, controlled complex scenes are generated from coarse information like bounding boxes. Such a task is exciting to many downstream applications because the input layouts offer strong guidance to the…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Ruyu Wang , Xuefeng Hou , Sabrina Schmedding , Marco F. Huber

PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving

The Text-to-Image (T2I) diffusion model has emerged as one of the most widely adopted generative models. However, serving diffusion models at the granularity of entire images introduces significant challenges, particularly under…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-07 Desen Sun , Zepeng Zhao , Yuke Wang

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

With the advance of text-to-image (T2I) diffusion models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable…

Computer Vision and Pattern Recognition · Computer Science 2024-02-09 Yuwei Guo , Ceyuan Yang , Anyi Rao , Zhengyang Liang , Yaohui Wang , Yu Qiao , Maneesh Agrawala , Dahua Lin , Bo Dai

Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

Recently, large-scale text-to-image (T2I) diffusion models have emerged as a powerful tool for image-to-image translation (I2I), allowing open-domain image translation via user-provided text prompts. This paper proposes frequency-controlled…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Xiang Gao , Zhengbo Xu , Junhan Zhao , Jiaying Liu

AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation

Text-to-Image (T2I) diffusion models have achieved remarkable success in image generation. Despite their progress, challenges remain in both prompt-following ability, image quality and lack of high-quality datasets, which are essential for…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Jingkun An , Yinghao Zhu , Zongjian Li , Enshen Zhou , Haoran Feng , Xijie Huang , Bohua Chen , Yemin Shi , Chengwei Pan

NOFT: Test-Time Noise Finetune via Information Bottleneck for Highly Correlated Asset Creation

The diffusion model has provided a strong tool for implementing text-to-image (T2I) and image-to-image (I2I) generation. Recently, topology and texture control are popular explorations, e.g., ControlNet, IP-Adapter, Ctrl-X, and DSG. These…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Jia Li , Nan Gao , Huaibo Huang , Ran He