Related papers: Approximate Caching for Efficiently Serving Diffus…

FlexCache: Flexible Approximate Cache System for Video Diffusion

Text-to-Video applications receive increasing attention from the public. Among these, diffusion models have emerged as the most prominent approach, offering impressive quality in visual content generation. However, it still suffers from…

Multimedia · Computer Science 2025-01-09 Desen Sun, Henry Tian, Tim Lu, Sihang Liu

Semantic-Aware Caching for Efficient Image Generation in Edge Computing

Text-to-image generation employing diffusion models has attained significant popularity due to its capability to produce high-quality images that adhere to textual prompts. However, the integration of diffusion models faces critical…

Networking and Internet Architecture · Computer Science 2025-12-05 Hanshuai Cui, Zhiqing Tang, Zhi Yao, Weijia Jia, Wei Zhao

Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets

Text-to-image diffusion models enable high-quality image generation but are computationally expensive. While prior work optimizes per-inference efficiency, we explore an orthogonal approach: reducing redundancy across correlated prompts.…

Computer Vision and Pattern Recognition · Computer Science 2025-08-29 Dale Decatur, Thibault Groueix, Wang Yifan, Rana Hanocka, Vladimir Kim, Matheus Gadelha

DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling

Text-to-image generation using diffusion models has gained increasing popularity due to their ability to produce high-quality, realistic images based on text prompts. However, efficiently serving these models is challenging due to their…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-13 Sohaib Ahmad, Qizheng Yang, Haoliang Wang, Ramesh K. Sitaraman, Hui Guan

MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models

Diffusion-based text-to-image generation models trade latency for quality: small models are fast but generate lower-quality images, while large models produce better images but are slow. We present MoDM, a novel caching-based serving system…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-05 Yuchen Xia, Divyam Sharma, Yichao Yuan, Souvik Kundu, Nishil Talati

SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers. However, these models are large, with complex network architectures and tens of…

Computer Vision and Pattern Recognition · Computer Science 2023-10-17 Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, Jian Ren

Cost-Aware Routing for Efficient Text-To-Image Generation

Diffusion models are well known for their ability to generate a high-fidelity image for an input prompt through an iterative denoising process. Unfortunately, the high fidelity also comes at a high computational cost due to the inherently…

Computer Vision and Pattern Recognition · Computer Science 2025-06-24 Qinchan Li, Kenneth Chen, Changyue Su, Wittawat Jitkrittum, Qi Sun, Patsorn Sangkloy

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

Diffusion models have shown unprecedented success in the task of text-to-image generation. While these models are capable of generating high-quality and realistic images, the complexity of sequential denoising has raised societal concerns…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Qinchan Li, Kenneth Chen, Changyue Su, Qi Sun

Nested Diffusion Processes for Anytime Image Generation

Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Noam Elata, Bahjat Kawar, Tomer Michaeli, Michael Elad

DiffusionX: Efficient Edge-Cloud Collaborative Image Generation with Multi-Round Prompt Evolution

Recent advances in diffusion models have driven remarkable progress in image generation. However, the generation process remains computationally intensive, and users often need to iteratively refine prompts to achieve the desired results,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yi Wei, Shunpu Tang, Liang Zhao, Qianqian Yang

Prompt-Aware Scheduling for Efficient Text-to-Image Inferencing System

Traditional ML models utilize controlled approximations during high loads, employing faster, but less accurate models in a process called accuracy scaling. However, this method is less effective for generative text-to-image models due to…

Machine Learning · Computer Science 2025-02-12 Shubham Agarwal, Saud Iqbal, Subrata Mitra

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

Diffusion models have emerged as frontrunners in text-to-image generation, but their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic deviations and object replication.…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks. However, diffusion models often struggle to produce images that accurately reflect the intended…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi

Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference

Due to the recent success of diffusion models, text-to-image generation is becoming increasingly popular and achieves a wide range of applications. Among them, text-to-image editing, or continuous text-to-image generation, attracts lots of…

Computer Vision and Pattern Recognition · Computer Science 2024-01-05 Zihao Yu, Haoyang Li, Fangcheng Fu, Xupeng Miao, Bin Cui

Development and Enhancement of Text-to-Image Diffusion Models

This research focuses on the development and enhancement of text-to-image denoising diffusion models, addressing key challenges such as limited sample diversity and training instability. By incorporating Classifier-Free Guidance (CFG) and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-10 Rajdeep Roshan Sahu

The CLIP Model is Secretly an Image-to-Prompt Converter

The Stable Diffusion model is a prominent text-to-image generation model that relies on a text prompt as its input, which is encoded using the Contrastive Language-Image Pre-Training (CLIP). However, text prompts have limitations when it…

Computer Vision and Pattern Recognition · Computer Science 2024-02-16 Yuxuan Ding, Chunna Tian, Haoxuan Ding, Lingqiao Liu

DeepCache: Accelerating Diffusion Models for Free

Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities. Notwithstanding their prowess, these models often incur substantial computational costs,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-11 Xinyin Ma, Gongfan Fang, Xinchao Wang

Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models

We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to…

Computer Vision and Pattern Recognition · Computer Science 2024-05-31 Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville

Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering

Text-to-image synthesis by diffusion models has recently shown remarkable performance in generating high-quality images. Although they perform well for simple texts, the models may get confused when faced with complex texts that contain…

Computer Vision and Pattern Recognition · Computer Science 2024-01-15 Chang Yu, Junran Peng, Xiangyu Zhu, Zhaoxiang Zhang, Qi Tian, Zhen Lei

ImageRAGTurbo: Towards One-step Text-to-Image Generation with Retrieval-Augmented Diffusion Models

Diffusion models have emerged as the leading approach for text-to-image generation. However, their iterative sampling process, which gradually morphs random noise into coherent images, introduces significant latency that limits their…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Peijie Qiu, Hariharan Ramshankar, Arnau Ramisa, René Vidal, Amit Kumar K C, Vamsi Salaka, Rahul Bhagat