Related papers: PipeFusion: Patch-level Pipeline Parallelism for D…

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

SwiftFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs

Diffusion Transformers (DiTs) have gained increasing adoption in high-quality image and video generation. As demand for higher-resolution images and longer videos increases, single-GPU inference becomes inefficient due to increased latency…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Jiacheng Yang, Jun Wu, Yaoyao Ding, Zhiying Xu, Yida Wang, Gennady Pekhimenko

PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

Video generation has been advancing rapidly, and diffusion transformer (DiT) based models have demonstrated remarkable capabilities. However, their practical deployment is often hindered by slow inference speeds and high memory con-…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Sijie Wang, Qiang Wang, Shaohuai Shi

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Diffusion models are pivotal for generating high-quality images and videos. Inspired by the success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to Transformer, known as Diffusion Transformers (DiTs). However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-05 Jiarui Fang, Jinzhe Pan, Xibo Sun, Aoyu Li, Jiannan Wang

Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism…

Computer Vision and Pattern Recognition · Computer Science 2026-02-26 Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho, Jae-Gil Lee

DisagFusion: Asynchronous Pipeline Parallelism and Elastic Scheduling for Disaggregated Diffusion Serving

Diffusion-based generation is increasingly powering production content pipelines; however, deploying these models at scale remains a significant challenge. Model weights frequently exceed the memory capacity of commodity GPUs, while the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Hantian Zha, Teng Ma, Yang Yong, Haiwen Fu, Ruiyang Ma, Wei Gao, Ruihao Gong, Xianglong Liu, Wei Wang, Yunpeng Chai

Accelerating Parallel Diffusion Model Serving with Residual Compression

Diffusion models produce realistic images and videos but require substantial computational resources, necessitating multi-accelerator parallelism for real-time deployment. However, parallel inference introduces significant communication…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Jiajun Luo, Yicheng Xiao, Jianru Xu, Yangxiu You, Rongwei Lu, Chen Tang, Jingyan Jiang, Zhi Wang

Minute-Long Videos with Dual Parallelisms

Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos. To address this, we propose a novel distributed inference strategy,…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Zeqing Wang, Bowen Zheng, Xingyi Yang, Zhenxiong Tan, Yuecong Xu, Xinchao Wang

Partially Conditioned Patch Parallelism for Accelerated Diffusion Model Inference

Diffusion models have exhibited exciting capabilities in generating images and are also very promising for video creation. However, the inference speed of diffusion models is limited by the slow sampling process, restricting its use cases.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 XiuYu Zhang, Zening Luo, Michelle E. Lu

Pipeline Parallelism for Inference on Heterogeneous Edge Computing

Deep neural networks with large model sizes achieve state-of-the-art results for tasks in computer vision (CV) and natural language processing (NLP). However, these large-scale models are too compute- or memory-intensive for…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-29 Yang Hu, Connor Imes, Xuanang Zhao, Souvik Kundu, Peter A. Beerel, Stephen P. Crago, John Paul N. Walters

DiP: Taming Diffusion Models in Pixel Space

Diffusion models face a fundamental trade-off between generation quality and computational efficiency. Latent Diffusion Models (LDMs) offer an efficient solution but suffer from potential information loss and non-end-to-end training. In…

Computer Vision and Pattern Recognition · Computer Science 2026-03-27 Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Xiaobin Hu, Hanzhen Zhao, Chengjie Wang, Jian Yang, Ying Tai

MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

Transformer architectures, particularly Diffusion Transformers (DiTs), have become widely used in diffusion and flow-matching models due to their strong performance compared to convolutional UNets. However, the isotropic design of DiTs…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Quan Dao, Dimitris Metaxas

BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training

With the increasing scale of models, the need for efficient distributed training has become increasingly urgent. Recently, many synchronous pipeline parallelism approaches have been proposed to improve training throughput. However, these…

Machine Learning · Computer Science 2024-10-28 Houming Wu, Ling Chen, Wenjie Yu

PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing

Long-form video editing poses unique challenges due to the exponential increase in the computational cost from joint editing and Denoising Diffusion Implicit Models (DDIM) inversion across extended sequences. To address these limitations,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-01 Mustafa Munir, Md Mostafijur Rahman, Kartikeya Bhardwaj, Paul Whatmough, Radu Marculescu

Pyramidal Patchification Flow for Visual Generation

Diffusion transformers (DiTs) adopt Patchify, mapping patch representations to token representations through linear projections, to adjust the number of tokens input to DiT blocks and thus the computation cost. Instead of a single patch…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Hui Li, Baoyou Chen, Liwei Zhang, Jiaye Li, Jingdong Wang, Siyu Zhu

STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs

The escalating adoption of diffusion models for applications such as image generation demands efficient parallel inference techniques to manage their substantial computational cost. However, existing diffusion parallelism inference schemes…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-16 Han Liang, Jiahui Zhou, Zicheng Zhou, Xiaoxi Zhang, Xu Chen

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction.…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Masayoshi Tomizuka, Kurt Keutzer

Communication-Efficient Serving for Video Diffusion Models with Latent Parallelism

Video diffusion models (VDMs) perform attention computation over the 3D spatio-temporal domain. Compared to large language models (LLMs) processing 1D sequences, their memory consumption scales cubically, necessitating parallel serving…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-09 Zhiyuan Wu, Shuai Wang, Li Chen, Kaihui Gao, Dan Li, Yanyu Ren, Qiming Zhang, Yong Wang

PixelDiT: Pixel Diffusion Transformers for Image Generation

Latent-space modeling has been the standard for Diffusion Transformers (DiTs). However, it relies on a two-stage pipeline where the pretrained autoencoder introduces lossy reconstruction, leading to error accumulation while hindering joint…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Yongsheng Yu, Wei Xiong, Weili Nie, Yichen Sheng, Shiqiu Liu, Jiebo Luo

LinFusion: 1 GPU, 1 Minute, 16K Image

Modern diffusion models, particularly those utilizing a Transformer-based UNet for denoising, rely heavily on self-attention operations to manage complex spatial relationships, thus achieving impressive generation performance. However, this…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Songhua Liu, Weihao Yu, Zhenxiong Tan, Xinchao Wang