Related papers: DistriFusion: Distributed Parallel Inference for H…

PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference

This paper presents PipeFusion, an innovative parallel methodology to tackle the high latency issues associated with generating high-resolution images using diffusion transformer (DiT) models. PipeFusion partitions images into patches and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Jiarui Fang, Jinzhe Pan, Aoyu Li, Xibo Sun, Jiannan Wang

Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism…

Computer Vision and Pattern Recognition · Computer Science 2026-02-26 Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho, Jae-Gil Lee

Partially Conditioned Patch Parallelism for Accelerated Diffusion Model Inference

Diffusion models have exhibited exciting capabilities in generating images and are also very promising for video creation. However, the inference speed of diffusion models is limited by the slow sampling process, restricting their use cases.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 XiuYu Zhang, Zening Luo, Michelle E. Lu

STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs

The escalating adoption of diffusion models for applications such as image generation demands efficient parallel inference techniques to manage their substantial computational cost. However, existing diffusion parallelism inference schemes…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-16 Han Liang, Jiahui Zhou, Zicheng Zhou, Xiaoxi Zhang, Xu Chen

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency,…

Computer Vision and Pattern Recognition · Computer Science 2024-09-27 Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang

Minute-Long Videos with Dual Parallelisms

Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos. To address this, we propose a novel distributed inference strategy,…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Zeqing Wang, Bowen Zheng, Xingyi Yang, Zhenxiong Tan, Yuecong Xu, Xinchao Wang

DisagFusion: Asynchronous Pipeline Parallelism and Elastic Scheduling for Disaggregated Diffusion Serving

Diffusion-based generation is increasingly powering production content pipelines; however, deploying these models at scale remains a significant challenge. Model weights frequently exceed the memory capacity of commodity GPUs, while the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Hantian Zha, Teng Ma, Yang Yong, Haiwen Fu, Ruiyang Ma, Wei Gao, Ruihao Gong, Xianglong Liu, Wei Wang, Yunpeng Chai

DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease

Diffusion models have achieved remarkable success in generating high-fidelity content but suffer from slow, iterative sampling, resulting in high latency that limits their use in interactive applications. We introduce DRiffusion, a parallel…

Machine Learning · Computer Science 2026-03-30 Runsheng Bai, Chengyu Zhang, Yangdong Deng

Decentralized Diffusion Models

Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa

SwiftFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs

Diffusion Transformers (DiTs) have gained increasing adoption in high-quality image and video generation. As demand for higher-resolution images and longer videos increases, single-GPU inference becomes inefficient due to increased latency…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Jiacheng Yang, Jun Wu, Yaoyao Ding, Zhiying Xu, Yida Wang, Gennady Pekhimenko

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components,…

Computer Vision and Pattern Recognition · Computer Science 2024-06-13 Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov

Video-Infinity: Distributed Long Video Generation

Diffusion models have recently achieved remarkable results for video generation. Despite the encouraging performances, the generated videos are typically constrained to a small number of frames, resulting in clips lasting merely a few…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang

LinFusion: 1 GPU, 1 Minute, 16K Image

Modern diffusion models, particularly those utilizing a Transformer-based UNet for denoising, rely heavily on self-attention operations to manage complex spatial relationships, thus achieving impressive generation performance. However, this…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Songhua Liu, Weihao Yu, Zhenxiong Tan, Xinchao Wang

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data…

Computer Vision and Pattern Recognition · Computer Science 2023-10-16 Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, Tieniu Tan

Accelerating Parallel Diffusion Model Serving with Residual Compression

Diffusion models produce realistic images and videos but require substantial computational resources, necessitating multi-accelerator parallelism for real-time deployment. However, parallel inference introduces significant communication…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Jiajun Luo, Yicheng Xiao, Jianru Xu, Yangxiu You, Rongwei Lu, Chen Tang, Jingyan Jiang, Zhi Wang

Towards Consistent and Efficient Dataset Distillation via Diffusion-Driven Selection

Dataset distillation provides an effective approach to reduce memory and computational costs by optimizing a compact dataset that achieves performance comparable to the full original. However, for large-scale datasets and complex deep…

Computer Vision and Pattern Recognition · Computer Science 2025-11-14 Xinhao Zhong, Shuoyang Sun, Xulin Gu, Zhaoyang Xu, Yaowei Wang, Min Zhang, Bin Chen

GPU-Acceleration of Parallel Unconditionally Stable Group Explicit Finite Difference Method

Graphics Processing Units (GPUs) are high-performance co-processors originally intended to improve the use and quality of computer graphics applications. Once researchers and practitioners noticed the potential of using GPU for general…

Numerical Analysis · Computer Science 2016-07-12 K. Parand, Saeed Zafarvahedian, Sayyed A. Hossayni

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Diffusion models are pivotal for generating high-quality images and videos. Inspired by the success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to Transformer, known as Diffusion Transformers (DiTs). However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-05 Jiarui Fang, Jinzhe Pan, Xibo Sun, Aoyu Li, Jiannan Wang

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

Diffusion models have emerged as frontrunners in text-to-image generation, but their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic deviations and object replication.…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang