Related papers: StreamDiffusion: A Pipeline-level Solution for Rea…

StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation

Generative models are reshaping the live-streaming industry by redefining how content is created, styled, and delivered. Previous image-based streaming diffusion models have powered efficient and creative live streaming products but have…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Tianrui Feng, Zhi Li, Shuo Yang, Haocheng Xi, Muyang Li, Xiuyu Li, Lvmin Zhang, Keting Yang, Kelly Peng, Song Han, Maneesh Agrawala, Kurt Keutzer, Akio Kodaira, Chenfeng Xu

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference

The rapid progress in artificial intelligence-generated content (AIGC), especially with diffusion models, has significantly advanced development of high-quality video generation. However, current video diffusion models exhibit demanding…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Zheng Zhan, Yushu Wu, Yifan Gong, Zichong Meng, Zhenglun Kong, Changdi Yang, Geng Yuan, Pu Zhao, Wei Niu, Yanzhi Wang

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Diffusion-based video super-resolution (VSR) methods deliver strong perceptual quality but are often unsuitable for latency-sensitive scenarios due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Hau-Shiang Shiu, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Po-Fan Yu, Yu-Chih Chen, Yu-Lun Liu

Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism…

Computer Vision and Pattern Recognition · Computer Science 2026-02-26 Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho, Jae-Gil Lee

RenderFlow: Single-Step Neural Rendering via Flow Matching

Conventional physically based rendering (PBR) pipelines generate photorealistic images through computationally intensive light transport simulations. Although recent deep learning approaches leverage diffusion model priors with geometry…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Shenghao Zhang, Runtao Liu, Christopher Schroers, Yang Zhang

DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease

Diffusion models have achieved remarkable success in generating high-fidelity content but suffer from slow, iterative sampling, resulting in high latency that limits their use in interactive applications. We introduce DRiffusion, a parallel…

Machine Learning · Computer Science 2026-03-30 Runsheng Bai, Chengyu Zhang, Yangdong Deng

StreamFlow: Theory, Algorithm, and Implementation for High-Efficiency Rectified Flow Generation

New technologies such as Rectified Flow and Flow Matching have significantly improved the performance of generative models in the past two years, especially in terms of control accuracy, generation quality, and generation efficiency.…

Computer Vision and Pattern Recognition · Computer Science 2026-01-09 Sen Fang, Hongbin Zhong, Yalin Feng, Yanxin Zhang, Dimitris N. Metaxas

DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation

Top-leading solutions for Video Scene Graph Generation (VSGG) typically adopt an offline pipeline. Though demonstrating promising performance, they remain unable to handle real-time video streams and consume large GPU memory. Moreover,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Mu Chen, Liulei Li, Wenguan Wang, Yi Yang

PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

Video generation has been advancing rapidly, and diffusion transformer (DiT) based models have demonstrated remarkable capabilities. However, their practical deployment is often hindered by slow inference speeds and high memory con-…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Sijie Wang, Qiang Wang, Shaohuai Shi

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the-art performances in terms of FID and CLIP-Score…

Computer Vision and Pattern Recognition · Computer Science 2024-12-19 Clément Chadebec, Onur Tasar, Eyal Benaroche, Benjamin Aubin

StreamDiT: Real-Time Streaming Text-to-Video Generation

Recently, great progress has been achieved in text-to-video (T2V) generation by scaling transformer-based diffusion models to billions of parameters, which can generate high-quality videos. However, existing models typically produce only…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Akio Kodaira, Tingbo Hou, Ji Hou, Markos Georgopoulos, Felix Juefei-Xu, Masayoshi Tomizuka, Yue Zhao

Real-Time Streamable Generative Speech Restoration with Flow Matching

Diffusion-based generative models have greatly impacted the speech processing field in recent years, exhibiting high speech naturalness and spawning a new research direction. Their application in real-time communication is, however, still…

Signal Processing · Electrical Eng. & Systems 2026-04-22 Simon Welker, Bunlong Lay, Maris Hillemann, Tal Peer, Timo Gerkmann

PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference

This paper presents PipeFusion, an innovative parallel methodology to tackle the high latency issues associated with generating high-resolution images using diffusion transformers (DiTs) models. PipeFusion partitions images into patches and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Jiarui Fang, Jinzhe Pan, Aoyu Li, Xibo Sun, Jiannan Wang

From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation

Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope compared to traditional fixed-category segmentation tasks. Recent studies have…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Zishen Qu, Xuesong Li, Haijian Gu, Hongwei Kang, Quan Meng, Tianrui Niu, Xin Yang, Ruidong Pan

SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time

Generating high-resolution images with generative models has recently been made widely accessible by leveraging diffusion models pre-trained on large-scale datasets. Various techniques, such as MultiDiffusion and SyncDiffusion, have further…

Computer Vision and Pattern Recognition · Computer Science 2025-01-08 Stanislav Frolov, Brian B. Moser, Andreas Dengel

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

High-resolution video generation, while crucial for digital media and film, is computationally bottlenecked by the quadratic complexity of diffusion models, making practical inference infeasible. To address this, we introduce HiStream, an…

Computer Vision and Pattern Recognition · Computer Science 2025-12-29 Haonan Qiu, Shikun Liu, Zijian Zhou, Zhaochong An, Weiming Ren, Zhiheng Liu, Jonas Schult, Sen He, Shoufa Chen, Yuren Cong, Tao Xiang, Ziwei Liu, Juan-Manuel Perez-Rua

MotionStream: Real-Time Video Generation with Interactive Motion Controls

Current motion-conditioned video generation methods suffer from prohibitive latency (minutes per video) and non-causal processing that prevents real-time interaction. We present MotionStream, enabling sub-second latency with up to 29 FPS…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, Xun Huang

Improving Progressive Generation with Decomposable Flow Matching

Generating high-dimensional visual modalities is a computationally intensive task. A common solution is progressive generation, where the outputs are synthesized in a coarse-to-fine spectral autoregressive manner. While diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Moayed Haji-Ali, Willi Menapace, Ivan Skorokhodov, Arpit Sahni, Sergey Tulyakov, Vicente Ordonez, Aliaksandr Siarohin

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

Interactive Drawing Guidance for Anime Illustrations with Diffusion Model

Creating high-quality anime illustrations presents notable challenges, particularly for beginners, due to the intricate styles and fine details inherent in anime art. We present an interactive drawing guidance system specifically designed…

Graphics · Computer Science 2025-07-15 Chuang Chen, Xiaoxuan Xie, Yongming Zhang, Tianyu Zhang, Haoran Xie