Related papers: SynerDiff: Synergetic Continuous Batching for Fast…

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency,…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-27 · Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang

InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models

Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe there exists feature invariance in deterministic sampling, and present InvarDiff, a training-free acceleration method that…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-08 · Zihao Wu

E-BATCH: Energy-Efficient and High-Throughput RNN Batching

Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-09-23 · Franyell Silfa, Jose Maria Arnau, Antonio Gonzalez
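The padding cost in the E-BATCH summary can be made concrete with a small calculation. The Python sketch below assumes the simplest batching policy: every sequence in a batch is padded to the length of the longest one. This is an illustration, not E-BATCH's own scheme. The function name `padding_overhead` is hypothetical. It returns the fraction of executed RNN time-steps that carry only padding.

```python
def padding_overhead(lengths):
    """Fraction of executed time-steps in a padded batch that are padding.

    Assumption (illustrative only): every sequence is padded to the longest
    sequence in the batch, so the RNN runs max(lengths) steps per sequence.
    """
    if not lengths:
        raise ValueError("batch must contain at least one sequence")
    max_len = max(lengths)
    executed = max_len * len(lengths)  # time-steps the hardware actually runs
    useful = sum(lengths)              # time-steps that carry real input
    return (executed - useful) / executed


# Hypothetical batch: four requests of very different lengths.
batch = [5, 12, 40, 8]
print(f"padding fraction: {padding_overhead(batch):.2%}")
```

Consider a batch with lengths 5, 12, 40 and 8. The RNN executes 160 time-steps, and 95 of them are padding. Grouping requests of similar length into the same batch is one general way to reduce this waste. The truncated summary does not say whether E-BATCH uses that approach.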

Joint Batching and Scheduling for High-Throughput Multiuser Edge AI with Asynchronous Task Arrivals

In this paper, we study joint batching and (task) scheduling to maximise the throughput (i.e., the number of completed tasks) under the practical assumptions of heterogeneous task arrivals and deadlines. The design aims to optimise the…

Signal Processing · Electrical Eng. & Systems · 2023-07-28 · Yihan Cang, Ming Chen, Kaibin Huang

Unsupervised Medical Image Translation with Adversarial Diffusion Models

Imputation of missing images via source-to-target modality translation can improve diversity in medical imaging protocols. A pervasive approach for synthesizing target images involves one-shot mapping through generative adversarial networks…

Image and Video Processing · Electrical Eng. & Systems · 2023-04-03 · Muzaffer Özbey, Onat Dalmaz, Salman UH Dar, Hasan A Bedel, Şaban Özturk, Alper Güngör, Tolga Çukur

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

As deep neural networks (DNNs) are being applied to a wide range of edge intelligent applications, it is critical for edge inference platforms to have both high throughput and low latency at the same time. Such edge platforms with multiple…

Machine Learning · Computer Science · 2023-05-03 · Ziyang Zhang, Huan Li, Yang Zhao, Changyao Lin, Jie Liu

TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation

Synthetic Electronic Health Record (EHR) time-series generation is crucial for advancing clinical machine learning models, as it helps address data scarcity by providing more training data. However, most existing approaches focus primarily…

Machine Learning · Computer Science · 2025-04-25 · Bowen Deng, Chang Xu, Hao Li, Yuhao Huang, Min Hou, Jiang Bian

SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services

For servers incorporating parallel computing resources, batching is a pivotal technique for providing efficient and economical services at scale. Parallel computing resources exhibit heightened computational and energy efficiency when…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-01-07 · Yaodan Xu, Sheng Zhou, Zhisheng Niu

Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation

Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured modeling with diffusion's photorealistic synthesis, yet suffer from high latency due to sequential AR generation and iterative denoising. In this work, we tackle this…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-10 · Zhen Zou, Xiaoxiao Ma, Jie Huang, Zichao Yu, Feng Zhao

SynergAI: Edge-to-Cloud Synergy for Architecture-Driven High-Performance Orchestration for AI Inference

The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly heightened computational demands, particularly for inference-serving workloads. While traditional cloud-based deployments offer scalability,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-09-17 · Foteini Stathopoulou, Aggelos Ferikoglou, Manolis Katsaragakis, Dimosthenis Masouros, Sotirios Xydis, Dimitrios Soudris

Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC

Convolutional Neural Networks (CNN) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-03-07 · Guanwen Zhong, Akshat Dubey, Tan Cheng, Tulika Mitra

Expert Streaming: Accelerating Low-Batch MoE Inference via Multi-chiplet Architecture and Dynamic Expert Trajectory Scheduling

Mixture-of-Experts is a promising approach for edge AI with low-batch inference. Yet, on-device deployments often face limited on-chip memory and severe workload imbalance; the prevalent use of offloading further incurs off-chip memory…

Hardware Architecture · Computer Science · 2026-03-31 · Songchen Ma, Hongyi Li, Weihao Zhang, Yonghao Tan, Pingcheng Dong, Yu Liu, Lan Liu, Yuzhong Jiao, Xuejiao Liu, Luhong Liang, Kwang-Ting Cheng

DisagFusion: Asynchronous Pipeline Parallelism and Elastic Scheduling for Disaggregated Diffusion Serving

Diffusion-based generation is increasingly powering production content pipelines; however, deploying these models at scale remains a significant challenge. Model weights frequently exceed the memory capacity of commodity GPUs, while the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-26 · Hantian Zha, Teng Ma, Yang Yong, Haiwen Fu, Ruiyang Ma, Wei Gao, Ruihao Gong, Xianglong Liu, Wei Wang, Yunpeng Chai

SD-Acc: Accelerating Stable Diffusion through Phase-aware Sampling and Hardware Co-Optimizations

The emergence of diffusion models has significantly advanced generative AI, improving the quality, realism, and creativity of image and video generation. Among them, Stable Diffusion (StableDiff) stands out as a key model for text-to-image…

Hardware Architecture · Computer Science · 2025-07-03 · Zhican Wang, Guanghui He, Hongxiang Fan

Improving the performance of bagging ensembles for data streams through mini-batching

Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data…

Machine Learning · Computer Science · 2021-12-21 · Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger

DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off

This paper introduces DrDiff, a novel framework for long-text generation that overcomes the efficiency-quality trade-off through three core technologies. First, we design a dynamic expert scheduling mechanism that intelligently allocates…

Computation and Language · Computer Science · 2025-10-14 · Jusheng Zhang, Yijia Fan, Kaitong Cai, Zimeng Huang, Xiaofei Sun, Jian Wang, Chengpei Tang, Keze Wang

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-19 · Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction.…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-09 · Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Masayoshi Tomizuka, Kurt Keutzer

AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation

Diffusion models achieve great success in generating diverse and high-fidelity images, yet their widespread application, especially in real-time scenarios, is hampered by their inherently slow generation speed. The slow generation stems…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-19 · Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu

BADiff: Bandwidth Adaptive Diffusion Model

In this work, we propose a novel framework to enable diffusion models to adapt their generation quality based on real-time network bandwidth constraints. Traditional diffusion models produce high-fidelity images by performing a fixed number…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-10 · Xi Zhang, Hanwei Zhu, Yan Zhong, Jiamang Wang, Weisi Lin