Related papers: SynerDiff: Synergetic Continuous Batching for Fast…

200 papers

Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency,…

Computer Vision and Pattern Recognition · Computer Science 2024-09-27 Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang

Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe there exists feature invariance in deterministic sampling, and present InvarDiff, a training-free acceleration method that…

Computer Vision and Pattern Recognition · Computer Science 2025-12-08 Zihao Wu

Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-23 Franyell Silfa, Jose Maria Arnau, Antonio Gonzalez

In this paper, we study joint batching and (task) scheduling to maximise the throughput (i.e., the number of completed tasks) under the practical assumptions of heterogeneous task arrivals and deadlines. The design aims to optimise the…

Signal Processing · Electrical Eng. & Systems 2023-07-28 Yihan Cang, Ming Chen, Kaibin Huang

Imputation of missing images via source-to-target modality translation can improve diversity in medical imaging protocols. A pervasive approach for synthesizing target images involves one-shot mapping through generative adversarial networks…

Image and Video Processing · Electrical Eng. & Systems 2023-04-03 Muzaffer Özbey, Onat Dalmaz, Salman UH Dar, Hasan A Bedel, Şaban Özturk, Alper Güngör, Tolga Çukur

As deep neural networks (DNNs) are being applied to a wide range of edge intelligent applications, it is critical for edge inference platforms to have both high-throughput and low-latency at the same time. Such edge platforms with multiple…

Machine Learning · Computer Science 2023-05-03 Ziyang Zhang, Huan Li, Yang Zhao, Changyao Lin, Jie Liu

Synthetic Electronic Health Record (EHR) time-series generation is crucial for advancing clinical machine learning models, as it helps address data scarcity by providing more training data. However, most existing approaches focus primarily…

Machine Learning · Computer Science 2025-04-25 Bowen Deng, Chang Xu, Hao Li, Yuhao Huang, Min Hou, Jiang Bian

For servers incorporating parallel computing resources, batching is a pivotal technique for providing efficient and economical services at scale. Parallel computing resources exhibit heightened computational and energy efficiency when…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-07 Yaodan Xu, Sheng Zhou, Zhisheng Niu

Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured modeling with diffusion's photorealistic synthesis, yet suffer from high latency due to sequential AR generation and iterative denoising. In this work, we tackle this…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Zhen Zou, Xiaoxiao Ma, Jie Huang, Zichao Yu, Feng Zhao

The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly heightened computational demands, particularly for inference-serving workloads. While traditional cloud-based deployments offer scalability,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-17 Foteini Stathopoulou, Aggelos Ferikoglou, Manolis Katsaragakis, Dimosthenis Masouros, Sotirios Xydis, Dimitrios Soudris

Convolutional Neural Networks (CNN) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-07 Guanwen Zhong, Akshat Dubey, Tan Cheng, Tulika Mitra

Mixture-of-Experts is a promising approach for edge AI with low-batch inference. Yet, on-device deployments often face limited on-chip memory and severe workload imbalance; the prevalent use of offloading further incurs off-chip memory…

Hardware Architecture · Computer Science 2026-03-31 Songchen Ma, Hongyi Li, Weihao Zhang, Yonghao Tan, Pingcheng Dong, Yu Liu, Lan Liu, Yuzhong Jiao, Xuejiao Liu, Luhong Liang, Kwang-Ting Cheng

Diffusion-based generation is increasingly powering production content pipelines; however, deploying these models at scale remains a significant challenge. Model weights frequently exceed the memory capacity of commodity GPUs, while the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Hantian Zha, Teng Ma, Yang Yong, Haiwen Fu, Ruiyang Ma, Wei Gao, Ruihao Gong, Xianglong Liu, Wei Wang, Yunpeng Chai

The emergence of diffusion models has significantly advanced generative AI, improving the quality, realism, and creativity of image and video generation. Among them, Stable Diffusion (StableDiff) stands out as a key model for text-to-image…

Hardware Architecture · Computer Science 2025-07-03 Zhican Wang, Guanghui He, Hongxiang Fan

Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data…

Machine Learning · Computer Science 2021-12-21 Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger

This paper introduces DrDiff, a novel framework for long-text generation that overcomes the efficiency-quality trade-off through three core technologies. First, we design a dynamic expert scheduling mechanism that intelligently allocates…

Computation and Language · Computer Science 2025-10-14 Jusheng Zhang, Yijia Fan, Kaitong Cai, Zimeng Huang, Xiaofei Sun, Jian Wang, Chengpei Tang, Keze Wang

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction.…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Masayoshi Tomizuka, Kurt Keutzer

Diffusion models achieve great success in generating diverse and high-fidelity images, yet their widespread application, especially in real-time scenarios, is hampered by their inherently slow generation speed. The slow generation stems…

Computer Vision and Pattern Recognition · Computer Science 2024-08-19 Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu

In this work, we propose a novel framework to enable diffusion models to adapt their generation quality based on real-time network bandwidth constraints. Traditional diffusion models produce high-fidelity images by performing a fixed number…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Xi Zhang, Hanwei Zhu, Yan Zhong, Jiamang Wang, Weisi Lin