English
Related papers

Related papers: FastUSP: A Multi-Level Collaborative Acceleration …

200 papers

Diffusion Transformers (DiTs) have gained increasing adoption in high-quality image and video generation. As demand for higher-resolution images and longer videos increases, single-GPU inference becomes inefficient due to increased latency…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-26 Jiacheng Yang , Jun Wu , Yaoyao Ding , Zhiying Xu , Yida Wang , Gennady Pekhimenko

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism…

Computer Vision and Pattern Recognition · Computer Science 2026-02-26 Euisoo Jung , Byunghyun Kim , Hyunjin Kim , Seonghye Cho , Jae-Gil Lee

Diffusion Transformers (DiTs) are increasingly adopted in scientific computing, yet growing model sizes and resolutions make distributed multi-GPU inference essential. Ulysses sequence parallelism scales DiT inference but introduces…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-22 Bin Ma , Xingjian Ding , Tekin Bicer , Pengfei Su , Dong Li

Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. The dominant approaches in this family of methods, such as Ring Attention or DeepSpeed…

Machine Learning · Computer Science 2026-02-25 Ravi Ghadia , Maksim Abraham , Sergei Vorobyov , Max Ryabinin

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Muyang Li , Tianle Cai , Jiaxin Cao , Qinsheng Zhang , Han Cai , Junjie Bai , Yangqing Jia , Ming-Yu Liu , Kai Li , Song Han

Modern diffusion models, particularly those utilizing a Transformer-based UNet for denoising, rely heavily on self-attention operations to manage complex spatial relationships, thus achieving impressive generation performance. However, this…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Songhua Liu , Weihao Yu , Zhenxiong Tan , Xinchao Wang

Diffusion models have exhibited exciting capabilities in generating images and are also very promising for video creation. However, the inference speed of diffusion models is limited by the slow sampling process, restricting its use cases.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 XiuYu Zhang , Zening Luo , Michelle E. Lu

The scalability of high-fidelity video diffusion models (VDMs) is constrained by two key sources of redundancy: the quadratic complexity of global spatio-temporal attention and the computational overhead of long iterative denoising…

Computer Vision and Pattern Recognition · Computer Science 2025-12-08 Xinjian Wu , Hongmei Wang , Yuan Zhou , Qinglin Lu

Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique…

Real-time Deep Neural Network (DNN) inference with low-latency requirement has become increasingly important for numerous applications in both cloud computing (e.g., Apple's Siri) and edge computing (e.g., Google/Waymo's driverless car).…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-11 Weiwen Jiang , Edwin H. -M. Sha , Xinyi Zhang , Lei Yang , Qingfeng Zhuge , Yiyu Shi , Jingtong Hu

This paper presents PipeFusion, an innovative parallel methodology to tackle the high latency issues associated with generating high-resolution images using diffusion transformers (DiTs) models. PipeFusion partitions images into patches and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Jiarui Fang , Jinzhe Pan , Aoyu Li , Xibo Sun , Jiannan Wang

Sequence parallelism (SP), which divides the sequence dimension of input tensors across multiple computational devices, is becoming key to unlocking the long-context capabilities of generative AI models. This paper investigates the…

Machine Learning · Computer Science 2024-07-03 Jiarui Fang , Shangchun Zhao

Sequence parallelism (SP) serves as a prevalent strategy to handle long sequences that exceed the memory limit of a single device. However, for linear sequence modeling methods like linear attention, existing SP approaches do not take…

Machine Learning · Computer Science 2025-05-19 Weigao Sun , Zhen Qin , Dong Li , Xuyang Shen , Yu Qiao , Yiran Zhong

Scaling Diffusion Transformer (DiT) inference via sequence parallelism is critical for reducing latency in visual generation, but is severely hampered by workload imbalance when applied to models employing block-wise sparse attention. The…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Siqi Chen , Ke Hong , Tianchen Zhao , Ruiqi Xie , Zhenhua Zhu , Xudong Zhang , Yu Wang

Sparse attention is a core building block in many leading neural network models, from graph-structured learning to sparse sequence modeling. It can be decomposed into a sequence of three sparse matrix operations (3S): sampled dense-dense…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-14 Zitong Li , Aparna Chandramowlishwaran

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…

Artificial Intelligence · Computer Science 2018-01-12 Ferdinando Fioretto , Enrico Pontelli , William Yeoh , Rina Dechter

Efficient parallelization of Large Language Models (LLMs) with long sequences is essential but challenging due to their significant computational and memory demands, particularly stemming from communication bottlenecks in attention…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-31 Zongwu Wang , Fangxin Liu , Mingshuai Li , Li Jiang

Diffusion models have emerged as a powerful class of generative models across various modalities, including image, video, and audio synthesis. However, their deployment is often limited by significant inference latency, primarily due to the…

Machine Learning · Computer Science 2025-10-14 Kunyun Wang , Bohan Li , Kai Yu , Minyi Guo , Jieru Zhao

While long-context inference is crucial for advancing large language model (LLM) applications, its prefill speed remains a significant bottleneck. Current approaches, including sequence parallelism strategies and compute reduction through…

Machine Learning · Computer Science 2025-05-27 Yuxiang Huang , Mingye Li , Xu Han , Chaojun Xiao , Weilin Zhao , Sun Ao , Hao Zhou , Jie Zhou , Zhiyuan Liu , Maosong Sun

Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, system works for accelerating LLM training have focused on the…

Machine Learning · Computer Science 2023-10-05 Sam Ade Jacobs , Masahiro Tanaka , Chengming Zhang , Minjia Zhang , Shuaiwen Leon Song , Samyam Rajbhandari , Yuxiong He
‹ Prev 1 2 3 10 Next ›