English
Related papers

Related papers: Model-Based Warp Overlapped Tiling for Image Proce…

200 papers

The strategy of using CUDA-compatible GPUs as a parallel computation solution to improve the performance of programs has been more and more widely approved during the last two years since the CUDA platform was released. Its benefit extends…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-01-12 Chang Xu , Steven R. Kirk , Samantha Jenkins

We propose a new hybrid topology optimization algorithm based on multigrid approach that combines the parallelization strategy of CPU using OpenMP and heavily multithreading capabilities of modern Graphics Processing Units (GPU). In…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-01 Arya Prakash Padhi , Souvik Chakraborty , Anupam Chakrabarti , Rajib Chowdhury

Consecutive matrix multiplications are commonly used in graph neural networks and sparse linear solvers. These operations frequently access the same matrices for both reading and writing. While reusing these matrices improves data locality,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-02 Mohammad Mahdi Salehi Dezfuli , Kazem Cheshmi

With the rapid innovation of GPUs, heterogeneous GPU clusters in both public clouds and on-premise data centers have become increasingly commonplace. In this paper, we demonstrate how pipeline parallelism, a technique wellstudied for…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-28 Z. Jonny Kong , Qiang Xu , Y. Charlie Hu

We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or…

Programming Languages · Computer Science 2023-08-29 Luke Anderson , Andrew Adams , Karima Ma , Tzu-Mao Li , Tian Jin , Jonathan Ragan-Kelley

This paper presents PipeFusion, an innovative parallel methodology to tackle the high latency issues associated with generating high-resolution images using diffusion transformers (DiTs) models. PipeFusion partitions images into patches and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Jiarui Fang , Jinzhe Pan , Aoyu Li , Xibo Sun , Jiannan Wang

Polyhedral compilers perform optimizations such as tiling and parallelization; when doing both, they usually generate code that executes "barrier-synchronized wavefronts" of tiles. We present a system to express and generate code for hybrid…

Programming Languages · Computer Science 2016-10-25 Tian Jin , Nirmal Prajapati , Waruna Ranasinghe , Guillaume Iooss , Yun Zou , Sanjay Rajopadhye , David Wonnacott

Stochastic simulations need multiple replications in order to build confidence intervals for their results. Even if we do not need a large amount of replications, it is a good practice to speed-up the whole simulation time using the…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-08 Jonathan Passerat-Palmbach , Jonathan Caux , Pridi Siregar , Claude Mazel , David Hill

The rapid growth in machine learning models, especially in natural language processing and computer vision, has led to challenges when running these models on hardware with limited resources. This paper introduces Superpipeline, a new…

Machine Learning · Computer Science 2024-10-14 Reza Abbasi , Sernam Lim

Asynchronous tasks, when created with over-decomposition, enable automatic computation-communication overlap which can substantially improve performance and scalability. This is not only applicable to traditional CPU-based systems, but also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-23 Jaemin Choi , David F. Richards , Laxmikant V. Kale

The main objective of this work consists in analyzing sub-structuring method for the parallel solution of sparse linear systems with matrices arising from the discretization of partial differential equations such as finite element, finite…

Numerical Analysis · Mathematics 2021-08-31 Abal-Kassim Cheik Ahamed , Frédéric Magoulès

This paper presents a Graphics Processing Units (GPUs) acceleration method of an iterative scheme for gas-kinetic model equations. Unlike the previous GPU parallelization of explicit kinetic schemes, this work features a fast converging…

Computational Physics · Physics 2020-01-08 Lianhua Zhu , Peng Wang , Songze Chen , Zhaoli Guo , Yonghao Zhang

Pipeline Parallelism (PP) serves as a crucial technique for training Large Language Models (LLMs), owing to its capability to alleviate memory pressure from model states with relatively low communication overhead. However, in long-context…

Machine Learning · Computer Science 2025-04-22 Zhouyang Li , Yuliang Liu , Wei Zhang , Tailing Yuan , Bin Chen , Chengru Song , Di Zhang

Sparse tiling is a technique to fuse loops that access common data, thus increasing data locality. Unlike traditional loop fusion or blocking, the loops may have different iteration spaces and access shared datasets through indirect memory…

Computational Engineering, Finance, and Science · Computer Science 2019-06-20 Fabio Luporini , Michael Lange , Christian T. Jacobs , Gerard J. Gorman , J. Ramanujam , Paul H. J. Kelly

This paper introduces a framework for distributed parallel image signal extrapolation. Since high-quality image signal processing often comes along with a high computational complexity, a parallel execution is desirable. The proposed…

Image and Video Processing · Electrical Eng. & Systems 2022-07-04 Jürgen Seiler , André Kaup

Contemporary GPUs are designed to handle long-latency operations effectively; however, challenges such as core occupancy (number of warps in a core) and pipeline width can impede their latency management. This is particularly evident in…

Hardware Architecture · Computer Science 2024-04-10 Diya Joseph , Juan Luis Aragón , Joan-Manuel Parcerisa , Antonio Gonzalez

The simplex algorithm has been successfully used for many years in solving linear programming (LP) problems. Due to the intensive computations required (especially for the solution of large LP problems), parallel approaches have also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-22 Basilis Mamalis , Marios Perlitis

Furthering our understanding of many of today's interesting problems in plasma physics---including plasma based acceleration and magnetic reconnection with pair production due to quantum electrodynamic effects---requires large-scale kinetic…

Computational Physics · Physics 2020-10-28 Kyle G. Miller , Roman P. Lee , Adam Tableman , Anton Helm , Ricardo A. Fonseca , Viktor K. Decyk , Warren B. Mori

As inference workloads for large language models (LLMs) scale to meet growing user demand, pipeline parallelism (PP) has become a widely adopted strategy for multi-GPU deployment, particularly in cross-node setups, to improve key-value (KV)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-30 Yongchao He , Bohan Zhao , Zheng Cao

This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle…

Performance · Computer Science 2025-01-07 Esmail Abdul Fattah , Hatem Ltaief , Havard Rue , David Keyes
‹ Prev 1 2 3 10 Next ›