English
Related papers

Related papers: Kitsune: Enabling Dataflow Execution on GPUs

200 papers

GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power available with multiple GPUs is challenging. The main challenges in achieving good performance are memory layout, affecting…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-20 Robert Clucas , Philip Blakely , Nikolaos Nikiforakis

GPUs have been widely used to accelerate computations exhibiting simple patterns of parallelism - such as flat or two-level parallelism - and a degree of parallelism that can be statically determined based on the size of the input dataset.…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-18 Hancheng Wu , Da Li , Michela Becchi

GPUs are uniquely suited to accelerate (SQL) analytics workloads thanks to their massive compute parallelism and High Bandwidth Memory (HBM) -- when datasets fit in the GPU HBM, performance is unparalleled. Unfortunately, GPU HBMs remain…

The progress made in accelerating simulations of fluid flow using GPUs, and the challenges that remain, are surveyed. The review first provides an introduction to GPU computing and programming, and discusses various considerations for…

Fluid Dynamics · Physics 2014-02-04 Kyle E Niemeyer , Chih-Jen Sung

Heterogeneous computing platforms consisting of general purpose processors (GPPs) and graphics processing units (GPUs) have become commonplace in personal mobile devices and embedded systems. For years, programming of these platforms was…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-11 Jani Boutellier , Ilkka Hautala

GPUs are critical for compute-intensive applications, yet emerging workloads such as recommender systems, graph analytics, and data analytics often exceed GPU memory capacity. Existing solutions allow GPUs to use CPU DRAM or SSDs as…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-27 Zhuoping Yang , Jinming Zhuang , Xingzhen Chen , Alex K. Jones , Peipei Zhou

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Tsung-Wei Huang , Dian-Lun Lin , Chun-Xun Lin , Yibo Lin

Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference and training. Ideally, DL frameworks should be able to fully utilize the computation power of GPUs such that the running time depends on the amount of…

Machine Learning · Computer Science 2020-12-07 Woosuk Kwon , Gyeong-In Yu , Eunji Jeong , Byung-Gon Chun

In this work, we survey the role of GPUs in real-time systems. Originally designed for parallel graphics workloads, GPUs are now widely used in time-critical applications such as machine learning, autonomous vehicles, and robotics due to…

The recent trend of using Graphics Processing Units (GPU's) for high performance computations is driven by the high ratio of price performance for these units, complemented by their cost effectiveness. At first glance, computational fluid…

Computational Engineering, Finance, and Science · Computer Science 2018-02-13 Kiril S. Shterev

GPUs offer massive compute parallelism and high-bandwidth memory accesses. GPU database systems seek to exploit those capabilities to accelerate data analytics. Although modern GPUs have more resources (e.g., higher DRAM bandwidth) than…

Databases · Computer Science 2023-02-03 Jiashen Cao , Rathijit Sen , Matteo Interlandi , Joy Arulraj , Hyesoon Kim

Deep learning models can take weeks to train on a single GPU-equipped machine, necessitating scaling out DL training to a GPU-cluster. However, current distributed DL implementations can scale poorly due to substantial parameter…

Machine Learning · Computer Science 2017-06-13 Hao Zhang , Zeyu Zheng , Shizhen Xu , Wei Dai , Qirong Ho , Xiaodan Liang , Zhiting Hu , Jinliang Wei , Pengtao Xie , Eric P. Xing

Processing large graphs with memory-limited GPU needs to resolve issues of host-GPU data transfer, which is a key performance bottleneck. Existing GPU-accelerated graph processing frameworks reduce the data transfers by managing the active…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-01 Qiange Wang , Xin Ai , Yanfeng Zhang , Jing Chen , Ge Yu

While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-01 Quentin Anthony , Jacob Hatef , Deepak Narayanan , Stella Biderman , Stas Bekman , Junqi Yin , Aamir Shafi , Hari Subramoni , Dhabaleswar Panda

Performance optimization is the art of continuous seeking a harmonious mapping between the application domain and hardware. Recent years have witnessed a surge of deep learning (DL) applications in industry. Conventional wisdom for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-27 Guoping Long , Jun Yang , Wei Lin

One major technical challenge for modern analytical database systems is how to leverage GPU to exploit their massive parallelism and high bandwidth. Yet, existing GPU-driven database engines suffer from inefficiencies caused by frequent…

Databases · Computer Science 2026-05-12 Tsuyoshi Ozawa , Kazuo Goda

Tensor parallelism (TP) enables large language models (LLMs) to scale inference efficiently across multiple GPUs, but its tight coupling makes systems fragile: a single GPU failure can halt execution, trigger costly KVCache recomputation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-19 Ziyi Xu , Zhiqiang Xie , Swapnil Gandhi , Christos Kozyrakis

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-01 Christian Feichtinger , Johannes Habich , Harald Koestler , Georg Hager , Ulrich Ruede , Gerhard Wellein

Leveraging Graphics Processing Units (GPUs) to accelerate scientific software has proven to be highly successful, but in order to extract more performance, GPU programmers must overcome the high latency costs associated with their use. One…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-03 Jacob Faibussowitsch , Mark F. Adams , Richard Tran Mills , Stefano Zampini , Junchao Zhang

Tremendous advances in parallel computing and graphics hardware opened up several novel real-time GPU applications in the fields of computer vision, computer graphics as well as augmented reality (AR) and virtual reality (VR). Although…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-19 Patrick Stotko
‹ Prev 1 2 3 10 Next ›