English
Related papers

Related papers: Improving GPU Performance Through Resource Sharing

200 papers

GPGPU applications exploit on-chip scratchpad memory available in the Graphics Processing Units (GPUs) to improve performance. The amount of thread level parallelism present in the GPU is limited by the number of resident threads, which in…

Hardware Architecture · Computer Science 2017-02-14 Vishwesh Jatala , Jayvant Anantpur , Amey Karkare

GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources, such as registers and the programmer-managed shared memory. Higher resource demand means lower effective parallel thread count,…

Performance · Computer Science 2019-07-08 Putt Sakdhnagool , Amit Sabne , Rudolf Eigenmann

The High Performance Computing (HPC) field is witnessing a widespread adoption of Graphics Processing Units (GPUs) as co-processors for conventional homogeneous clusters. The adoption of prevalent Single- Program Multiple-Data (SPMD)…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-25 Teng Li , Vikram K. Narayana , Tarek El-Ghazawi

The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with…

Hardware Architecture · Computer Science 2020-08-11 Saiful A. Mojumder , Yifan Sun , Leila Delshadtehrani , Yenai Ma , Trinayan Baruah , José L. Abellán , John Kim , David Kaeli , Ajay Joshi

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and streaming multiprocessors (SMs). GPUs are an expensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-20 Chao Chen , Chris Porter , Santosh Pande

Massive multi-threading in GPU imposes tremendous pressure on memory subsystems. Due to rapid growth in thread-level parallelism of GPU and slowly improved peak memory bandwidth, the memory becomes a bottleneck of GPU's performance and…

Hardware Architecture · Computer Science 2019-06-17 Bing Li , Mengjie Mao , Xiaoxiao Liu , Tao Liu , Zihao Liu , Wujie Wen , Yiran Chen , Hai , Li

High level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations…

GPUs in High-Performance Computing systems remain under-utilised due to the unavailability of schedulers that can safely schedule multiple applications to share the same GPU. The research reported in this paper is motivated to improve the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-12-14 Carlos Reano , Federico Silla , Dimitrios S. Nikolopoulos , Blesson Varghese

GPU-based heterogeneous architectures are now commonly used in HPC clusters. Due to their architectural simplicity specialized for data-level parallelism, GPUs can offer much higher computational throughput and memory bandwidth than CPUs in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Urvij Saroliya , Eishi Arima , Dai Liu , Martin Schulz

Advances in GPU compute throughput and memory capacity brings significant opportunities to a wide range of workloads. However, efficiently utilizing these resources remains challenging, particularly because diverse application…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-10 Gabin Schieffer , Ruimin Shi , Jie Ren , Ivy Peng

Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on each node of the cluster is not efficient resulting in high costs and power consumption as well as…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-15 Javier Prades , Blesson Varghese , Carlos Reano , Federico Silla

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li , Ari B. Hayes , Stephen A. Hackler , Eddy Z. Zhang , Mario Szegedy , Shuaiwen Leon Song

When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores.…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun , Gabriel H. Loh , Lavanya Subramanian , Kevin Chang , Onur Mutlu

Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-18 Samer Al-Kiswany , Abdullah Gharaibeh , Matei Ripeanu

Processing-in-Memory (PIM) enhances memory with computational capabilities, potentially solving energy and latency issues associated with data transfer between memory and processors. However, managing concurrent computation and data flow…

Hardware Architecture · Computer Science 2025-05-09 Ahmed Mamdouh , Haoran Geng , Michael Niemier , Xiaobo Sharon Hu , Dayane Reis

Graphics Processing Units (GPUs) are widely used by various applications in a broad variety of fields to accelerate their computation but remain susceptible to transient hardware faults (soft errors) that can easily compromise application…

Software Engineering · Computer Science 2021-03-30 Lishan Yang , Bin Nie , Adwait Jog , Evgenia Smirni

GPUs rely on large register files to unlock thread-level parallelism for high throughput. Unfortunately, large register files are power hungry, making it important to seek for new approaches to improve their utilization. This paper…

Hardware Architecture · Computer Science 2020-12-10 Alexandra Angerd , Erik Sintorn , Per Stenström

Large-scale GPU clusters are widely-used to speed up both latency-critical (online) and best-effort (offline) deep learning (DL) workloads. However, most DL clusters either dedicate each GPU to one workload or share workloads in time,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-27 Yihao Zhao , Xin Liu , Shufan Liu , Xiang Li , Yibo Zhu , Gang Huang , Xuanzhe Liu , Xin Jin

GPUs offer massive compute parallelism and high-bandwidth memory accesses. GPU database systems seek to exploit those capabilities to accelerate data analytics. Although modern GPUs have more resources (e.g., higher DRAM bandwidth) than…

Databases · Computer Science 2023-02-03 Jiashen Cao , Rathijit Sen , Matteo Interlandi , Joy Arulraj , Hyesoon Kim
‹ Prev 1 2 3 10 Next ›