English
Related papers

Related papers: A GPU Register File using Static Data Compression

200 papers

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high…

GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources, such as registers and the programmer-managed shared memory. Higher resource demand means lower effective parallel thread count,…

Performance · Computer Science 2019-07-08 Putt Sakdhnagool , Amit Sabne , Rudolf Eigenmann

In the future, embedded processors must process more computation-intensive network applications and internet traffic and packet-processing tasks become heavier and sophisticated. Since the processor performance is severely related to the…

Hardware Architecture · Computer Science 2012-05-10 Mehdi Alipour , Mostafa E. Salehi , Hesamodin shojaei baghini

The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage beyond its safe…

Hardware Architecture · Computer Science 2021-05-11 Yamilka Toca-Díaz , Alejandro Valero , Rubén Gran-Tejero , Darío Suárez-Gracia

Data compression and decompression have become vital components of big-data applications to manage the exponential growth in the amount of data collected and stored. Furthermore, big-data applications have increasingly adopted GPUs due to…

Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The number of thread blocks, and hence…

Hardware Architecture · Computer Science 2015-06-08 Vishwesh Jatala , Jayvant Anantpur , Amey Karkare

Modern GPUs require an enormous register file (RF) to store the context of thousands of active threads. It consumes considerable energy and contains multiple large banks to provide enough throughput. Thus, a RF caching mechanism can…

Hardware Architecture · Computer Science 2023-10-27 Mojtaba Abaie Shoushtary , Jose Maria Arnau , Jordi Tubella Murgadas , Antonio Gonzalez

Registers are the fastest memory components within the GPU's complex memory hierarchy, accessed by names rather than addresses. They are managed entirely by the compiler through a process called register allocation, during which the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-28 Deniz Elbek , Kamer Kaya

Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses…

Data Structures and Algorithms · Computer Science 2023-07-27 Rohith Krishnan S , Venkata Kalyan Tavva , Rupesh Nasre

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li , Ari B. Hayes , Stephen A. Hackler , Eddy Z. Zhang , Mario Szegedy , Shuaiwen Leon Song

Graphics Processing Units (GPUs) maintain a large register file to increase the thread level parallelism (TLP). To increase the TLP further, recent GPUs have increased the number of on-chip registers in every generation. However, with the…

Hardware Architecture · Computer Science 2018-03-30 Vishwesh Jatala , Jayvant Anantpur , Amey Karkare

The torrential influx of floating-point data from domains like IoT and HPC necessitates high-performance lossless compression to mitigate storage costs while preserving absolute data fidelity. Leveraging GPU parallelism for this task…

Databases · Computer Science 2025-11-12 Zheng Li , Weiyan Wang , Ruiyuan Li , Chao Chen , Xianlei Long , Linjiang Zheng , Quanqing Xu , Chuanhui Yang

Embedded applications are widely used in portable devices such as wireless phones, personal digital assistants, laptops, etc. High throughput and real time requirements are especially important in such data-intensive tasks. Therefore,…

Hardware Architecture · Computer Science 2012-04-13 Mehdi Alipour , Mostafa E. Salehi

There is a growing interest in leveraging GPUs for tasks beyond ML, especially in database systems. Despite the existing extensive work on GPU-based database operators, several questions are still open. For instance, the performance of…

Databases · Computer Science 2025-02-13 Bowen Wu , Dimitrios Koutsoukos , Gustavo Alonso

Graphics Processing Units allow for running massively parallel applications offloading the CPU from computationally intensive resources, however GPUs have a limited amount of memory. In this paper a trie compression algorithm for massively…

Data Structures and Algorithms · Computer Science 2017-02-20 Xavier Bellekens , Amar Seeam , Christos Tachtatzis , Robert Atkinson

GPUs offer orders-of-magnitude higher memory bandwidth than traditional CPU-only systems. However, GPU device memory tends to be relatively small and the memory capacity can not be increased by the user. This paper describes Buddy…

Hardware Architecture · Computer Science 2019-04-17 Esha Choukse , Michael Sullivan , Mike O'Connor , Mattan Erez , Jeff Pool , David Nellans , Steve Keckler

Dynamic parallelism on GPUs allows GPU threads to dynamically launch other GPU threads. It is useful in applications with nested parallelism, particularly where the amount of nested parallelism is irregular and cannot be predicted…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-11 Mhd Ghaith Olabi , Juan Gómez Luna , Onur Mutlu , Wen-mei Hwu , Izzat El Hajj

GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pipeline designs based on architectures…

Hardware Architecture · Computer Science 2025-10-30 Rodrigo Huerta , Mojtaba Abaie Shoushtary , José-Lorenzo Cruz , Antonio González

Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-18 Samer Al-Kiswany , Abdullah Gharaibeh , Matei Ripeanu

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science 2018-06-28 Mo Sha , Yuchen Li , Bingsheng He , Kian-Lee Tan
‹ Prev 1 2 3 10 Next ›