Related papers: A GPU Register File using Static Data Compression

Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high…

Hardware Architecture · Computer Science 2020-10-20 Mohammad Sadrosadati, Amirhossein Mirhosseini, Ali Hajiabadi, Seyed Borna Ehsani, Hajar Falahati, Hamid Sarbazi-Azad, Mario Drumond, Babak Falsafi, Rachata Ausavarungnirun, Onur Mutlu

RegDem: Increasing GPU Performance via Shared Memory Register Spilling

GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources, such as registers and the programmer-managed shared memory. Higher resource demand means lower effective parallel thread count,…

Performance · Computer Science 2019-07-08 Putt Sakdhnagool, Amit Sabne, Rudolf Eigenmann

Design Space Exploration to Find the Optimum Cache and Register File Size for Embedded Applications

In the future, embedded processors must process more computation-intensive network applications and internet traffic, and packet-processing tasks become heavier and more sophisticated. Since the processor performance is severely related to the…

Hardware Architecture · Computer Science 2012-05-10 Mehdi Alipour, Mostafa E. Salehi, Hesamodin Shojaei Baghini

RRCD: Redirección de Registros Basada en Compresión de Datos para Tolerar Fallos Permanentes en una GPU

The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage beyond its safe…

Hardware Architecture · Computer Science 2021-05-11 Yamilka Toca-Díaz, Alejandro Valero, Rubén Gran-Tejero, Darío Suárez-Gracia

CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs

Data compression and decompression have become vital components of big-data applications to manage the exponential growth in the amount of data collected and stored. Furthermore, big-data applications have increasingly adopted GPUs due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-11 Jeongmin Park, Zaid Qureshi, Vikram Mailthody, Andrew Gacek, Shunfan Shao, Mohammad AlMasri, Isaac Gelado, Jinjun Xiong, Chris Newburn, I-hsin Chung, Michael Garland, Nikolay Sakharnykh, Wen-mei Hwu

Improving GPU Performance Through Resource Sharing

Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The number of thread blocks, and hence…

Hardware Architecture · Computer Science 2015-06-08 Vishwesh Jatala, Jayvant Anantpur, Amey Karkare

A Lightweight, Compiler-Assisted Register File Cache for GPGPU

Modern GPUs require an enormous register file (RF) to store the context of thousands of active threads. It consumes considerable energy and contains multiple large banks to provide enough throughput. Thus, a RF caching mechanism can…

Hardware Architecture · Computer Science 2023-10-27 Mojtaba Abaie Shoushtary, Jose Maria Arnau, Jordi Tubella Murgadas, Antonio Gonzalez

Fully-Automated Code Generation for Efficient Computation of Sparse Matrix Permanents on GPUs

Registers are the fastest memory components within the GPU's complex memory hierarchy, accessed by names rather than addresses. They are managed entirely by the compiler through a process called register allocation, during which the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-28 Deniz Elbek, Kamer Kaya

GraphVine: A Data Structure to Optimize Dynamic Graph Processing on GPUs

Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses…

Data Structures and Algorithms · Computer Science 2023-07-27 Rohith Krishnan S, Venkata Kalyan Tavva, Rupesh Nasre

A Graph-based Model for GPU Caching Problems

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song

GREENER: A Tool for Improving Energy Efficiency of Register Files

Graphics Processing Units (GPUs) maintain a large register file to increase the thread level parallelism (TLP). To increase the TLP further, recent GPUs have increased the number of on-chip registers in every generation. However, with the…

Hardware Architecture · Computer Science 2018-03-30 Vishwesh Jatala, Jayvant Anantpur, Amey Karkare

A High-Throughput GPU Framework for Adaptive Lossless Compression of Floating-Point Data

The torrential influx of floating-point data from domains like IoT and HPC necessitates high-performance lossless compression to mitigate storage costs while preserving absolute data fidelity. Leveraging GPU parallelism for this task…

Databases · Computer Science 2025-11-12 Zheng Li, Weiyan Wang, Ruiyuan Li, Chao Chen, Xianlei Long, Linjiang Zheng, Quanqing Xu, Chuanhui Yang

Performance-Optimum Superscalar Architecture for Embedded Applications

Embedded applications are widely used in portable devices such as wireless phones, personal digital assistants, laptops, etc. High throughput and real time requirements are especially important in such data-intensive tasks. Therefore,…

Hardware Architecture · Computer Science 2012-04-13 Mehdi Alipour, Mostafa E. Salehi

Efficiently Processing Joins and Grouped Aggregations on GPUs

There is a growing interest in leveraging GPUs for tasks beyond ML, especially in database systems. Despite the existing extensive work on GPU-based database operators, several questions are still open. For instance, the performance of…

Databases · Computer Science 2025-02-13 Bowen Wu, Dimitrios Koutsoukos, Gustavo Alonso

Trie Compression for GPU Accelerated Multi-Pattern Matching

Graphics Processing Units allow for running massively parallel applications, offloading computationally intensive work from the CPU; however, GPUs have a limited amount of memory. In this paper a trie compression algorithm for massively…

Data Structures and Algorithms · Computer Science 2017-02-20 Xavier Bellekens, Amar Seeam, Christos Tachtatzis, Robert Atkinson

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs

GPUs offer orders-of-magnitude higher memory bandwidth than traditional CPU-only systems. However, GPU device memory tends to be relatively small and the memory capacity can not be increased by the user. This paper describes Buddy…

Hardware Architecture · Computer Science 2019-04-17 Esha Choukse, Michael Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Steve Keckler

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

Dynamic parallelism on GPUs allows GPU threads to dynamically launch other GPU threads. It is useful in applications with nested parallelism, particularly where the amount of nested parallelism is irregular and cannot be predicted…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-11 Mhd Ghaith Olabi, Juan Gómez Luna, Onur Mutlu, Wen-mei Hwu, Izzat El Hajj

Analyzing Modern NVIDIA GPU cores

GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pipeline designs based on architectures…

Hardware Architecture · Computer Science 2025-10-30 Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González

GPUs as Storage System Accelerators

Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-18 Samer Al-Kiswany, Abdullah Gharaibeh, Matei Ripeanu

Technical Report: Accelerating Dynamic Graph Analytics on GPUs

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science 2018-06-28 Mo Sha, Yuchen Li, Bingsheng He, Kian-Lee Tan