Related papers: Inside VOLT: Designing an Open-Source GPU Compiler
The importance of open-source hardware and software has been increasing. However, despite GPUs being one of the more popular accelerators across various applications, there is very little open-source GPU infrastructure in the public domain.…
The current challenges in technology scaling are pushing the semiconductor industry towards hardware specialization, creating a proliferation of heterogeneous systems-on-chip, delivering orders of magnitude performance and power benefits…
The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance…
The rapid emergence of edge computing platforms and large-scale data centers has made power efficiency a primary design constraint, particularly for data-intensive and AI-driven workloads. Field-programmable gate arrays (FPGAs) are…
Despite the high computational throughput of GPUs, limited memory capacity and bandwidth-limited CPU-GPU communication via PCIe links remain significant bottlenecks for accelerating large-scale data analytics workloads. This paper…
With the growing number of data-intensive workloads, GPU, which is the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed…
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility. Changes in algorithms,…
Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, which has been widely used in natural language…
Modern day applications have grown in size and require more computational power. The rise of machine learning and AI increased the need for parallel computation, which has increased the need for GPGPUs. With the increasing demand for…
Vortex, a newly proposed open-source GPGPU platform based on the RISC-V ISA, offers a valid alternative for GPGPU research over the broadly-used modeling platforms based on commercial GPUs. Similarly to the push originating from the RISC-V…
Coarse-grained modeling and efficient computer simulations are critical to the study of complex molecular processes with many degrees of freedom and multiple spatiotemporal scales. Variational implicit-solvent model (VISM) for biomolecular…
Developing soft circuits from individual soft logic gates poses a unique challenge: with increasing numbers of logic gates, the design and implementation of circuits leads to inefficiencies due to mathematically unoptimized circuits and…
We present volkit, an open source library with high performance implementations of image manipulation and computer vision algorithms that focus on 3D volumetric representations. Volkit implements a cross-platform, performance-portable API…
Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely…
GPU-accelerated Self-Organizing Map (SOM) implementations are among the most competitive options for large-scale SOM analysis, but growing dataset sizes increasingly challenge their practical use because workloads no longer fit cleanly…
This paper presents a novel, non-standard set of vector instruction types for exploring custom SIMD instructions in a softcore. The new types allow simultaneous access to a relatively high number of operands, reducing the instruction count…
This study evaluates AoS-to-SoA transformations over reduced-precision data layouts for a particle simulation code on several GPU platforms: We hypothesize that SoA fits particularly well to SIMT, while AoS is the preferred storage format…
Various kinds of applications take advantage of GPUs through automation tools that attempt to automatically exploit the available performance of the GPU's parallel architecture. Directive-based programming models, such as OpenACC, are one…
We present the first open-source, GPU-based code for complex plasmas. The code, OpenDust, aims to provide researchers both experimenters and theorists user-friendly and high-performance tool for self-consistent calculation forces, acting on…
The integration of converter-interfaced generation (CIG) from renewable energy sources poses challenges to the stability and transient behavior of electric power systems. Understanding the dynamic behavior of low-inertia power systems is…