Related papers: Inside VOLT: Designing an Open-Source GPU Compiler

200 papers

The importance of open-source hardware and software has been increasing. However, despite GPUs being one of the more popular accelerators across various applications, there is very little open-source GPU infrastructure in the public domain.…

Hardware Architecture · Computer Science 2021-10-22 Blaise Tine , Fares Elsabbagh , Krishna Yalamarthy , Hyesoon Kim

The current challenges in technology scaling are pushing the semiconductor industry towards hardware specialization, creating a proliferation of heterogeneous systems-on-chip, delivering orders of magnitude performance and power benefits…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-28 Fares Elsabbagh , Blaise Tine , Priyadarshini Roshan , Ethan Lyons , Euna Kim , Da Eun Shim , Lingjun Zhu , Sung Kyu Lim , Hyesoon Kim

The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-28 Guei-Yuan Lueh , Kaiyu Chen , Gang Chen , Joel Fuentes , Wei-Yu Chen , Fangwen Fu , Hong Jiang , Hongzheng Li , Daniel Rhee

The rapid emergence of edge computing platforms and large-scale data centers has made power efficiency a primary design constraint, particularly for data-intensive and AI-driven workloads. Field-programmable gate arrays (FPGAs) are…

Hardware Architecture · Computer Science 2026-03-30 Akram Ben Ahmed , Takahiro Hirofuchi , Takaaki Fukai

Despite the high computational throughput of GPUs, limited memory capacity and bandwidth-limited CPU-GPU communication via PCIe links remain significant bottlenecks for accelerating large-scale data analytics workloads. This paper…

Databases · Computer Science 2025-02-14 Yichao Yuan , Advait Iyer , Lin Ma , Nishil Talati

With the growing number of data-intensive workloads, GPU, which is the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed…

Hardware Architecture · Computer Science 2021-03-12 Xinfeng Xie , Peng Gu , Yufei Ding , Dimin Niu , Hongzhong Zheng , Yuan Xie

Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility. Changes in algorithms,…

Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, which has been widely used in natural language…

Computer Vision and Pattern Recognition · Computer Science 2025-06-12 Mohammad Erfan Sadeghi , Arash Fayyazi , Suhas Somashekar , Armin Abdollahi , Massoud Pedram

Modern day applications have grown in size and require more computational power. The rise of machine learning and AI increased the need for parallel computation, which has increased the need for GPGPUs. With the increasing demand for…

Hardware Architecture · Computer Science 2025-03-25 Injae Shin , Blaise Tine

Vortex, a newly proposed open-source GPGPU platform based on the RISC-V ISA, offers a valid alternative for GPGPU research over the broadly-used modeling platforms based on commercial GPUs. Similarly to the push originating from the RISC-V…

Hardware Architecture · Computer Science 2025-12-02 Giuseppe M. Sarda , Nimish Shah , Abubakr Nada , Debjyoti Bhattacharjee , Marian Verhelst

Coarse-grained modeling and efficient computer simulations are critical to the study of complex molecular processes with many degrees of freedom and multiple spatiotemporal scales. Variational implicit-solvent model (VISM) for biomolecular…

Chemical Physics · Physics 2022-10-26 Shuang Liu , Zirui Zhang , Li-Tien Cheng , Bo Li

Developing soft circuits from individual soft logic gates poses a unique challenge: with increasing numbers of logic gates, the design and implementation of circuits leads to inefficiencies due to mathematically unoptimized circuits and…

We present volkit, an open source library with high performance implementations of image manipulation and computer vision algorithms that focus on 3D volumetric representations. Volkit implements a cross-platform, performance-portable API…

Computer Vision and Pattern Recognition · Computer Science 2022-03-22 Stefan Zellmann , Giovanni Aguirre , Jürgen P. Schulze

Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-04 Yangjie Zhou , Honglin Zhu , Qian Qiu , Weihao Cui , Zihan Liu , Cong Guo , Siyuan Feng , Jintao Meng , Haidong Lan , Jingwen Leng , Wenxi Zhu , Minwen Deng

GPU-accelerated Self-Organizing Map (SOM) implementations are among the most competitive options for large-scale SOM analysis, but growing dataset sizes increasingly challenge their practical use because workloads no longer fit cleanly…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-30 Tony Xu , Sarah Klamt , Katherine Turner , Anne Brustle , Felix Marsh-Wakefield , Givanna Putri

This paper presents a novel, non-standard set of vector instruction types for exploring custom SIMD instructions in a softcore. The new types allow simultaneous access to a relatively high number of operands, reducing the instruction count…

Hardware Architecture · Computer Science 2021-06-15 Philippos Papaphilippou , Paul H. J. Kelly , Wayne Luk

This study evaluates AoS-to-SoA transformations over reduced-precision data layouts for a particle simulation code on several GPU platforms: We hypothesize that SoA fits particularly well to SIMT, while AoS is the preferred storage format…

Programming Languages · Computer Science 2025-12-08 Pawel K. Radtke , Tobias Weinzierl

Various kinds of applications take advantage of GPUs through automation tools that attempt to automatically exploit the available performance of the GPU's parallel architecture. Directive-based programming models, such as OpenACC, are one…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-30 Kazuaki Matsumura , Simon Garcia De Gonzalo , Antonio J. Peña

We present the first open-source, GPU-based code for complex plasmas. The code, OpenDust, aims to provide researchers, both experimenters and theorists, with a user-friendly and high-performance tool for self-consistent calculation of forces acting on…

Plasma Physics · Physics 2023-05-03 D. Kolotinskii , A. Timofeev

The integration of converter-interfaced generation (CIG) from renewable energy sources poses challenges to the stability and transient behavior of electric power systems. Understanding the dynamic behavior of low-inertia power systems is…

Systems and Control · Electrical Eng. & Systems 2020-03-09 Rodrigo Henriquez-Auba , Jose D. Lara , Ciaran Roberts , Nathan Pallo , Duncan S. Callaway