Related papers: MGSim + MGMark: A Framework for Multi-GPU System R…
MGSim is an open source discrete event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended to be a research and teaching vehicle to study the fine-grained hardware/software interactions on…
The design space exploration of scaled-out manycores for communication-intensive applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack of scalability or accuracy of existing frameworks at simulating…
The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with…
GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU…
As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is becoming more important. However, existing…
Architectural simulation has become the critical bottleneck limiting design space exploration for high-performance computing systems. Modern GPUs and AI accelerators -- with hundreds to thousands of tightly-coupled components -- demand…
This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU…
High-performance, multi-core processors are the key to accelerating workloads in several application domains. To continue to scale performance at the limit of Moore's Law and Dennard scaling, software and hardware designers have turned to…
Molecular dynamics facilitates the simulation of a complex system to be analyzed at molecular and atomic levels. Simulations can last a long period of time, even months. Due to this cause the graphics processing units (GPUs) and multi-core…
Large-scale distributed computing infrastructures such as the Worldwide LHC Computing Grid (WLCG) require comprehensive simulation tools for evaluating performance, testing new algorithms, and optimizing resource allocation strategies.…
Cloud Computing has established itself as an efficient and cost-effective paradigm for the execution of web-based applications, and scientific workloads, that need elasticity and on-demand scalability capabilities. However, the evaluation…
A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two dimensional Ising model [T. Preis et al., J. Comp.…
The simulation of the two-dimensional Ising model is used as a benchmark to show the computational capabilities of Graphic Processing Units (GPUs). The rich programming environment now available on GPUs and flexible hardware capabilities…
Simulators are a primary tool in computer architecture research but are extremely computationally intensive. Simulating modern architectures with increased core counts and recent workloads can be challenging, even on modern hardware. This…
Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. However, prior evaluations have largely relied on simulators or small prototypes,…
Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…
Multi-tenant machine learning services have become emerging data-intensive workloads in data centers with heavy usage of GPU resources. Due to the large scale, many tuning parameters and heavy resource usage, it is usually impractical to…
Design of next generation computer systems should be supported by simulation infrastructure that must achieve a few contradictory goals such as fast execution time, high accuracy, and enough flexibility to allow comparison between large…
A micromagnetic simulator running on graphics processing unit (GPU) is presented. It achieves significant performance boost as compared to previous central processing unit (CPU) simulators, up to two orders of magnitude for large input…
The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the…