Related papers: MGSim + MGMark: A Framework for Multi-GPU System R…

MGSim - Simulation tools for multi-core processor architectures

MGSim is an open source discrete event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended to be a research and teaching vehicle to study the fine-grained hardware/software interactions on…

Hardware Architecture · Computer Science 2013-02-07 Mike Lankamp, Raphael Poss, Qiang Yang, Jian Fu, Irfan Uddin, Chris R. Jesshope

Muchisim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems

The design space exploration of scaled-out manycores for communication-intensive applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack of scalability or accuracy of existing frameworks at simulating…

Hardware Architecture · Computer Science 2024-04-23 Marcelo Orenes-Vera, Esin Tureci, Margaret Martonosi, David Wentzlaff

MGPU-TSM: A Multi-GPU System with Truly Shared Memory

The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with…

Hardware Architecture · Computer Science 2020-08-11 Saiful A. Mojumder, Yifan Sun, Leila Delshadtehrani, Yenai Ma, Trinayan Baruah, José L. Abellán, John Kim, David Kaeli, Ajay Joshi

Analyzing and Improving Hardware Modeling of Accel-Sim

GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU…

Hardware Architecture · Computer Science 2024-01-19 Rodrigo Huerta, Mojtaba Abaie Shoushtary, Antonio González

ONNXim: A Fast, Cycle-level Multi-core NPU Simulator

As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is becoming more important. However, existing…

Hardware Architecture · Computer Science 2024-06-13 Hyungkyu Ham, Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Guseul Heo, Sangyeop Lee, Jongse Park, Gwangsun Kim

ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration

Architectural simulation has become the critical bottleneck limiting design space exploration for high-performance computing systems. Modern GPUs and AI accelerators -- with hundreds to thousands of tightly-coupled components -- demand…

Hardware Architecture · Computer Science 2026-05-25 Wei-Fen Lin, Jen-Chien Chang, Yen-Po Chen, Zi-Yi Tai, Yu-Cheng Chang, Chia-Pao Chiang, Yu-Yang Lee, Yu-Jie Wan

Exploring Modern GPU Memory System Design Challenges through Accurate Modeling

This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU…

Hardware Architecture · Computer Science 2020-06-04 Mahmoud Khairy, Jain Akshay, Tor Aamodt, Timothy G. Rogers

Pac-Sim: Simulation of Multi-threaded Workloads using Intelligent, Live Sampling

High-performance, multi-core processors are the key to accelerating workloads in several application domains. To continue to scale performance at the limit of Moore's Law and Dennard scaling, software and hardware designers have turned to…

Hardware Architecture · Computer Science 2023-10-27 Changxi Liu, Alen Sabu, Akanksha Chaudhari, Qingxuan Kang, Trevor E. Carlson

Benchmarking MD systems simulations on the Graphics Processing Unit and Multi-Core Systems

Molecular dynamics facilitates the simulation of a complex system to be analyzed at molecular and atomic levels. Simulations can last a long period of time, even months. Due to this cause the graphics processing units (GPUs) and multi-core…

Computational Physics · Physics 2021-02-02 Iuliana Marin, Nicolae Goga, Maria Goga

CGSim: A Simulation Framework for Large Scale Distributed Computing Environment

Large-scale distributed computing infrastructures such as the Worldwide LHC Computing Grid (WLCG) require comprehensive simulation tools for evaluating performance, testing new algorithms, and optimizing resource allocation strategies.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-02 Sairam Sri Vatsavai, Raees Khan, Kuan-Chieh Hsu, Ozgur O. Kilic, Paul Nilsson, Tatiana Korchuganova, David K. Park, Sankha Dutta, Yihui Ren, Joseph Boudreau, Tasnuva Chowdhury, Shengyu Feng, Jaehyung Kim, Scott Klasky, Tadashi Maeno, Verena Ingrid Martinez, Norbert Podhorszki, Frédéric Suter, Wei Yang, Yiming Yang, Shinjae Yoo, Alexei Klimentov, Adolfy Hoisie

CloudSim 7G: An Integrated Toolkit for Modeling and Simulation of Future Generation Cloud Computing Environments

Cloud Computing has established itself as an efficient and cost-effective paradigm for the execution of web-based applications, and scientific workloads, that need elasticity and on-demand scalability capabilities. However, the evaluation…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-22 Remo Andreoli, Jie Zhao, Tommaso Cucinotta, Rajkumar Buyya

Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model

A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two dimensional Ising model [T. Preis et al., J. Comp.…

Computational Physics · Physics 2010-07-22 Benjamin Block, Peter Virnau, Tobias Preis

A Performance Study of the 2D Ising Model on GPUs

The simulation of the two-dimensional Ising model is used as a benchmark to show the computational capabilities of Graphic Processing Units (GPUs). The rich programming environment now available on GPUs and flexible hardware capabilities…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-26 Joshua Romero, Mauro Bisson, Massimiliano Fatica, Massimo Bernaschi

Parallelizing a modern GPU simulator

Simulators are a primary tool in computer architecture research but are extremely computationally intensive. Simulating modern architectures with increased core counts and recent workloads can be challenging, even on modern hardware. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-27 Rodrigo Huerta, Antonio González

Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device

Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. However, prior evaluations have largely relied on simulators or small prototypes,…

Hardware Architecture · Computer Science 2025-09-09 Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs

Multi-tenant machine learning services have become emerging data-intensive workloads in data centers with heavy usage of GPU resources. Due to the large scale, many tuning parameters and heavy resource usage, it is usually impractical to…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-11 Ruofan Liang, Bingsheng He, Shengen Yan, Peng Sun

ScaleSimulator: A Fast and Cycle-Accurate Parallel Simulator for Architectural Exploration

Design of next generation computer systems should be supported by simulation infrastructure that must achieve a few contradictory goals such as fast execution time, high accuracy, and enough flexibility to allow comparison between large…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-02 Ori Chalak, Cai Weiguang, Li Wei, Fang Lei, Zheng Libing, Wang Jintang, Wu Zuguang, Gu Xiongli, Wang Haibin, Avi Mendelson

Grace: a Cross-platform Micromagnetic Simulator On Graphics Processing Units

A micromagnetic simulator running on graphics processing unit (GPU) is presented. It achieves significant performance boost as compared to previous central processing unit (CPU) simulators, up to two orders of magnitude for large input…

Computational Engineering, Finance, and Science · Computer Science 2014-11-11 Ru Zhu

MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms

The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-28 Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, Yufei Ding