Related papers: MGSim + MGMark: A Framework for Multi-GPU System R…

200 papers

MGSim is an open source discrete event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended to be a research and teaching vehicle to study the fine-grained hardware/software interactions on…

Hardware Architecture · Computer Science 2013-02-07 Mike Lankamp, Raphael Poss, Qiang Yang, Jian Fu, Irfan Uddin, Chris R. Jesshope

The design space exploration of scaled-out manycores for communication-intensive applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack of scalability or accuracy of existing frameworks at simulating…

Hardware Architecture · Computer Science 2024-04-23 Marcelo Orenes-Vera, Esin Tureci, Margaret Martonosi, David Wentzlaff

The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with…

Hardware Architecture · Computer Science 2020-08-11 Saiful A. Mojumder, Yifan Sun, Leila Delshadtehrani, Yenai Ma, Trinayan Baruah, José L. Abellán, John Kim, David Kaeli, Ajay Joshi

GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU…

Hardware Architecture · Computer Science 2024-01-19 Rodrigo Huerta, Mojtaba Abaie Shoushtary, Antonio González

As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is becoming more important. However, existing…

Hardware Architecture · Computer Science 2024-06-13 Hyungkyu Ham, Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Guseul Heo, Sangyeop Lee, Jongse Park, Gwangsun Kim

Architectural simulation has become the critical bottleneck limiting design space exploration for high-performance computing systems. Modern GPUs and AI accelerators -- with hundreds to thousands of tightly-coupled components -- demand…

Hardware Architecture · Computer Science 2026-05-25 Wei-Fen Lin, Jen-Chien Chang, Yen-Po Chen, Zi-Yi Tai, Yu-Cheng Chang, Chia-Pao Chiang, Yu-Yang Lee, Yu-Jie Wan

This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU…

Hardware Architecture · Computer Science 2020-06-04 Mahmoud Khairy, Jain Akshay, Tor Aamodt, Timothy G. Rogers

High-performance, multi-core processors are the key to accelerating workloads in several application domains. To continue to scale performance at the limit of Moore's Law and Dennard scaling, software and hardware designers have turned to…

Hardware Architecture · Computer Science 2023-10-27 Changxi Liu, Alen Sabu, Akanksha Chaudhari, Qingxuan Kang, Trevor E. Carlson

Molecular dynamics allows a complex system to be simulated and analyzed at the molecular and atomic levels. Simulations can run for a long time, even months. For this reason, graphics processing units (GPUs) and multi-core…

Computational Physics · Physics 2021-02-02 Iuliana Marin, Nicolae Goga, Maria Goga

Large-scale distributed computing infrastructures such as the Worldwide LHC Computing Grid (WLCG) require comprehensive simulation tools for evaluating performance, testing new algorithms, and optimizing resource allocation strategies.…

Cloud Computing has established itself as an efficient and cost-effective paradigm for the execution of web-based applications, and scientific workloads, that need elasticity and on-demand scalability capabilities. However, the evaluation…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-22 Remo Andreoli, Jie Zhao, Tommaso Cucinotta, Rajkumar Buyya

A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two dimensional Ising model [T. Preis et al., J. Comp.…

Computational Physics · Physics 2010-07-22 Benjamin Block, Peter Virnau, Tobias Preis

The simulation of the two-dimensional Ising model is used as a benchmark to show the computational capabilities of Graphic Processing Units (GPUs). The rich programming environment now available on GPUs and flexible hardware capabilities…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-26 Joshua Romero, Mauro Bisson, Massimiliano Fatica, Massimo Bernaschi

Simulators are a primary tool in computer architecture research but are extremely computationally intensive. Simulating modern architectures with increased core counts and recent workloads can be challenging, even on modern hardware. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-27 Rodrigo Huerta, Antonio González

Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. However, prior evaluations have largely relied on simulators or small prototypes,…

Hardware Architecture · Computer Science 2025-09-09 Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

Multi-tenant machine learning services have become emerging data-intensive workloads in data centers with heavy usage of GPU resources. Due to the large scale, many tuning parameters and heavy resource usage, it is usually impractical to…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-11 Ruofan Liang, Bingsheng He, Shengen Yan, Peng Sun

Design of next generation computer systems should be supported by simulation infrastructure that must achieve a few contradictory goals such as fast execution time, high accuracy, and enough flexibility to allow comparison between large…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-02 Ori Chalak, Cai Weiguang, Li Wei, Fang Lei, Zheng Libing, Wang Jintang, Wu Zuguang, Gu Xiongli, Wang Haibin, Avi Mendelson

A micromagnetic simulator running on graphics processing unit (GPU) is presented. It achieves significant performance boost as compared to previous central processing unit (CPU) simulators, up to two orders of magnitude for large input…

Computational Engineering, Finance, and Science · Computer Science 2014-11-11 Ru Zhu

The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-28 Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, Yufei Ding