Related papers: Parallelizing a modern GPU simulator

Analyzing and Improving Hardware Modeling of Accel-Sim

GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU…

Hardware Architecture · Computer Science 2024-01-19 Rodrigo Huerta, Mojtaba Abaie Shoushtary, Antonio González

ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration

Architectural simulation has become the critical bottleneck limiting design space exploration for high-performance computing systems. Modern GPUs and AI accelerators -- with hundreds to thousands of tightly-coupled components -- demand…

Hardware Architecture · Computer Science 2026-05-25 Wei-Fen Lin, Jen-Chien Chang, Yen-Po Chen, Zi-Yi Tai, Yu-Cheng Chang, Chia-Pao Chiang, Yu-Yang Lee, Yu-Jie Wan

Acceleration for Timing-Aware Gate-Level Logic Simulation with One-Pass GPU Parallelism

Witnessing the advancing scale and complexity of chip design and benefiting from high-performance computation technologies, the simulation of Very Large Scale Integration (VLSI) Circuits imposes an increasing requirement for acceleration…

Data Structures and Algorithms · Computer Science 2023-04-27 Weijie Fang, Yanggeng Fu, Jiaquan Gao, Longkun Guo, Gregory Gutin, Xiaoyan Zhang

High-Performance Physics Simulations Using Multi-Core CPUs and GPGPUs in a Volunteer Computing Context

This paper presents two conceptually simple methods for parallelizing a Parallel Tempering Monte Carlo simulation in a distributed volunteer computing context, where computers belonging to the general public are used. The first method uses…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-03-31 Kamran Karimi, Neil G. Dickson, Firas Hamze

Massively parallel simulations for disordered systems

Simulations of systems with quenched disorder are extremely demanding, suffering from the combined effect of slow relaxation and the need of performing the disorder average. As a consequence, new algorithms, improved implementations, and…

Computational Physics · Physics 2020-05-20 Ravinder Kumar, Jonathan Gross, Wolfhard Janke, Martin Weigel

A Survey on Agent-based Simulation using Hardware Accelerators

Due to decelerating gains in single-core CPU performance, computationally expensive simulations are increasingly executed on highly parallel hardware platforms. Agent-based simulations, where simulated entities act with a certain degree of…

Multiagent Systems · Computer Science 2018-07-04 Jiajian Xiao, Philipp Andelfinger, David Eckhoff, Wentong Cai, Alois Knoll

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Benchmarking MD systems simulations on the Graphics Processing Unit and Multi-Core Systems

Molecular dynamics facilitates the simulation of a complex system to be analyzed at molecular and atomic levels. Simulations can last a long period of time, even months. Due to this cause the graphics processing units (GPUs) and multi-core…

Computational Physics · Physics 2021-02-02 Iuliana Marin, Nicolae Goga, Maria Goga

Integrating Per-Stream Stat Tracking into Accel-Sim

Accel-Sim is a widely used computer architecture simulator that models the behavior of modern NVIDIA GPUs in great detail. However, although Accel-Sim and the underlying GPGPU-Sim model many of the features of real GPUs, thus far it has not…

Hardware Architecture · Computer Science 2023-09-06 Shichen Qiao, Xin Su, Matthew D. Sinclair

Parallel training of linear models without compromising convergence

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou, Celestine Dünner, Kornilios Kourtis, Thomas Parnell

Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…

Computational Physics · Physics 2013-11-20 R. Meyer

Experimenting with Constraint Programming on GPU

The focus of my PhD thesis is on exploring parallel approaches to efficiently solve problems modeled by constraints and presenting a new proposal. Current solvers are very advanced; they are carefully designed to effectively manage the…

Artificial Intelligence · Computer Science 2019-09-23 Fabio Tardivo

GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning

GPU architectural simulation is orders of magnitude slower than native execution, necessitating workload sampling for practical speedups. Existing methods rely on hand-crafted features with limited expressiveness, yielding either aggressive…

Performance · Computer Science 2026-03-03 Jiaqi Wang, Jingwei Sun, Jiyu Luo, Han Li, Guangzhong Sun

Towards a Linear-Algebraic Hypervisor

Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are traditionally underutilized by such…

Programming Languages · Computer Science 2026-04-15 Breandan Considine

Linear Run Time of Persistent Homology Computation with GPU Parallelization

Persistent homology is a crucial invariant that is used in many areas to understand data. The $O(N^4)$ run time is a hindrance to its use on most large datasets. We give a parallelization method to utilize multi-core machines and clusters.…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-10 Michael G. Rawson

Exploring Modern GPU Memory System Design Challenges through Accurate Modeling

This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU…

Hardware Architecture · Computer Science 2020-06-04 Mahmoud Khairy, Jain Akshay, Tor Aamodt, Timothy G. Rogers

Pac-Sim: Simulation of Multi-threaded Workloads using Intelligent, Live Sampling

High-performance, multi-core processors are the key to accelerating workloads in several application domains. To continue to scale performance at the limit of Moore's Law and Dennard scaling, software and hardware designers have turned to…

Hardware Architecture · Computer Science 2023-10-27 Changxi Liu, Alen Sabu, Akanksha Chaudhari, Qingxuan Kang, Trevor E. Carlson

ScaleSimulator: A Fast and Cycle-Accurate Parallel Simulator for Architectural Exploration

Design of next generation computer systems should be supported by simulation infrastructure that must achieve a few contradictory goals such as fast execution time, high accuracy, and enough flexibility to allow comparison between large…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-02 Ori Chalak, Cai Weiguang, Li Wei, Fang Lei, Zheng Libing, Wang Jintang, Wu Zuguang, Gu Xiongli, Wang Haibin, Avi Mendelson

cellGPU: massively parallel simulations of dynamic vertex models

Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cell interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on…

Biological Physics · Physics 2017-09-13 Daniel M. Sussman

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari