Related papers: Parallelizing a modern GPU simulator

200 papers

GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU…

Hardware Architecture · Computer Science 2024-01-19 Rodrigo Huerta, Mojtaba Abaie Shoushtary, Antonio González

Architectural simulation has become the critical bottleneck limiting design space exploration for high-performance computing systems. Modern GPUs and AI accelerators -- with hundreds to thousands of tightly-coupled components -- demand…

Hardware Architecture · Computer Science 2026-05-25 Wei-Fen Lin, Jen-Chien Chang, Yen-Po Chen, Zi-Yi Tai, Yu-Cheng Chang, Chia-Pao Chiang, Yu-Yang Lee, Yu-Jie Wan

Witnessing the advancing scale and complexity of chip design and benefiting from high-performance computation technologies, the simulation of Very Large Scale Integration (VLSI) Circuits imposes an increasing requirement for acceleration…

Data Structures and Algorithms · Computer Science 2023-04-27 Weijie Fang, Yanggeng Fu, Jiaquan Gao, Longkun Guo, Gregory Gutin, Xiaoyan Zhang

This paper presents two conceptually simple methods for parallelizing a Parallel Tempering Monte Carlo simulation in a distributed volunteer computing context, where computers belonging to the general public are used. The first method uses…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-03-31 Kamran Karimi, Neil G. Dickson, Firas Hamze

Simulations of systems with quenched disorder are extremely demanding, suffering from the combined effect of slow relaxation and the need of performing the disorder average. As a consequence, new algorithms, improved implementations, and…

Computational Physics · Physics 2020-05-20 Ravinder Kumar, Jonathan Gross, Wolfhard Janke, Martin Weigel

Due to decelerating gains in single-core CPU performance, computationally expensive simulations are increasingly executed on highly parallel hardware platforms. Agent-based simulations, where simulated entities act with a certain degree of…

Multiagent Systems · Computer Science 2018-07-04 Jiajian Xiao, Philipp Andelfinger, David Eckhoff, Wentong Cai, Alois Knoll

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Molecular dynamics facilitates the simulation of a complex system to be analyzed at molecular and atomic levels. Simulations can last a long period of time, even months. Due to this cause the graphics processing units (GPUs) and multi-core…

Computational Physics · Physics 2021-02-02 Iuliana Marin, Nicolae Goga, Maria Goga

Accel-Sim is a widely used computer architecture simulator that models the behavior of modern NVIDIA GPUs in great detail. However, although Accel-Sim and the underlying GPGPU-Sim model many of the features of real GPUs, thus far it has not…

Hardware Architecture · Computer Science 2023-09-06 Shichen Qiao, Xin Su, Matthew D. Sinclair

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou, Celestine Dünner, Kornilios Kourtis, Thomas Parnell

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…

Computational Physics · Physics 2013-11-20 R. Meyer

The focus of my PhD thesis is on exploring parallel approaches to efficiently solve problems modeled by constraints and presenting a new proposal. Current solvers are very advanced; they are carefully designed to effectively manage the…

Artificial Intelligence · Computer Science 2019-09-23 Fabio Tardivo

GPU architectural simulation is orders of magnitude slower than native execution, necessitating workload sampling for practical speedups. Existing methods rely on hand-crafted features with limited expressiveness, yielding either aggressive…

Performance · Computer Science 2026-03-03 Jiaqi Wang, Jingwei Sun, Jiyu Luo, Han Li, Guangzhong Sun

Many techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are traditionally underutilized by such…

Programming Languages · Computer Science 2026-04-15 Breandan Considine

Persistent homology is a crucial invariant that is used in many areas to understand data. The $O(N^4)$ run time is a hindrance to its use on most large datasets. We give a parallelization method to utilize multi-core machines and clusters.…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-10 Michael G. Rawson

This paper explores the impact of simulator accuracy on architecture design decisions in the general-purpose graphics processing unit (GPGPU) space. We perform a detailed, quantitative analysis of the most popular publicly available GPU…

Hardware Architecture · Computer Science 2020-06-04 Mahmoud Khairy, Jain Akshay, Tor Aamodt, Timothy G. Rogers

High-performance, multi-core processors are the key to accelerating workloads in several application domains. To continue to scale performance at the limit of Moore's Law and Dennard scaling, software and hardware designers have turned to…

Hardware Architecture · Computer Science 2023-10-27 Changxi Liu, Alen Sabu, Akanksha Chaudhari, Qingxuan Kang, Trevor E. Carlson

Design of next generation computer systems should be supported by simulation infrastructure that must achieve a few contradictory goals such as fast execution time, high accuracy, and enough flexibility to allow comparison between large…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-02 Ori Chalak, Cai Weiguang, Li Wei, Fang Lei, Zheng Libing, Wang Jintang, Wu Zuguang, Gu Xiongli, Wang Haibin, Avi Mendelson

Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cell interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on…

Biological Physics · Physics 2017-09-13 Daniel M. Sussman

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari