Related papers: Pretty Fast Analysis: An embarrassingly parallel a…

200 papers

The performance of biomolecular molecular dynamics simulations has steadily increased on modern high performance computing resources but acceleration of the analysis of the output trajectories has lagged behind so that analyzing simulations…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-31 Mahzad Khoshlessan, Ioannis Paraskevakos, Geoffrey C. Fox, Shantenu Jha, Oliver Beckstein

Binary code analysis is widely used to assess a program's correctness, performance, and provenance. Binary analysis applications often construct control flow graphs, analyze data flow, and use debugging information to understand how machine…

Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-21 Peter Kraft, Amos Waterland, Daniel Y Fu, Anitha Gollamudi, Shai Szulanski, Margo Seltzer

Developing efficient parallel applications is critical to advancing scientific development but requires significant performance analysis and optimization. Performance analysis tools help developers manage the increasing complexity and scale…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-25 Onur Cankur, Aditya Tomar, Daniel Nichols, Connor Scully-Allison, Katherine E. Isaacs, Abhinav Bhatele

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…

Computational Physics · Physics 2013-11-20 R. Meyer

Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-06 Ayesha Afzal, Georg Hager, Stefano Markidis, Gerhard Wellein

Discovering causal relationships from observational data is a crucial problem and it has applications in many research areas. The PC algorithm is the state-of-the-art constraint based method for causal discovery. However, runtime of the PC…

Artificial Intelligence · Computer Science 2016-11-11 Thuc Duy Le, Tao Hoang, Jiuyong Li, Lin Liu, Huawen Liu

Researchers working on the automatic parallelization of programs have long known that too much parallelism can be even worse for performance than too little, because spawning a task to be run on another CPU incurs overheads.…

Programming Languages · Computer Science 2011-09-08 Paul Bone, Zoltan Somogyi, Peter Schachte

The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, split, and joint…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Mustafa Hajij, Basem Assiri, Paul Rosen

A parallel numerical simulation algorithm is presented for fractional-order systems involving Caputo-type derivatives, based on the Adams-Bashforth-Moulton (ABM) predictor-corrector scheme. The parallel algorithm is implemented using…

Mathematical Software · Computer Science 2017-10-04 Cosmin Bonchis, Eva Kaslik, Florin Rosu

This work presents a comprehensive performance analysis and optimization of a multiscale agent-based cellular simulation. The optimizations applied are guided by detailed performance analysis and include memory management, load balance, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-28 Marc Clascà, Marta Garcia-Gasulla, Arnau Montagud, Jose Carbonell Caballero, Alfonso Valencia

Particle tracking in large-scale numerical simulations of turbulent flows presents one of the major bottlenecks in parallel performance and scaling efficiency. Here, we describe a particle tracking algorithm for large-scale parallel…

Fluid Dynamics · Physics 2022-05-31 Cristian C. Lalescu, Bérenger Bramas, Markus Rampp, Michael Wilczek

The library PRAND for pseudorandom number generation for modern CPUs and GPUs is presented. It contains both single-threaded and multi-threaded realizations of a number of modern and most reliable generators recently proposed and studied in…

Computational Physics · Physics 2014-02-18 L. Yu. Barash, L. N. Shchur

A range of computational biology software (GROMACS, AMBER, NAMD, LAMMPS, OpenMM, Psi4 and RELION) was benchmarked on a representative selection of HPC hardware, including AMD EPYC 7742 CPU nodes, NVIDIA V100 and AMD MI250X GPU nodes, and an…

A definition for a class of asynchronous cellular arrays is proposed. An example of such asynchrony would be independent Poisson arrivals of cell iterations. The Ising model in the continuous time formulation of Glauber falls into this…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Boris D. Lubachevsky

AI accelerator processing capabilities and memory constraints largely dictate the scale in which machine learning workloads (e.g., training and inference) can be executed within a desirable time frame. Training a state of the art,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-12 Michael Benington, Leo Phan, Chris Pierre Paul, Evan Shoemaker, Priyanka Ranade, Torstein Collett, Grant Hodgson Perez, Christopher Krieger

Several methods exist today to accelerate Machine Learning (ML) or Deep-Learning (DL) model performance for training and inference. However, modern techniques that rely on various graph and operator parallelism methodologies rely on search…

Machine Learning · Computer Science 2023-08-23 Srinjoy Das, Lawrence Rauchwerger

The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yen-Hsiang Chang, Aydın Buluç, James Demmel

Simulators are a primary tool in computer architecture research but are extremely computationally intensive. Simulating modern architectures with increased core counts and recent workloads can be challenging, even on modern hardware. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-27 Rodrigo Huerta, Antonio González

The bulk-synchronous parallel (BSP) model provides a framework for writing parallel programs with predictable performance. In this paper we extend the BSP model to support what we will call pseudo-streaming algorithms for accelerators. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-24 Jan-Willem Buurlage, Tom Bannink, Abe Wits