Related papers: SIMD Parallel MCMC Sampling with Applications for …

Vector operations for accelerating expensive Bayesian computations -- a tutorial guide

Many applications in Bayesian statistics are extremely computationally intensive. However, they are often inherently parallel, making them prime targets for modern massively parallel processors. Multi-core and distributed computing is…

Computation · Statistics 2021-05-10 David J. Warne, Scott A. Sisson, Christopher Drovandi

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Parallel Approaches to Accelerate Bayesian Decision Trees

Markov Chain Monte Carlo (MCMC) is a well-established family of algorithms primarily used in Bayesian statistics to sample from a target distribution when direct sampling is challenging. Existing work on Bayesian decision trees uses MCMC.…

Computation · Statistics 2023-01-24 Efthyvoulos Drousiotis, Paul G. Spirakis, Simon Maskell

Parallel algorithms for problems of cluster analysis with very large amount of data

In this paper we solve on GPUs massive problems with large amount of data, which are not appropriate for solution with the SIMD technology. For the given problem we consider a three-level parallelization. The multithreading of CPU is used…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-18 Natalya Litvinenko

Optimization and parallelization of B-spline based orbital evaluations in QMC on multi/many-core shared memory processors

B-spline based orbital representations are widely used in Quantum Monte Carlo (QMC) simulations of solids, historically taking as much as 50% of the total run time. Random accesses to a large four-dimensional array make it challenging to…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-10 Amrita Mathuriya, Ye Luo, Anouar Benali, Luke Shulenburger, Jeongnim Kim

Concurrent Processing Memory

A theoretical memory with limited processing power and internal connectivity at each element is proposed. This memory carries out parallel processing within itself to solve generic array problems. The applicability of this in-memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-09-28 Chengpu Wang

SIMD-X: Programming and Processing of Graph Algorithms on GPUs

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-12 Hang Liu, H. Howie Huang

A hybrid algorithm for parallel molecular dynamics simulations

This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization…

Materials Science · Physics 2017-09-13 Chris M. Mangiardi, Ralf Meyer

Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models

Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), that excel at accelerating parallel…

Neural and Evolutionary Computing · Computer Science 2023-11-09 Jan Finkbeiner, Thomas Gmeinder, Mark Pupilli, Alexander Titterton, Emre Neftci

Massively Parallel Graph Drawing and Representation Learning

To fully exploit the performance potential of modern multi-core processors, machine learning and data mining algorithms for big data must be parallelized in multiple ways. Today's CPUs consist of multiple cores, each following an…

Machine Learning · Computer Science 2020-11-09 Christian Böhm, Claudia Plant

Speculative Parallel Evaluation Of Classification Trees On GPGPU Compute Engines

We examine the problem of optimizing classification tree evaluation for on-line and real-time applications by using GPUs. Looking at trees with continuous attributes often used in image segmentation, we first put the existing algorithms for…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-11-08 Jason Spencer

Accelerating a fluvial incision and landscape evolution model with parallelism

Solving inverse problems and achieving statistical rigour in landscape evolution models requires running many model realizations. Parallel computation is necessary to achieve this in a reasonable time. However, no previous algorithm is…

Computational Engineering, Finance, and Science · Computer Science 2019-01-23 Richard Barnes

Sampling Parallelism for Fast and Efficient Bayesian Learning

Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantification of predictive uncertainty is…

Machine Learning · Computer Science 2026-04-07 Asena Karolin Özdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus Götz, Charlotte Debus

Effective GPU Parallelization of Distributed and Localized Model Predictive Control

To effectively control large-scale distributed systems online, model predictive control (MPC) has to swiftly solve the underlying high-dimensional optimization. There are multiple techniques applied to accelerate the solving process in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-30 Carmen Amo Alonso, Shih-Hao Tseng

Large-Scale Geospatial Processing on Multi-Core and Many-Core Processors: Evaluations on CPUs, GPUs and MICs

Geospatial Processing, such as queries based on point-to-polyline shortest distance and point-in-polygon test, are fundamental to many scientific and engineering applications, including post-processing large-scale environmental and climate…

Databases · Computer Science 2014-03-05 Jianting Zhang, Simin You

Memory-constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms

The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task and data-level parallelism to meet throughput requirements…

Signal Processing · Electrical Eng. & Systems 2017-12-01 Shuoxin Lin, Jiahao Wu, Shuvra S. Bhattacharyya

Benchmarking mixed-mode PETSc performance on high-performance architectures

The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications are struggling to fully utilise the multiple levels of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-07-19 Michael Lange, Gerard Gorman, Michele Weiland, Lawrence Mitchell, Xiaohu Guo, James Southern

Shared-Memory Parallel Maximal Clique Enumeration

We present shared-memory parallel methods for Maximal Clique Enumeration (MCE) from a graph. MCE is a fundamental and well-studied graph analytics task, and is a widely used primitive for identifying dense structures in a graph. Due to its…

Data Structures and Algorithms · Computer Science 2020-01-30 Apurba Das, Seyed-Vahid Sanei-Mehri, Srikanta Tirthapura

A Multi-signal Variant for the GPU-based Parallelization of Growing Self-Organizing Networks

Among the many possible approaches for the parallelization of self-organizing networks, and in particular of growing self-organizing networks, perhaps the most common one is producing an optimized, parallel implementation of the standard…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-31 Giacomo Parigi, Angelo Stramieri, Danilo Pau, Marco Piastra

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-14 Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer