Related papers: Parallelizing the Approximate Minimum Degree Order…

On the improvement of the in-place merge algorithm parallelization

In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-27 Berenger Bramas , Quentin Bramas

A Scalable Shared-Memory Parallel Simplex for Large-Scale Linear Programming

The Simplex tableau has been broadly used and investigated in the industry and academia. With the advent of the big data era, ever larger problems are posed to be solved in ever larger machines whose architecture type did not exist in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-29 Demetrios Coutinho , Felipe O. Lins e Silva , Daniel Aloise , Samuel , Xavier-de-Souza

Parallel Adaptive Sampling with almost no Synchronization

Approximation via sampling is a widespread technique whenever exact solutions are too expensive. In this paper, we present techniques for an efficient parallelization of adaptive (a. k. a. progressive) sampling algorithms on multi-threaded…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-25 Alexander van der Grinten , Eugenio Angriman , Henning Meyerhenke

Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…

Computational Physics · Physics 2013-11-20 R. Meyer

Parallelizing Query Optimization on Shared-Nothing Architectures

Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…

Databases · Computer Science 2015-11-06 Immanuel Trummer , Christoph Koch

Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning

We present a novel characterization of the mapping of multiple parallelism forms (e.g. data and model parallelism) onto hierarchical accelerator systems that is hierarchy-aware and greatly reduces the space of software-to-hardware mapping.…

Programming Languages · Computer Science 2021-11-17 Ningning Xie , Tamara Norman , Dominik Grewe , Dimitrios Vytiniotis

Fast multiplication of random dense matrices with fixed sparse matrices

This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation…

Computational Engineering, Finance, and Science · Computer Science 2024-05-14 Tianyu Liang , Riley Murray , Aydın Buluç , James Demmel

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She , Bowen Pang , Kai Li , Zehua Liu , Tao Zhong

A distributed-memory hierarchical solver for general sparse linear systems

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it…

Numerical Analysis · Mathematics 2017-12-21 Chao Chen , Hadi Pouransari , Sivasankaran Rajamanickam , Erik G. Boman , Eric Darve

Parallel image thinning through topological operators on shared memory parallel machines

In this paper, we present a concurrent implementation of a powerful topological thinning operator. This operator is able to act directly over grayscale images without modifying their topology. We introduce an adapted parallelization…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-31 Ramzi Mahmoudi , Mohamed Akil , Petr Matas

Highly Parallel Sparse Matrix-Matrix Multiplication

Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-09 Aydın Buluç , John R. Gilbert

Synthesis of signal processing algorithms with constraints on minimal parallelism and memory space

This thesis develops signal-processing algorithms and implementation schemes under constraints of minimal parallelism and memory space, with the goal of improving energy efficiency of low-power computing hardware. We propose (i) a…

Signal Processing · Electrical Eng. & Systems 2025-12-30 Sergey Salishev

Parallelizing Optimal Multiple Sequence Alignment by Dynamic Programming

Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-processor systems, due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-30 Manal Helal , Hossam El-Gindy , Lenore Mullin , Bruno Gaeta

Parallel Vertex Approximate Gradient discretization of hybrid dimensional Darcy flow and transport in discrete fracture networks

This paper proposes a parallel numerical algorithm to simulate the flow and the transport in a discrete fracture network taking into account the mass exchanges with the surrounding matrix. The discretization of the Darcy fluxes is based on…

Numerical Analysis · Mathematics 2016-11-18 Feng Xing , Roland Masson , Simon Lopez

Parallel Delta-Stepping Algorithm for Shared Memory Architectures

We present a shared memory implementation of a parallel algorithm, called delta-stepping, for solving the single source shortest path problem for directed and undirected graphs. In order to reduce synchronization costs we make some…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-21 M. Kranjčević , D. Palossi , S. Pintarelli

Parallel Approximation Algorithms for Facility-Location Problems

This paper presents the design and analysis of parallel approximation algorithms for facility-location problems, including $\NC$ and $\RNC$ algorithms for (metric) facility location, $k$-center, $k$-median, and $k$-means. These problems…

Data Structures and Algorithms · Computer Science 2010-06-11 Guy E. Blelloch , Kanat Tangwongsan

A Fast Minimum Degree Algorithm and Matching Lower Bound

The minimum degree algorithm is one of the most widely-used heuristics for reducing the cost of solving large sparse systems of linear equations. It has been studied for nearly half a century and has a rich history of bridging techniques…

Data Structures and Algorithms · Computer Science 2023-04-11 Robert Cummings , Matthew Fahrbach , Animesh Fatehpuria

Accelerated Parallel and Distributed Algorithm using Limited Internal Memory for Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a powerful technique for dimension reduction, extracting latent factors and learning part-based representation. For large datasets, NMF performance depends on some major issues: fast algorithms,…

Optimization and Control · Mathematics 2015-07-01 Duy-Khuong Nguyen , Tu-Bao Ho

A Discussion on Parallelization Schemes for Stochastic Vector Quantization Algorithms

This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain time speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better…

Machine Learning · Statistics 2012-05-14 Matthieu Durut , Benoît Patra , Fabrice Rossi

Parallel Random Search Algorithm of Constrained Pseudo-Boolean Optimization for Some Distinctive Large-Scale Problems

In this paper, we consider an approach to the parallelizing of the algorithms realizing the modified probability changigng method with adaptation and partial rollback procedure for constrained pseudo-Boolean optimization problems. Existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-09-03 Lev Kazakovtsev