Related papers: A Scalable Shared-Memory Parallel Simplex for Larg…

A Hybrid Multi-GPU Implementation of Simplex Algorithm with CPU Collaboration

The simplex algorithm has been successfully used for many years in solving linear programming (LP) problems. Due to the intensive computations required (especially for the solution of large LP problems), parallel approaches have also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-22 Basilis Mamalis, Marios Perlitis

Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications

Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work with prefetching,…

Hardware Architecture · Computer Science 2023-05-05 Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

Effective Implementation of GPU-based Revised Simplex algorithm applying new memory management and cycle avoidance strategies

Graphics Processing Units (GPUs) with high computational capabilities are used as modern parallel platforms to deal with complex computational problems. We use this platform to solve large-scale linear programming problems by revised simplex…

Optimization and Control · Mathematics 2018-03-14 Arash Raeisi Gahrouei, Mehdi Ghatee

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

Parallel training of linear models without compromising convergence

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou, Celestine Dünner, Kornilios Kourtis, Thomas Parnell

Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…

Computational Physics · Physics 2013-11-20 R. Meyer

Parallelizing the Approximate Minimum Degree Ordering Algorithm: Strategies and Evaluation

The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yen-Hsiang Chang, Aydın Buluç, James Demmel

A distributed-memory hierarchical solver for general sparse linear systems

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it…

Numerical Analysis · Mathematics 2017-12-21 Chao Chen, Hadi Pouransari, Sivasankaran Rajamanickam, Erik G. Boman, Eric Darve

A Parallel Task-based Approach to Linear Algebra

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad, Wim Vanderbauwhede

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong

Shared memory parallelism in Modern C++ and HPX

Parallel programming remains a daunting challenge, from the struggle to express a parallel algorithm without cluttering the underlying synchronous logic, to describing which devices to employ in a calculation, to correctness. Over the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-10 Patrick Diehl, Steven R. Brandt, Hartmut Kaiser

Automatic Parallelization of Sequential Programs

Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-21 Peter Kraft, Amos Waterland, Daniel Y Fu, Anitha Gollamudi, Shai Szulanski, Margo Seltzer

Parallelizing Optimal Multiple Sequence Alignment by Dynamic Programming

Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-processor systems, due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-30 Manal Helal, Hossam El-Gindy, Lenore Mullin, Bruno Gaeta

A Review of Literature on Parallel Constraint Solving

As multicore computing is now standard, it seems irresponsible for constraints researchers to ignore the implications of it. Researchers need to address a number of issues to exploit parallelism, such as: investigating which constraint…

Artificial Intelligence · Computer Science 2018-03-30 Ian P. Gent, Ciaran McCreesh, Ian Miguel, Neil C. A. Moore, Peter Nightingale, Patrick Prosser, Chris Unsworth

Scheduling optimization of parallel linear algebra algorithms using Supervised Learning

Linear algebra algorithms are used widely in a variety of domains, e.g., machine learning, numerical physics, and video game graphics. For all these applications, loop-level parallelism is required to achieve high performance. However,…

Machine Learning · Computer Science 2020-01-24 G. Laberge, S. Shirzad, P. Diehl, H. Kaiser, S. Prudhomme, A. Lemoine

A Parallel Linear Temporal Logic Tableau

For many applications, we are unable to take full advantage of the potential massive parallelisation offered by supercomputers or cloud computing because it is too hard to work out how to divide up the computation task between processors in…

Logic in Computer Science · Computer Science 2017-09-08 John C. McCabe-Dansted, Mark Reynolds

An Easy-to-use Scalable Framework for Parallel Recursive Backtracking

Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-31 Faisal N. Abu-Khzam, Khuzaima Daudjee, Amer E. Mouawad, Naomi Nishimura

High-Quality Shared-Memory Graph Partitioning

Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation in processing graphs. Recently, the size, variety, and structural complexity of these networks have grown dramatically.…

Data Structures and Algorithms · Computer Science 2018-10-16 Yaroslav Akhremtsev, Peter Sanders, Christian Schulz