Related papers: The Glasgow Parallel Reduction Machine: Programmin…

A Parallel Task-based Approach to Linear Algebra

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad, Wim Vanderbauwhede

Quantum multi-programming for Grover's search

Quantum multi-programming is a method utilizing contemporary noisy intermediate-scale quantum computers by executing multiple quantum circuits concurrently. Despite early research on it, the research remains on quantum gates or small-size…

Quantum Physics · Physics 2023-08-09 Gilchan Park, Kun Zhang, Kwangmin Yu, Vladimir Korepin

GSPMD: General and Scalable Parallelization for ML Computation Graphs

We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations. It allows users to write programs in the same way as for a single device, then give hints through a few annotations on how to…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-28 Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, Dmitry Lepikhin, Andy Ly, Marcello Maggioni, Ruoming Pang, Noam Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen

Time Parallel Scalable Multiphysics/Multiscale Framework

We propose a new computational framework that combines the recently developed time-parallel (TP) and the compound wavelet matrix (CWM) methods. The framework, termed tpCWM, offers significant computational acceleration by making…

Computational Physics · Physics 2009-09-29 George Frantziskonis, Krishna Muralidharan, Pierre Deymier, Srdjan Simunovic, Sreekanth Pannala

A Fast and Generic GPU-Based Parallel Reduction Implementation

Reduction operations are extensively employed in many computational problems. A reduction consists of, given a finite set of numeric elements, combining into a single value all elements in that set, using for this a combiner function. A…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-23 Walid Jradi, Hugo do Nascimento, Wellington Martins

PaREM: A Novel Approach for Parallel Regular Expression Matching

Regular expression matching is essential for many applications, such as finding patterns in text, exploring substrings in large DNA sequences, or lexical analysis. However, sequential regular expression matching may be time-prohibitive for…

Formal Languages and Automata Theory · Computer Science 2015-06-30 Suejb Memeti, Sabri Pllana

Optimizing Distributed ML Communication with Fused Computation-Collective Operations

In order to satisfy their ever increasing capacity and compute requirements, machine learning models are distributed across multiple nodes using numerous parallelism strategies. As a result, collective communications are often on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-24 Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann

Multi-objective integer programming: Synergistic parallel approaches

Exactly solving multi-objective integer programming (MOIP) problems is often a very time consuming process, especially for large and complex problems. Parallel computing has the potential to significantly reduce the time taken to solve such…

Optimization and Control · Mathematics 2018-11-02 William Pettersson, Melih Ozlen

Framework for the hybrid parallelisation of simulation codes

Writing efficient hybrid parallel code is tedious, error-prone, and requires good knowledge of both parallel programming and multithreading such as MPI and OpenMP, resp. Therefore, we present a framework which is based on a job model that…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-03 Ralf-Peter Mundani, Marko Ljucović, Ernst Rank

Parallelizing Workload Execution in Embedded and High-Performance Heterogeneous Systems

In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-12 Jose Nunez-Yanez, Mohammad Hosseinabady, Moslem Amiri, Andrés Rodríguez, Rafael Asenjo, Angeles Navarro, Rubén Gran-Tejero, Darío Suárez-Gracia

Efficient and Portable Support for Overdecomposition on Distributed Memory GPGPU Platforms

Overdecomposition has emerged as a powerful and sometimes essential technique in parallel programming. Many application domains or frameworks, including those based on adaptive mesh refinements, or tree codes use it. Charm++ is a parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-14 Aditya Bhosale, Anant Jain, Shourya Goel, Ritvik Rao, Peddoju Sateesh Kumar, Laxmikant Kale

New Parallel computing framework for radiation transport codes

A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of…

Accelerator Physics · Physics 2012-02-13 M. A. Kostin, N. V. Mokhov, K. Niita

mplrs: A scalable parallel vertex/facet enumeration code

We describe a new parallel implementation, mplrs, of the vertex enumeration code lrs that uses the MPI parallel environment and can be run on a network of computers. The implementation makes use of a C wrapper that essentially uses the…

Mathematical Software · Computer Science 2017-10-13 David Avis, Charles Jordan

Decompose, Plan in Parallel, and Merge: A Novel Paradigm for Large Language Models based Planning with Multiple Constraints

Despite significant advances in Large Language Models (LLMs), planning tasks still present challenges for LLM-based agents. Existing planning methods face two key limitations: heavy constraints and cascading errors. To address these…

Computation and Language · Computer Science 2025-06-04 Zhengdong Lu, Weikai Lu, Yiling Tao, Yun Dai, ZiXuan Chen, Huiping Zhuang, Cen Chen, Hao Peng, Ziqian Zeng

Probabilistic Graphical Models on Multi-Core CPUs using Java 8

In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic…

Artificial Intelligence · Computer Science 2017-07-10 Andres R. Masegosa, Ana M. Martinez, Hanen Borchani

A Many-core Machine Model for Designing Algorithms with Minimum Parallelism Overheads

We present a model of multithreaded computation, combining fork-join and single-instruction-multiple-data parallelisms, with an emphasis on estimating parallelism overheads of programs written for modern many-core architectures. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-04 Sardar Anisul Haque, Marc Moreno Maza, Ning Xie

Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication

Sparse General Matrix Multiply (SpGEMM) is key for various High-Performance Computing (HPC) applications such as genomics and graph analytics. Using the semiring abstraction, many algorithms can be formulated as SpGEMM, allowing…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-23 Thomas McFarland, Julian Bellavita, Giulia Guidi

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions…

Programming Languages · Computer Science 2018-12-21 Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, Saman Amarasinghe

Parallelizing the Approximate Minimum Degree Ordering Algorithm: Strategies and Evaluation

The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yen-Hsiang Chang, Aydın Buluç, James Demmel

Human-inspired Global-to-Parallel Multi-scale Encoding for Lightweight Vision Models

Lightweight vision networks have witnessed remarkable progress in recent years, yet achieving a satisfactory balance among parameter scale, computational overhead, and task performance remains difficult. Although many existing lightweight…

Computer Vision and Pattern Recognition · Computer Science 2026-01-16 Wei Xu