Related papers: The Glasgow Parallel Reduction Machine: Programmin…

200 papers

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad, Wim Vanderbauwhede

Quantum multi-programming is a method utilizing contemporary noisy intermediate-scale quantum computers by executing multiple quantum circuits concurrently. Despite early research on it, the research remains on quantum gates or small-size…

Quantum Physics · Physics 2023-08-09 Gilchan Park, Kun Zhang, Kwangmin Yu, Vladimir Korepin

We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations. It allows users to write programs in the same way as for a single device, then give hints through a few annotations on how to…

We propose a new computational framework that combines the recently developed time-parallel (TP) and the compound wavelet matrix (CWM) methods. The framework, termed tpCWM, offers significant computational acceleration by making…

Computational Physics · Physics 2009-09-29 George Frantziskonis, Krishna Muralidharan, Pierre Deymier, Srdjan Simunovic, Sreekanth Pannala

Reduction operations are extensively employed in many computational problems. Given a finite set of numeric elements, a reduction combines all elements in that set into a single value using a combiner function. A…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-23 Walid Jradi, Hugo do Nascimento, Wellington Martins

Regular expression matching is essential for many applications, such as finding patterns in text, exploring substrings in large DNA sequences, or lexical analysis. However, sequential regular expression matching may be time-prohibitive for…

Formal Languages and Automata Theory · Computer Science 2015-06-30 Suejb Memeti, Sabri Pllana

In order to satisfy their ever increasing capacity and compute requirements, machine learning models are distributed across multiple nodes using numerous parallelism strategies. As a result, collective communications are often on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-24 Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann

Exactly solving multi-objective integer programming (MOIP) problems is often a very time consuming process, especially for large and complex problems. Parallel computing has the potential to significantly reduce the time taken to solve such…

Optimization and Control · Mathematics 2018-11-02 William Pettersson, Melih Ozlen

Writing efficient hybrid parallel code is tedious and error-prone, and requires good knowledge of both parallel programming and multithreading, such as MPI and OpenMP, respectively. Therefore, we present a framework which is based on a job model that…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-03 Ralf-Peter Mundani, Marko Ljucović, Ernst Rank

In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-12 Jose Nunez-Yanez, Mohammad Hosseinabady, Moslem Amiri, Andrés Rodríguez, Rafael Asenjo, Angeles Navarro, Rubén Gran-Tejero, Darío Suárez-Gracia

Overdecomposition has emerged as a powerful and sometimes essential technique in parallel programming. Many application domains or frameworks, including those based on adaptive mesh refinements, or tree codes use it. Charm++ is a parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-14 Aditya Bhosale, Anant Jain, Shourya Goel, Ritvik Rao, Peddoju Sateesh Kumar, Laxmikant Kale

A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of…

Accelerator Physics · Physics 2012-02-13 M. A. Kostin, N. V. Mokhov, K. Niita

We describe a new parallel implementation, mplrs, of the vertex enumeration code lrs that uses the MPI parallel environment and can be run on a network of computers. The implementation makes use of a C wrapper that essentially uses the…

Mathematical Software · Computer Science 2017-10-13 David Avis, Charles Jordan

Despite significant advances in Large Language Models (LLMs), planning tasks still present challenges for LLM-based agents. Existing planning methods face two key limitations: heavy constraints and cascading errors. To address these…

Computation and Language · Computer Science 2025-06-04 Zhengdong Lu, Weikai Lu, Yiling Tao, Yun Dai, ZiXuan Chen, Huiping Zhuang, Cen Chen, Hao Peng, Ziqian Zeng

In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic…

Artificial Intelligence · Computer Science 2017-07-10 Andres R. Masegosa, Ana M. Martinez, Hanen Borchani

We present a model of multithreaded computation, combining fork-join and single-instruction-multiple-data parallelisms, with an emphasis on estimating parallelism overheads of programs written for modern many-core architectures. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-04 Sardar Anisul Haque, Marc Moreno Maza, Ning Xie

Sparse General Matrix Multiply (SpGEMM) is key for various High-Performance Computing (HPC) applications such as genomics and graph analytics. Using the semiring abstraction, many algorithms can be formulated as SpGEMM, allowing…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-23 Thomas McFarland, Julian Bellavita, Giulia Guidi

This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions…

The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yen-Hsiang Chang, Aydın Buluç, James Demmel

Lightweight vision networks have witnessed remarkable progress in recent years, yet achieving a satisfactory balance among parameter scale, computational overhead, and task performance remains difficult. Although many existing lightweight…

Computer Vision and Pattern Recognition · Computer Science 2026-01-16 Wei Xu