Related papers: Parallelized sequential composition, pipelines, an…

A high-level operational semantics for hardware weak memory models

Modern processors deploy a variety of weak memory models, which for efficiency reasons may execute instructions in an order different to that specified by the program text. The consequences of instruction reordering can be complex and…

Logic in Computer Science · Computer Science 2018-12-05 Robert J. Colvin, Graeme Smith

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

A wide-spectrum language for verification of programs on weak memory models

Modern processors deploy a variety of weak memory models, which for efficiency reasons may (appear to) execute instructions in an order different to that specified by the program text. The consequences of instruction reordering can be…

Programming Languages · Computer Science 2018-12-04 Robert J. Colvin, Graeme Smith

Weak Memory Model Formalisms: Introduction and Survey

Memory consistency models define the order in which accesses to shared memory in a concurrent system may be observed to occur. Such models are a necessity since program order is not a reliable indicator of execution order, due to…

Programming Languages · Computer Science 2026-03-16 Roger C. Su, Robert J. Colvin

Restructuring a concurrent refinement algebra

The concurrent refinement algebra has been developed to support rely/guarantee reasoning about concurrent programs. The algebra supports atomic commands and defines parallel composition as a synchronous operation, as in Milner's SCCS. In…

Logic in Computer Science · Computer Science 2024-05-10 Ian J. Hayes, Larissa A. Meinicke, Naso Evangelou-Oost

Parallelizing Program Execution on Distributed Quantum Systems via Compiler/Hardware Co-Design

As quantum computers continue to improve and support larger, more complex computations, smart control hardware and compilers are needed to efficiently leverage the capabilities of these systems. This paper introduces a novel approach to…

Quantum Physics · Physics 2025-11-19 Folkert de Ronde, Alexander Knapen, Stephan Wong, Sebastian Feld

The Parallel Persistent Memory Model

We consider a parallel computational model that consists of $P$ processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-15 Guy E. Blelloch, Phillip B. Gibbons, Yan Gu, Charles McGuffey, Julian Shun

The Impact of Memory Models on Software Reliability in Multiprocessors

The memory consistency model is a fundamental system property characterizing a multiprocessor. The relative merits of strict versus relaxed memory models have been widely debated in terms of their impact on performance, hardware complexity…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-04-07 Alexander Jaffe, Thomas Moscibroda, Laura Effinger-Dean, Luis Ceze, Karin Strauss

The Virtues of Conflict: Analyzing Modern Concurrency

Modern shared memory multiprocessors permit reordering of memory operations for performance reasons. These reorderings are often a source of subtle bugs in programs written for such architectures. Traditional approaches to verify weak…

Software Engineering · Computer Science 2016-02-29 Ganesh Narayanaswamy, Saurabh Joshi, Daniel Kroening

Fully Parallel Particle Learning for GPGPUs and Other Parallel Devices

We develop a novel parallel resampling algorithm for fully parallelized particle filters, which is designed with GPUs (graphics processing units) or similar parallel computing devices in mind. With our new algorithm, a full cycle of…

Computation · Statistics 2016-08-17 Kenichiro McAlinn, Teruo Nakatsuma

Performance Analysis of Sequential Experimental Design for Calibration in Parallel Computing Environments

The unknown parameters of simulation models often need to be calibrated using observed data. When simulation models are expensive, calibration is usually carried out with an emulator. The effectiveness of the calibration process can be…

Computation · Statistics 2024-12-03 Özge Sürer, Stefan M. Wild

Mitigating Power Attacks through Fine-Grained Instruction Reordering

Side-channel attacks are a security exploit that take advantage of information leakage. They use measurement and analysis of physical parameters to reverse engineer and extract secrets from a system. Power analysis attacks in particular,…

Cryptography and Security · Computer Science 2021-07-26 Yun Chen, Ali Hajiabadi, Romain Poussier, Andreas Diavastos, Shivam Bhasin, Trevor E. Carlson

Memory DisOrder: Memory Re-orderings as a Timerless Side-channel

To improve efficiency, nearly all parallel processing units (CPUs and GPUs) implement relaxed memory models in which memory operations may be re-ordered, i.e., executed out-of-order. Prior testing work in this area found that memory…

Cryptography and Security · Computer Science 2026-01-14 Sean Siddens, Sanya Srivastava, Reese Levine, Josiah Dykstra, Tyler Sorensen

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code inside nowadays' software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science 2016-04-13 Alcides Fonseca, Bruno Cabral, João Rafael, Ivo Correia

Weak Memory Models with Matching Axiomatic and Operational Definitions

Memory consistency models are notorious for being difficult to define precisely, to reason about, and to verify. More than a decade of effort has gone into nailing down the definitions of the ARM and IBM Power memory models, and yet there…

Programming Languages · Computer Science 2019-04-11 Sizhuo Zhang, Muralidaran Vijayaraghavan, Dan Lustig, Arvind

Embarrassingly Parallel Time Series Analysis for Large Scale Weak Memory Systems

Second order stationary models in time series analysis are based on the analysis of essential statistics whose computations follow a common pattern. In particular, with a map-reduce nomenclature, most of these operations can be modeled as…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-23 Francois Belletti, Evan Sparks, Michael Franklin, Alexandre M. Bayen

Balancing Pipeline Parallelism with Vocabulary Parallelism

Pipeline parallelism is widely used to scale the training of transformer-based large language models, various works have been done to improve its throughput and memory footprint. In this paper, we address a frequently overlooked issue: the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-06 Man Tsung Yeung, Penghui Qi, Min Lin, Xinyi Wan

Pipeline Parallelism with Controllable Memory

Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block, and show that the lifespan of the…

Machine Learning · Computer Science 2024-11-05 Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin

A new kind of parallelism and its programming in the Explicitly Many-Processor Approach

The processor accelerators are effective because they are working not (completely) on principles of stored program computers. They use some kind of parallelism, and it is rather hard to program them effectively: a parallel architecture by…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-26 János Végh

Program Execution on Reconfigurable Multicore Architectures

Based on the two observations that diverse applications perform better on different multicore architectures, and that different phases of an application may have vastly different resource requirements, Pal et al. proposed a novel…

Programming Languages · Computer Science 2016-06-21 Sanjiva Prasad