Related papers: Practical Parallel External Memory Algorithms via …

On Parallel External-Memory Bidirectional Search

Parallelization and External Memory (PEM) techniques have significantly enhanced the capabilities of search algorithms when solving large-scale problems. Previous research on PEM has primarily centered on unidirectional algorithms, with…

Artificial Intelligence · Computer Science 2025-01-06 Lior Siag, Shahaf S. Shperberg, Ariel Felner, Nathan R. Sturtevant

The Efficiency of MapReduce in Parallel External Memory

Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practice, theoretical footing is still limited and only…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-12-19 Gero Greiner, Riko Jacob

Software Alchemy: Turning Complex Statistical Computations into Embarrassingly-Parallel Ones

The growth in the use of computationally intensive statistical procedures, especially with Big Data, has necessitated the usage of parallel computation on diverse platforms such as multicore, GPU, clusters and clouds. However, slowdown due…

Computation · Statistics 2014-09-23 Norman Matloff

Semi-External Memory Sparse Matrix Multiplication for Billion-Node Graphs

Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua Vogelstein, Carey E. Priebe, Randal Burns

Bulk-synchronous pseudo-streaming algorithms for many-core accelerators

The bulk-synchronous parallel (BSP) model provides a framework for writing parallel programs with predictable performance. In this paper we extend the BSP model to support what we will call pseudo-streaming algorithms for accelerators. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-24 Jan-Willem Buurlage, Tom Bannink, Abe Wits

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even…

Data Structures and Algorithms · Computer Science 2019-08-22 Laxman Dhulipala, Guy E. Blelloch, Julian Shun

Understanding Bulk-Bitwise Processing In-Memory Through Database Analytics

Bulk-bitwise processing-in-memory (PIM), where large bitwise operations are performed in parallel by the memory array itself, is an emerging form of computation with the potential to mitigate the memory wall problem. This paper examines the…

Hardware Architecture · Computer Science 2023-09-29 Ben Perach, Ronny Ronen, Benny Kimelfeld, Shahar Kvatinsky

Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures

In this paper, we perform an empirical evaluation of the Parallel External Memory (PEM) model in the context of geometric problems. In particular, we implement the parallel distribution sweeping framework of Ajwani, Sitchinava and Zeh to…

Data Structures and Algorithms · Computer Science 2013-06-20 Deepak Ajwani, Nodari Sitchinava

Parallel Computing Environments and Methods for Power Distribution System Simulation

The development of cost-effective high-performance parallel computing on multi-processor supercomputers makes it attractive to port excessively time-consuming simulation software from personal computers (PC) to supercomputers. The power…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Ning Lu, Z. Todd Taylor, David P. Chassin, Ross T. Guttromson, R. Scott Studham

Efficient 2D Tensor Network Simulation of Quantum Systems

Simulation of quantum systems is challenging due to the exponential size of the state space. Tensor networks provide a systematically improvable approximation for quantum states. 2D tensor networks such as Projected Entangled Pair States…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-04 Yuchen Pang, Tianyi Hao, Annika Dugad, Yiqing Zhou, Edgar Solomonik

BCIM: Efficient Implementation of Binary Neural Network Based on Computation in Memory

Applications of Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on computing power. Contrary to conventional neural networks with the floating-point datatype, BNNs use binarized weights and activations…

Emerging Technologies · Computer Science 2022-11-14 Mahdi Zahedi, Taha Shahroodi, Stephan Wong, Said Hamdioui

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

Efficient Distributed Quantum Computing

We provide algorithms for efficiently addressing quantum memory in parallel. These imply that the standard circuit model can be simulated with low overhead by the more realistic model of a distributed quantum computer. As a result, the…

Quantum Physics · Physics 2013-03-13 Robert Beals, Stephen Brierley, Oliver Gray, Aram Harrow, Samuel Kutin, Noah Linden, Dan Shepherd, Mark Stather

A new kind of parallelism and its programming in the Explicitly Many-Processor Approach

The processor accelerators are effective because they are working not (completely) on principles of stored program computers. They use some kind of parallelism, and it is rather hard to program them effectively: a parallel architecture by…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-26 János Végh

BSP Sorting: An experimental Study

The Bulk-Synchronous Parallel model of computation has been used for the architecture independent design and analysis of parallel algorithms whose performance is expressed not only in terms of problem size n but also in terms of parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-29 Alexandros V. Gerbessiotis, Constantinos J. Siniolakis

Elastic Bulk Synchronous Parallel Model for Distributed Deep Learning

The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has successfully been employed for distributed training of machine learning models. A prevalent shortcoming of the BSP is…

Machine Learning · Computer Science 2020-01-07 Xing Zhao, Manos Papagelis, Aijun An, Bao Xin Chen, Junfeng Liu, Yonggang Hu

Parallel Processing of Large Graphs

More and more large data collections are gathered worldwide in various IT systems. Many of them possess the networked nature and need to be processed and analysed as graph structures. Due to their size they require very often usage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-04 Tomasz Kajdanowicz, Przemyslaw Kazienko, Wojciech Indyk

MGPU-TSM: A Multi-GPU System with Truly Shared Memory

The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with…

Hardware Architecture · Computer Science 2020-08-11 Saiful A. Mojumder, Yifan Sun, Leila Delshadtehrani, Yenai Ma, Trinayan Baruah, José L. Abellán, John Kim, David Kaeli, Ajay Joshi

A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems

For the last thirty years, several Dynamic Memory Managers (DMMs) have been proposed. Such DMMs include first fit, best fit, segregated fit and buddy systems. Since the performance, memory usage and energy consumption of each DMM differs,…

Neural and Evolutionary Computing · Computer Science 2024-07-16 José L. Risco-Martín, David Atienza, J. Manuel Colmenar, Oscar Garnica