Related papers: The spatial computer: A model for energy-efficient…

Low-Depth Spatial Tree Algorithms

Contemporary accelerator designs exhibit a high degree of spatial localization, wherein two-dimensional physical distance determines communication costs between processing elements. This situation presents considerable algorithmic…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-08-09 · Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski

A New Model for Massively Parallel Computation Considering both Communication and IO Cost

In the research area of parallel computation, the communication cost has been extensively studied, while the IO cost has been neglected. For big data computation, the assumption that the data fits in main memory no longer holds, and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-03-25 · Hengzhao Ma, Xiangyu Gao, Jianzhong Li, Tianpeng Gao

Efficient On-Chip Communication for Parallel Graph-Analytics on Spatial Architectures

Large-scale graph processing has drawn great attention in recent years. Most modern datacenter workloads, such as MapReduce, can be represented in the form of graph processing. Consequently, a lot of designs for Domain-Specific…

Hardware Architecture · Computer Science · 2022-09-07 · Khushal Sethi

Improving Performance Models for Irregular Point-to-Point Communication

Parallel applications are often unable to take full advantage of emerging parallel architectures due to scaling limitations, which arise due to inter-process communication. Performance models are used to analyze the sources of communication…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-07 · Amanda Bienz, William D. Gropp, Luke N. Olson

Upper and Lower Bounds on the Cost of a Map-Reduce Computation

In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-06-21 · Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman

Energy-Efficient Sensing and Communication of Parallel Gaussian Sources

Energy efficiency is a key requirement in the design of wireless sensor networks. While most theoretical studies only account for the energy requirements of communication, the sensing process, which includes measurements and compression,…

Information Theory · Computer Science · 2012-05-23 · Xi Liu, Osvaldo Simeone, Elza Erkip

Modeling Data Movement Performance on Heterogeneous Architectures

The cost of data movement on parallel systems varies greatly with machine architecture, job partition, and nearby jobs. Performance models that accurately capture the cost of data movement provide a tool for analysis, allowing for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-07-20 · Amanda Bienz, Luke N. Olson, William D. Gropp, Shelby Lockhart

When Distributed Computation is Communication Expensive

We consider a number of fundamental statistical and graph problems in the message-passing model, where we have $k$ machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the…

Data Structures and Algorithms · Computer Science · 2013-07-29 · David P. Woodruff, Qin Zhang

A Parallel Processing Algorithm for Computing Short-Range Particle Forces with Inhomogeneous Particle Distributions

We present a computational algorithm for computing short range forces between particles. The algorithm has two distinguishing features. First, it is optimized for multi-processor computers, and will use as many processors as are available.…

Astrophysics · Physics · 2008-02-03 · Robert C. Ferrell, Edmund Bertschinger

PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation

Bit-serial Processing-In-Memory (PIM) is an attractive paradigm for accelerator architectures, for parallel workloads such as Deep Learning (DL), because of its capability to achieve massive data parallelism at a low area overhead and…

Hardware Architecture · Computer Science · 2023-11-21 · Aman Arora, Jian Weng, Siyuan Ma, Tony Nowatzki, Lizy K. John

A local parallel communication algorithm for polydisperse rigid body dynamics

The simulation of large ensembles of particles is usually parallelized by partitioning the domain spatially and using message passing to communicate between the processes handling neighboring subdomains. The particles are represented as…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-08-03 · Sebastian Eibl, Ulrich Rüde

Computation vs. Communication Scaling for Future Transformers on Future Hardware

Scaling neural network models has delivered dramatic quality gains across ML problems. However, this scaling has increased the reliance on efficient distributed training techniques. Accordingly, as with other distributed computing…

Hardware Architecture · Computer Science · 2023-05-04 · Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair

Comparing spatial networks: A 'one size fits all' efficiency-driven approach

Spatial networks are a powerful framework for studying a large variety of systems belonging to a broad diversity of contexts: from transportation to biology, from epidemiology to communications, and migrations, to cite a few. Spatial…

Physics and Society · Physics · 2020-04-08 · Ignacio Morer, Alessio Cardillo, Albert Diaz-Guilera, Luce Prignano, Sergi Lozano

Parallel inference for massive distributed spatial data using low-rank models

Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to…

Computation · Statistics · 2016-02-08 · Matthias Katzfuss, Dorit Hammerling

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science · 2015-11-17 · James Hanlon

Modelling Energy Consumption based on Resource Utilization

Power management is an expensive and important issue for large computational infrastructures such as datacenters, large clusters, and computational grids. However, measuring energy consumption of scalable systems may be impractical due to…

Machine Learning · Computer Science · 2017-09-20 · Lucas Venezian Povoa, Cesar Marcondes, Hermes Senger

Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication

We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM), a key computational kernel in scientific computing and data analysis whose performance is often communication bound. This model correctly describes…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-03-18 · Grey Ballard, Alex Druinsky, Nicholas Knight, Oded Schwartz

A Survey of Spatial Memory Representations for Efficient Robot Navigation

As vision-based robots navigate larger environments, their spatial memory grows without bound, eventually exhausting computational resources, particularly on embedded platforms (8-16 GB shared memory, <30 W) where adding hardware is not an…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-21 · Ma. Madecheen S. Pangaliman, Steven S. Sison, Erwin P. Quilloy, Rowel Atienza

Energy-Efficient High-Throughput Data Transfers via Dynamic CPU Frequency and Core Scaling

The energy footprint of global data movement has surpassed 100 terawatt hours, costing more than 20 billion US dollars to the world economy. Depending on the number of switches, routers, and hubs between the source and destination nodes,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-04-12 · Luigi Di Tacchio, Zulkar Nine, Tevfik Kosar, Fatih M. Bulut, Jinho Hwang

Node Aware Sparse Matrix-Vector Multiplication

The sparse matrix-vector multiply (SpMV) operation is a key computational kernel in many simulations and linear solvers. The large communication requirements associated with a reference implementation of a parallel SpMV result in poor…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-11-16 · Amanda Bienz, William D. Gropp, Luke N. Olson