
Related papers: Streaming Message Interface: High-Performance Dist…

200 papers

Serverless functions provide elastic scaling and a fine-grained billing model, making Function-as-a-Service (FaaS) an attractive programming model. However, for distributed jobs that benefit from large-scale and dynamic parallelism, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-16 Marcin Copik, Roman Böhringer, Alexandru Calotoiu, Torsten Hoefler

Modern heterogeneous supercomputing systems comprise CPUs, GPUs, and high-speed network interconnects. Communication libraries supporting efficient data transfers involving memory buffers from the GPU memory typically require the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-29 Naveen Namashivayam, Krishna Kandalla, James B White, Larry Kaplan, Mark Pagel

For several decades, the CPU has been the standard model to use in the majority of computing. While the CPU does excel in some areas, heterogeneous computing, such as reconfigurable hardware, is showing increasing potential in areas like…

Hardware Architecture · Computer Science 2021-04-21 Carl-Johannes Johnsen, Alberte Thegler, Kenneth Skovhede, Brian Vinter

Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-20 Torsten Hoefler, Salvatore Di Girolamo, Konstantin Taranov, Ryan E. Grant, Ron Brightwell

The current trend of multicore architectures on shared memory systems underscores the need for parallelism. While there are several programming models for expressing parallelism, the thread programming model has become a standard for supporting these system…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-12-13 D. T. Hasta, A. B. Mutiara

The memory system is often the main bottleneck in chip multiprocessor (CMP) systems in terms of latency, bandwidth, and efficiency, and it now additionally faces capacity and power problems in the era of big data. Many research works have…

Hardware Architecture · Computer Science 2014-04-10 Licheng Chen, Tianyue Lu, Yanan Wang, Mingyu Chen, Yuan Ruan, Zehan Cui, Yongbing Huang, Mingyang Chen, Jiutian Zhang, Yungang Bao

PetscSF, the communication component of the Portable, Extensible Toolkit for Scientific Computation (PETSc), is designed to provide PETSc's communication infrastructure suitable for exascale computers that utilize GPUs and other…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-25 Junchao Zhang, Jed Brown, Satish Balay, Jacob Faibussowitsch, Matthew Knepley, Oana Marin, Richard Tran Mills, Todd Munson, Barry F. Smith, Stefano Zampini

MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing. Its primary goal is to simplify the development of…

The Message Passing Interface (MPI) is the most commonly used application programming interface for process communication on current large-scale parallel systems. Due to the scale and complexity of modern parallel architectures, it is…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-09-05 Sascha Hunold, Alexandra Carpen-Amarie, Felix Donatus Lübbe, Jesper Larsson Träff

Distributed shared memory (DSM) makes it possible to implement and deploy applications onto distributed architectures using the convenient shared memory programming model, in which a set of tasks are able to allocate and access data despite their…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-04 Loïc Cudennec

Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues,…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 N. T. Karonis, B. Toonen, I. Foster

As more and more devices connect to the Internet of Things, unbounded streams of data will be generated, which have to be processed "on the fly" in order to trigger automated actions and deliver real-time services. Spark Streaming is a popular…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-12 Jia-Chun Lin, Ming-Chang Lee, Ingrid Chieh Yu, Einar Broch Johnsen

One of the most demanding challenges for the designers of parallel computing architectures is to deliver an efficient network infrastructure providing low latency, high bandwidth communications while preserving scalability. Besides off-chip…

This paper introduces an effort to incorporate reconfigurable logic (FPGA) components into a software programming model. For this purpose, we have implemented a hardware engine for remote memory communication between hardware computation…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-22 Ruediger Willenberg, Paul Chow

This paper presents a comprehensive comparison of three dominant parallel programming models in High Performance Computing (HPC): Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and Compute Unified Device Architecture…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-19 Nizar ALHafez, Ahmad Kurdi

Message aggregation is often used with the goal of reducing communication cost in HPC applications. The difference in order of magnitude between the overhead of sending a message and the cost per byte transferred motivates the need for message aggregation, for…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-07 Kavitha Chandrasekar, Laxmikant Kale

Conventional wisdom holds that an efficient interface between an OS running on a CPU and a high-bandwidth I/O device should use Direct Memory Access (DMA) to offload data transfer, descriptor rings for buffering and queuing, and interrupts…

Hardware Architecture · Computer Science 2025-04-25 Anastasiia Ruzhanskaia, Pengcheng Xu, David Cock, Timothy Roscoe

The Message Passing Interface (MPI) has been extremely successful as a portable way to program high-performance parallel computers. This success has occurred in spite of the view of many that message passing is difficult and that other…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 William D. Gropp

In this paper, we introduce a new user-level DSM system which has the ability to directly interact with underlying interconnection networks. The DSM system provides the application programmer a flexible API to program parallel applications…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Bharath Ramesh, Srinidhi Varadarajan

Data-intensive applications like distributed AI training may require multi-terabyte memory capacity with multi-terabit bandwidth. We directly attach the memory to the Ethernet controller with some programmable logic to design an efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-29 Kevin Fang, David Peng