Related papers: Rethinking Inter-Process Communication with Memory…

200 papers

The increasing demand for artificial intelligence (AI) workloads across diverse computing environments has driven the need for more efficient data management strategies. Traditional cloud-based architectures struggle to handle the sheer…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-03 Alex Barceló, Sebastián A. Cajas Ordoñez, Jaydeep Samanta, Andrés L. Suárez-Cetrulo, Romila Ghosh, Ricardo Simón Carbajo, Anna Queralt

Heterogeneous multi-core architectures combine on a single chip a few large, general-purpose host cores, optimized for single-thread performance, with (many) clusters of small, specialized, energy-efficient accelerator cores for…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-12 Luca Colagrande, Luca Benini

Multi-core architectures feature an intricate hierarchy of cache memories, with multiple levels and sizes. To adequately decompose an application according to the traits of a particular memory hierarchy is a cumbersome task that may be…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-20 Hervé Paulino, Nuno Delgado

With the ever-growing need of data in HPC applications, the congestion at the I/O level becomes critical in super-computers. Architectural enhancement such as burst-buffers and pre-fetching are added to machines, but are not sufficient to…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-23 Guillaume Aupy, Ana Gainaru, Valentin Le Fèvre

Heterogeneous multi-core architectures combine a few "host" cores, optimized for single-thread performance, with many small energy-efficient "accelerator" cores for data-parallel processing, on a single chip. Offloading a computation to the…

Hardware Architecture · Computer Science 2025-11-11 Luca Colagrande, Luca Benini

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

Modern heterogeneous supercomputing systems are comprised of CPUs, GPUs, and high-speed network interconnects. Communication libraries supporting efficient data transfers involving memory buffers from the GPU memory typically require the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-29 Naveen Namashivayam, Krishna Kandalla, James B White, Larry Kaplan, Mark Pagel

In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-10 Sergio Rivas-Gomez, Sai Narasimhamurthy, Keeran Brabazon, Oliver Perks, Erwin Laure, Stefano Markidis

The exponential growth of data traffic and the increasing complexity of networked applications demand effective solutions capable of passively inspecting and analysing the network traffic for monitoring and security purposes. Implementing…

Networking and Internet Architecture · Computer Science 2024-07-24 Luca Deri, Alfredo Cardigliano, Francesco Fusco

Memory latencies and bandwidth are major factors, limiting system performance and scalability. Modern CPUs aim at hiding latencies by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based…

Databases · Computer Science 2025-06-23 Arthur Bernhardt, Sajjad Tamimi, Florian Stock, Andreas Koch, Ilia Petrov

The convergence of IoT, Edge, Cloud, and HPC technologies creates a compute continuum that merges cloud scalability and flexibility with HPC's computational power and specialized optimizations. However, integrating cloud and HPC resources…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-20 Aasish Kumar Sharma, Christian Boehme, Patrick Gelß, Ramin Yahyapour, Julian Kunkel

Interprocess communication, IPC, is one of the most fundamental functions of a modern operating system, playing an essential role in the fabric of contemporary applications. This report conducts an investigation in FreeBSD of the real world…

Operating Systems · Computer Science 2020-08-06 A. H. Bell-Thomas

Reducing the average memory access time is crucial for improving the performance of applications running on multi-core architectures. With workload consolidation this becomes increasingly challenging due to shared resource contention.…

Hardware Architecture · Computer Science 2021-02-24 Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericàs

Transformers and LLMs have seen rapid adoption in all domains. Their sizes have exploded to hundreds of billions of parameters and keep increasing. Under these circumstances, the training of transformers is slow and often takes in the order…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-18 Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae

Many HPC applications suffer from a bottleneck in the shared caches, instruction execution units, I/O or memory bandwidth, even though the remaining resources may be underutilized. It is hard for developers and runtime systems to ensure…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-17 Felippe V. Zacarias, Vinicius Petrucci, Rajiv Nishtala, Paul Carpenter, Daniel Mossé

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

Memory disaggregation addresses memory imbalance in a cluster by decoupling CPU and memory allocations of applications while also increasing the effective memory capacity for (memory-intensive) applications beyond the local memory limit…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-07 Anil Yelam

SmartNICs have been increasingly utilized across various applications to offload specific computational tasks, thereby enhancing overall system performance. However, this offloading process introduces several communication challenges that…

Networking and Internet Architecture · Computer Science 2025-07-08 Mohammed Zain Farooqi, Masoud Hemmatpour, Tore Heide Larsen

Conventional cache models are not suited for real-time parallel processing because tasks may flush each other's data out of the cache in an unpredictable manner. In this way the system is not compositional so the overall performance is…

Hardware Architecture · Computer Science 2011-11-09 A. M. Molnos, M. J. M. Heijligers, S. D. Cotofana, J. T. J. Van Eijndhoven

Cloud mobile computing enables the offloading of computation-intensive applications from a mobile device to a cloud processor via a wireless interface. In light of the strong interplay between offloading decisions at the application layer…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-29 Shahrouz Khalili, Osvaldo Simeone