Related papers: GCS: Generalized Cache Coherence For Efficient Syn…

Cache Coherence Over Disaggregated Memory

Disaggregating memory from compute offers the opportunity to better utilize stranded memory in cloud data centers. It is important to cache data in the compute nodes and maintain cache coherence across multiple compute nodes. However, the…

Databases · Computer Science 2026-01-14 Ruihong Wang, Jianguo Wang, Walid G. Aref

Cache Where you Want! Reconciling Predictability and Coherent Caching

Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-29 Ayoosh Bansal, Jayati Singh, Yifan Hao, Jen-Yang Wen, Renato Mancuso, Marco Caccamo

The Dawn of Disaggregation and the Coherence Conundrum: A Call for Federated Coherence

Disaggregated memory is an upcoming data center technology that will allow nodes (servers) to share data efficiently. Sharing data creates a debate on the level of cache coherence the system should provide. While current proposals aim to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-24 Jaewan Hong, Marcos K. Aguilera, Emmanuel Amaro, Vincent Liu, Aurojit Panda, Ion Stoica

DiFache: Efficient and Scalable Caching on Disaggregated Memory using Decentralized Coherence

The disaggregated memory (DM) architecture offers high resource elasticity at the cost of data access performance. While caching frequently accessed data in compute nodes (CNs) reduces access overhead, it requires costly centralized…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-26 Hanze Zhang, Kaiming Wang, Rong Chen, Xingda Wei, Haibo Chen

Global-Local View: Scalable Consistency for Concurrent Data Types

Concurrent linearizable access to shared objects can be prohibitively expensive in a high-contention workload. Many applications apply ad-hoc techniques to eliminate the need for synchronous atomic updates, which may result in…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-11 Deepthi Devaki Akkoorath, José Brandão, Annette Bieniusa, Carlos Baquero

Tardis 2.0: Optimized Time Traveling Coherence for Relaxed Consistency Models

Cache coherence scalability is a big challenge in shared memory systems. Traditional protocols do not scale due to the storage and traffic overhead of cache invalidation. Tardis, a recently proposed coherence protocol, removes cache…

Hardware Architecture · Computer Science 2016-07-28 Xiangyao Yu, Hongzhe Liu, Ethan Zou, Srinivas Devadas

Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms

Communication-efficient SGD algorithms, which allow nodes to perform local updates and periodically synchronize local models, are highly effective in improving the speed and scalability of distributed SGD. However, a rigorous convergence…

Machine Learning · Computer Science 2019-01-28 Jianyu Wang, Gauri Joshi

DLS: Directoryless Shared Last-level Cache

Directory-based protocols have been the de facto solution for maintaining cache coherence in shared-memory parallel systems comprising multi/many cores, where each store instruction is eagerly made globally visible by invalidating the…

Hardware Architecture · Computer Science 2012-10-09 Daofu Liu, Yunji Chen, Qi Guo, Tianshi Chen, Ling Li, Qunfeng Dong, Weiwu Hu

Design and Evaluation of a Rack-Scale Disaggregated Memory Architecture For Data Centers

Memory disaggregation is being considered as a strong alternative to traditional architecture to deal with the memory under-utilization in data centers. Disaggregated memory can adapt to dynamically changing memory requirements for the data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-11 Amit Puri, John Jose, Tamarapalli Venkatesh

A Resolution for Shared Memory Conflict in Multiprocessor System-on-a-Chip

Nowadays, manufacturers are focusing on increasing the concurrency in multiprocessor system-on-a-chip (MPSoC) architecture instead of increasing clock speed, for embedded systems. Traditionally, lock-based synchronization is provided to…

Hardware Architecture · Computer Science 2012-02-06 Shaily Mittal, Nitin

Gradient Clock Synchronization with Practically Constant Local Skew

Gradient Clock Synchronization (GCS) is the task of minimizing the local skew, i.e., the clock offset between neighboring clocks, in a larger network. While asymptotically optimal bounds are known, from a practical perspective they…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-13 Christoph Lenzen

Scalable Graph Convolutional Network Training on Distributed-Memory Systems

Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs. The large data sizes of graphs and their vertex features make scalable training algorithms and distributed memory systems necessary. Since the…

Machine Learning · Computer Science 2022-12-14 Gunduz Vehbi Demirci, Aparajita Haldar, Hakan Ferhatosmanoglu

An Effective Early Multi-core System Shared Cache Design Method Based on Reuse-distance Analysis

In this paper, we proposed an effective and efficient multi-core shared-cache design optimization approach based on reuse-distance analysis of the data traces of target applications. Since data traces are independent of system hardware…

Performance · Computer Science 2021-09-13 Hsin-Yu Ho, Ren-Song Tsay

Generalized Spatially-Coupled Parallel Concatenated Codes With Partial Repetition

A new class of spatially-coupled turbo-like codes (SC-TCs), dubbed generalized spatially coupled parallel concatenated codes (GSC-PCCs), is introduced. These codes are constructed by applying spatial coupling on parallel concatenated codes…

Information Theory · Computer Science 2022-02-25 Min Qiu, Xiaowei Wu, Jinhong Yuan, Alexandre Graell i Amat

Efficient Distributed Data Structures for Future Many-core Architectures

We study general techniques for implementing distributed data structures on top of future many-core architectures with non cache-coherent or partially cache-coherent memory. With the goal of contributing towards what might become, in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-09 Panagiota Fatourou, Nikolaos D. Kallimanis, Eleni Kanellou, Odysseas Makridakis, Christi Symeonidou

PULSE: Accelerating Distributed Pointer-Traversals on Disaggregated Memory (Extended Version)

Caches at CPU nodes in disaggregated memory architectures amortize the high data access latency over the network. However, such caches are fundamentally unable to improve performance for workloads requiring pointer traversals across linked…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Yupeng Tang, Seung-seob Lee, Abhishek Bhattacharjee, Anurag Khandelwal

Regional Consistency: Programmability and Performance for Non-Cache-Coherent Systems

Parallel programmers face the often irreconcilable goals of programmability and performance. HPC systems use distributed memory for scalability, thereby sacrificing the programmability advantages of shared memory programming models.…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-01-21 Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan

Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters

The growing scale of data requires efficient memory subsystems with large memory capacity and high memory performance. Disaggregated architecture has become a promising solution for today's cloud and edge computing for its scalability and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-28 Jing Wang, Chao Li, Taolei Wang, Jinyang Guo, Hanzhang Yang, Yiming Zhuansun, Minyi Guo

Accurate, Efficient and Scalable Graph Embedding

The Graph Convolutional Network (GCN) model and its variants are powerful graph embedding tools for facilitating classification and clustering on graphs. However, a major challenge is to reduce the complexity of layered GCNs and make them…

Machine Learning · Computer Science 2020-08-06 Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna

Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters

Bandwidth-starved multicore chips have become ubiquitous. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-06-17 Markus Wittmann, Georg Hager, Jan Treibig, Gerhard Wellein