Related papers: Phase-Priority based Directory Coherence for Multi…

Enabling and Exploiting Partition-Level Parallelism (PALP) in Phase Change Memories

Phase-change memory (PCM) devices have multiple banks to serve memory requests in parallel. Unfortunately, if two requests go to the same bank, they have to be served one after another, leading to lower system performance. We observe that a…

Hardware Architecture · Computer Science 2019-08-22 Shihao Song , Anup Das , Onur Mutlu , Nagarajan Kandasamy

Rainbow: A Composable Coherence Protocol for Multi-Chip Servers

The use of multi-chip modules (MCM) and/or multi-socket boards is the most suitable approach to increase the computation density of servers while keep chip yield attained. This paper introduces a new coherence protocol suitable, in terms of…

Hardware Architecture · Computer Science 2024-05-06 Lucia G. Menezo , Valentin Puente , Jose A. Gregorio

Guidelines for Building Indexes on Partially Cache-Coherent CXL Shared Memory

The \emph{Partial Cache-Coherence (PCC)} model maintains hardware cache coherence only within subsets of cores, enabling large-scale memory sharing with emerging memory interconnect technologies like Compute Express Link (CXL). However,…

Operating Systems · Computer Science 2025-11-11 Fangnuo Wu , Mingkai Dong , Wenjun Cai , Jingsheng Yan , Haibo Chen

CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling

Reducing the average memory access time is crucial for improving the performance of applications running on multi-core architectures. With workload consolidation this becomes increasingly challenging due to shared resource contention.…

Hardware Architecture · Computer Science 2021-02-24 Nadja Ramhöj Holtryd , Madhavan Manivannan , Per Stenström , Miquel Pericàs

A Prudent-Precedence Concurrency Control Protocol for High Data Contention Database Enviornments

In this paper, we propose a concurrency control protocol, called the Prudent-Precedence Concurrency Control (PPCC) protocol, for high data contention database environments. PPCC is prudently more aggressive in permitting more serializable…

Databases · Computer Science 2016-11-18 Weidong Xiong , Feng Yu , Mohammed Hamdi , Wen-Chi Hou

Cache Where you Want! Reconciling Predictability and Coherent Caching

Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-29 Ayoosh Bansal , Jayati Singh , Yifan Hao , Jen-Yang Wen , Renato Mancuso , Marco Caccamo

Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems

In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-29 Heechul Yun

Load-Balancing for Improving User Responsiveness on Multicore Embedded Systems

Most commercial embedded devices have been deployed with a single processor architecture. The code size and complexity of applications running on embedded devices are rapidly increasing due to the emergence of application business models…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-26 Geunsik Lim , Changwoo Min , YoungIk Eom

Co-Optimizing Cache Partitioning and Multi-Core Task Scheduling: Exploit Cache Sensitivity or Not?

Cache partitioning techniques have been successfully adopted to mitigate interference among concurrently executing real-time tasks on multi-core processors. Considering that the execution time of a cache-sensitive task strongly depends on…

Hardware Architecture · Computer Science 2023-10-05 Binqi Sun , Debayan Roy , Tomasz Kloda , Andrea Bastoni , Rodolfo Pellizzoni , Marco Caccamo

LazyPIM: Efficient Support for Cache Coherence in Processing-in-Memory Architectures

Processing-in-memory (PIM) architectures have seen an increase in popularity recently, as the high internal bandwidth available within 3D-stacked memory provides greater incentive to move some computation into the logic layer of the memory.…

Hardware Architecture · Computer Science 2017-06-13 Amirali Boroumand , Saugata Ghose , Minesh Patel , Hasan Hassan , Brandon Lucia , Nastaran Hajinazar , Kevin Hsieh , Krishna T. Malladi , Hongzhong Zheng , Onur Mutlu

Analytical models of Energy and Throughput for Caches in MPSoCs

General trends in computer architecture are shifting more towards parallelism. Multicore architectures have proven to be a major step in processor evolution. With the advancement in multicore architecture, researchers are focusing on…

Hardware Architecture · Computer Science 2019-10-22 Arsalan Shahid , Muhammad Tayyab , Muhammad Yasir Qadri , Nadia N. Qadri , Jameel Ahmed

Making Applications Faster by Asynchronous Execution: Slowing Down Processes or Relaxing MPI Collectives

Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-06 Ayesha Afzal , Georg Hager , Stefano Markidis , Gerhard Wellein

Analysis and Optimization of I/O Cache Coherency Strategies for SoC-FPGA Device

Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA devices provide tighter integrations between software running on CPUs and hardware accelerators. Modern heterogeneous SoC-FPGA platforms support multiple I/O cache…

Hardware Architecture · Computer Science 2019-08-06 Seung Won Min , Sitao Huang , Mohamed El-Hadedy , Jinjun Xiong , Deming Chen , Wen-mei Hwu

Performance of Cache Memory Subsystems for Multicore Architectures

Advancements in multi-core have created interest among many research groups in finding out ways to harness the true power of processor cores. Recent research suggests that on-board component such as cache memory plays a crucial role in…

Hardware Architecture · Computer Science 2011-11-15 N. Ramasubramanian , Srinivas V. V. , N. Ammasai Gounden

Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems

Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory hierarchies that enhance memory-level parallelism (MLP), which is crucial for high performance. To support high MLP, shared last-level caches (LLCs) are…

Hardware Architecture · Computer Science 2025-07-23 Connor Sullivan , Alex Manley , Mohammad Alian , Heechul Yun

Lightweight ML-based Runtime Prefetcher Selection on Many-core Platforms

Modern computer designs support composite prefetching, where multiple individual prefetcher components are used to target different memory access patterns. However, multiple prefetchers competing for resources can drastically hurt…

Hardware Architecture · Computer Science 2023-07-18 Erika S. Alcorta , Mahesh Madhav , Scott Tetrick , Neeraja J. Yadwadkar , Andreas Gerstlauer

Compositional Memory Systems for Multimedia Communicating Tasks

Conventional cache models are not suited for real-time parallel processing because tasks may flush each other's data out of the cache in an unpredictable manner. In this way the system is not compositional so the overall performance is…

Hardware Architecture · Computer Science 2011-11-09 A. M. Molnos , M. J. M. Heijligers , S. D. Cotofana , J. T. J. Van Eijndhoven

DLS: Directoryless Shared Last-level Cache

Directory-based protocols have been the de facto solution for maintaining cache coherence in shared-memory parallel systems comprising multi/many cores, where each store instruction is eagerly made globally visible by invalidating the…

Hardware Architecture · Computer Science 2012-10-09 Daofu Liu , Yunji Chen , Qi Guo , Tianshi Chen , Ling Li , Qunfeng Dong , Weiwu Hu

Rethinking Programmed I/O for Fast Devices, Cheap Cores, and Coherent Interconnects

Conventional wisdom holds that an efficient interface between an OS running on a CPU and a high-bandwidth I/O device should use Direct Memory Access (DMA) to offload data transfer, descriptor rings for buffering and queuing, and interrupts…

Hardware Architecture · Computer Science 2025-04-25 Anastasiia Ruzhanskaia , Pengcheng Xu , David Cock , Timothy Roscoe

The Parallel Persistent Memory Model

We consider a parallel computational model that consists of $P$ processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-15 Guy E. Blelloch , Phillip B. Gibbons , Yan Gu , Charles McGuffey , Julian Shun