Related papers: Parallelism-Aware Memory Interference Delay Analys…

Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems

Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory hierarchies that enhance memory-level parallelism (MLP), which is crucial for high performance. To support high MLP, shared last-level caches (LLCs) are…

Hardware Architecture · Computer Science 2025-07-23 Connor Sullivan, Alex Manley, Mohammad Alian, Heechul Yun

Optimal Placement of Cores, Caches and Memory Controllers in Network On-Chip

Parallel programming is emerging fast, and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Accessing the L1 caches beside the cores is the fastest after registers, but the size of private caches…

Performance · Computer Science 2016-09-27 Diman Zad Tootaghaj, Farshid Farhat

CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers

Memory controller scheduling is crucial in multicore processors, where DRAM bandwidth is shared. Since the growing number of requests from multiple processor cores becomes a bottleneck, scheduling the requests efficiently is…

Hardware Architecture · Computer Science 2019-07-19 Eduardo Olmedo Sanchez, Xian-He Sun

DLS: Directoryless Shared Last-level Cache

Directory-based protocols have been the de facto solution for maintaining cache coherence in shared-memory parallel systems comprising multi/many cores, where each store instruction is eagerly made globally visible by invalidating the…

Hardware Architecture · Computer Science 2012-10-09 Daofu Liu, Yunji Chen, Qi Guo, Tianshi Chen, Ling Li, Qunfeng Dong, Weiwu Hu

Memory Aware Load Balance Strategy on a Parallel Branch-and-Bound Application

The latest trends in high-performance computing systems show an increasing demand for using large-scale multicore systems efficiently, so that highly compute-intensive applications can be executed reasonably well. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-25 Juliana M. N. Silva, Cristina Boeres, Lúcia M. A. Drummond, Artur A. Pessoa

The Importance of Worst-Case Memory Contention Analysis for Heterogeneous SoCs

Memory interference may heavily inflate task execution times in Heterogeneous Systems-on-Chips (HeSoCs). Knowing worst-case interference is consequently fundamental for supporting the correct execution of time-sensitive applications. In…

Performance · Computer Science 2023-09-25 Lorenzo Carletti, Gianluca Brilli, Alessandro Capotondi, Paolo Valente, Andrea Marongiu

Data Criticality in Multi-Threaded Applications: An Insight for Many-Core Systems

Multi-threaded applications are capable of exploiting the full potential of many-core systems. However, Network-on-Chip (NoC) based inter-core communication in many-core systems is responsible for 60-75% of the miss latency experienced by…

Hardware Architecture · Computer Science 2021-01-05 Abhijit Das, John Jose, Prabhat Mishra

Tight Cache Contention Analysis for WCET Estimation on Multicore Systems

WCET (Worst-Case Execution Time) estimation on multicore architectures is particularly challenging, mainly due to the complex accesses to caches shared by multiple cores. Existing analysis identifies possible contentions between parallel…

Software Engineering · Computer Science 2025-09-09 Shuai Zhao, Jieyu Jiang, Shenlin Cai, Yaowei Liang, Chen Jie, Yinjie Fang, Wei Zhang, Guoquan Zhang, Yaoyao Gu, Xiang Xiao, Wei Qin, Xiangzhen Ouyang, Wanli Chang

Criticality Aware Multiprocessors

Typically, a memory request from a processor may need to go through many intermediate interconnect routers, the directory node, the owner node, etc., before it is finally serviced. Current multiprocessors do not give preference to any particular…

Hardware Architecture · Computer Science 2016-06-21 Sandeep Navada, Anil Krishna

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review

Current-day processors employ a multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). An efficient cache replacement policy at the LLC is essential for reducing off-chip memory transfers as…

Hardware Architecture · Computer Science 2013-07-25 Bijay Paikaray

Interference-free Operating System: A 6 Years' Experience in Mitigating Cross-Core Interference in Linux

Real-time operating systems employ spatial and temporal isolation to guarantee predictability and schedulability of real-time systems on multi-core processors. Any unbounded and uncontrolled cross-core performance interference poses a…

Operating Systems · Computer Science 2024-12-25 Zhaomeng Deng, Ziqi Zhang, Ding Li, Yao Guo, Yunfeng Ye, Yuxin Ren, Ning Jia, Xinwei Hu

On the Off-chip Memory Latency of Real-Time Systems: Is DDR DRAM Really the Best Option?

Predictable execution time upon accessing shared memories in multi-core real-time systems is a stringent requirement. A plethora of existing works focus on the analysis of Double Data Rate Dynamic Random Access Memories (DDR DRAMs), or…

Hardware Architecture · Computer Science 2018-10-17 Mohamed Hassan

Per-Bank Memory Bandwidth Regulation for Predictable and Performant Real-Time System

Modern multicore system-on-chips (SoCs) share off-chip DRAM across cores, where bank-level interference can significantly degrade performance and threaten real-time guarantees. While prior work has focused on per-core bandwidth regulation,…

Hardware Architecture · Computer Science 2026-03-30 Connor Rudy Sullivan, Amin Mamandipoor, Cole Ridge Strickler, Heechul Yun

Cache Where you Want! Reconciling Predictability and Coherent Caching

Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-29 Ayoosh Bansal, Jayati Singh, Yifan Hao, Jen-Yang Wen, Renato Mancuso, Marco Caccamo

Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism

The significant resource demands of LLM serving prompt production clusters to fully utilize heterogeneous hardware by partitioning LLM models across a mix of high-end and low-end GPUs. However, existing parallelization approaches often…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-11 Zizhao Mo, Jianxiong Liao, Huanle Xu, Zhi Zhou, Chengzhong Xu

Scheduling Coflows in Multi-Core OCS Networks with Performance Guarantee

Coflow provides a key application-layer abstraction for capturing communication patterns, enabling the efficient coordination of parallel data flows to reduce job completion times in distributed systems. Modern data center networks (DCNs)…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-10 Xin Wang, Hong Shen, Hui Tian, Dong Wang

Towards Predictable Real-Time Performance on Multi-Core Platforms

Cyber-physical systems (CPS) integrate sensing, computing, communication and actuation capabilities to monitor and control operations in the physical environment. A key requirement of such systems is the need to provide predictable…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-07-29 Hyoseung Kim

MLP Aware Scheduling Techniques in Multithreaded Processors

Major chip manufacturers have all introduced multithreaded processors. These processors are used for running a variety of workloads. Efficient resource utilization is an important design aspect in such processors. Particularly, it is…

Performance · Computer Science 2019-08-13 Murthy Durbhakula

Reducing Competitive Cache Misses in Modern Processor Architectures

The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses and performance decline. Inevitably,…

Hardware Architecture · Computer Science 2017-01-09 Milcho Prisagjanec, Pece Mitrevski

Estimate The Efficiency Of Multiprocessor's Cash Memory Work Algorithms

In many computer systems, the proper organization of memory is among the most critical issues. Using a tiered cache memory (along with branch prediction) is an effective means of increasing modern multi-core processors'…

Networking and Internet Architecture · Computer Science 2021-05-21 Mohamed A. Hamada, Abdelrahman Abdallah