Related papers: Parallelism-Aware Memory Interference Delay Analys…
Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory hierarchies that enhance memory-level parallelism (MLP), which is crucial for high performance. To support high MLP, shared last-level caches (LLCs) are…
Parallel programming is emerging fast and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Accessing L1 caches beside the cores are the fastest after registers but the size of private caches…
Memory controller scheduling is crucial in multicore processors, where DRAM bandwidth is shared. Since increased number of requests from multiple cores of processors becomes a source of bottleneck, scheduling the requests efficiently is…
Directory-based protocols have been the de facto solution for maintaining cache coherence in shared-memory parallel systems comprising multi/many cores, where each store instruction is eagerly made globally visible by invalidating the…
The latest trends in high-performance computing systems show an increasing demand on the use of a large scale multicore systems in a efficient way, so that high compute-intensive applications can be executed reasonably well. However, the…
Memory interference may heavily inflate task execution times in Heterogeneous Systems-on-Chips (HeSoCs). Knowing worst-case interference is consequently fundamental for supporting the correct execution of time-sensitive applications. In…
Multi-threaded applications are capable of exploiting the full potential of many-core systems. However, Network-on-Chip (NoC) based inter-core communication in many-core systems is responsible for 60-75% of the miss latency experienced by…
WCET (Worst-Case Execution Time) estimation on multicore architecture is particularly challenging mainly due to the complex accesses over cache shared by multiple cores. Existing analysis identifies possible contentions between parallel…
Typically, a memory request from a processor may need to go through many intermediate interconnect routers, directory node, owner node, etc before it is finally serviced. Current multiprocessors do not give preference to any particular…
Current day processors employ multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). An efficient cache replacement policy at LLC is essential for reducing the off-chip memory transfer as…
Real-time operating systems employ spatial and temporal isolation to guarantee predictability and schedulability of real-time systems on multi-core processors. Any unbounded and uncontrolled cross-core performance interference poses a…
Predictable execution time upon accessing shared memories in multi-core real-time systems is a stringent requirement. A plethora of existing works focus on the analysis of Double Data Rate Dynamic Random Access Memories (DDR DRAMs), or…
Modern multicore system-on-chips (SoCs) share off-chip DRAM across cores, where bank-level interference can significantly degrade performance and threaten real-time guarantees. While prior work has focused on per-core bandwidth regulation,…
Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of…
The significant resource demands in LLM serving prompts production clusters to fully utilize heterogeneous hardware by partitioning LLM models across a mix of high-end and low-end GPUs. However, existing parallelization approaches often…
Coflow provides a key application-layer abstraction for capturing communication patterns, enabling the efficient coordination of parallel data flows to reduce job completion times in distributed systems. Modern data center networks (DCNs)…
Cyber-physical systems (CPS) integrate sensing, computing, communication and actuation capabilities to monitor and control operations in the physical environment. A key requirement of such systems is the need to provide predictable…
Major chip manufacturers have all introduced Multithreaded processors. These processors are used for running a variety of workloads. Efficient resource utilization is an important design aspect in such processors. Particularly, it is…
The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses and performance decline. Inevitably,…
Many computer systems for calculating the proper organization of memory are among the most critical issues. Using a tier cache memory (along with branching prediction) is an effective means of increasing modern multi-core processors'…