Related papers

Related papers: Parallelism-Aware Memory Interference Delay Analys…

200 papers

Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory hierarchies that enhance memory-level parallelism (MLP), which is crucial for high performance. To support high MLP, shared last-level caches (LLCs) are…

Hardware Architecture · Computer Science 2025-07-23 Connor Sullivan, Alex Manley, Mohammad Alian, Heechul Yun

Parallel programming is emerging fast and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Accessing the L1 caches beside the cores is the fastest after registers, but the size of private caches…

Performance · Computer Science 2016-09-27 Diman Zad Tootaghaj, Farshid Farhat

Memory controller scheduling is crucial in multicore processors, where DRAM bandwidth is shared. Since the increased number of requests from multiple cores becomes a bottleneck, scheduling the requests efficiently is…

Hardware Architecture · Computer Science 2019-07-19 Eduardo Olmedo Sanchez, Xian-He Sun

Directory-based protocols have been the de facto solution for maintaining cache coherence in shared-memory parallel systems comprising multi/many cores, where each store instruction is eagerly made globally visible by invalidating the…

Hardware Architecture · Computer Science 2012-10-09 Daofu Liu, Yunji Chen, Qi Guo, Tianshi Chen, Ling Li, Qunfeng Dong, Weiwu Hu

The latest trends in high-performance computing systems show an increasing demand for the efficient use of large-scale multicore systems, so that highly compute-intensive applications can be executed reasonably well. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-25 Juliana M. N. Silva, Cristina Boeres, Lúcia M. A. Drummond, Artur A. Pessoa

Memory interference may heavily inflate task execution times in Heterogeneous Systems-on-Chips (HeSoCs). Knowing worst-case interference is consequently fundamental for supporting the correct execution of time-sensitive applications. In…

Performance · Computer Science 2023-09-25 Lorenzo Carletti, Gianluca Brilli, Alessandro Capotondi, Paolo Valente, Andrea Marongiu

Multi-threaded applications are capable of exploiting the full potential of many-core systems. However, Network-on-Chip (NoC) based inter-core communication in many-core systems is responsible for 60-75% of the miss latency experienced by…

Hardware Architecture · Computer Science 2021-01-05 Abhijit Das, John Jose, Prabhat Mishra

WCET (Worst-Case Execution Time) estimation on multicore architecture is particularly challenging mainly due to the complex accesses over cache shared by multiple cores. Existing analysis identifies possible contentions between parallel…

Typically, a memory request from a processor may need to go through many intermediate interconnect routers, a directory node, an owner node, etc. before it is finally serviced. Current multiprocessors do not give preference to any particular…

Hardware Architecture · Computer Science 2016-06-21 Sandeep Navada, Anil Krishna

Present-day processors employ a multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). An efficient cache replacement policy at the LLC is essential for reducing the off-chip memory transfer as…

Hardware Architecture · Computer Science 2013-07-25 Bijay Paikaray

Real-time operating systems employ spatial and temporal isolation to guarantee predictability and schedulability of real-time systems on multi-core processors. Any unbounded and uncontrolled cross-core performance interference poses a…

Operating Systems · Computer Science 2024-12-25 Zhaomeng Deng, Ziqi Zhang, Ding Li, Yao Guo, Yunfeng Ye, Yuxin Ren, Ning Jia, Xinwei Hu

Predictable execution time upon accessing shared memories in multi-core real-time systems is a stringent requirement. A plethora of existing works focus on the analysis of Double Data Rate Dynamic Random Access Memories (DDR DRAMs), or…

Hardware Architecture · Computer Science 2018-10-17 Mohamed Hassan

Modern multicore system-on-chips (SoCs) share off-chip DRAM across cores, where bank-level interference can significantly degrade performance and threaten real-time guarantees. While prior work has focused on per-core bandwidth regulation,…

Hardware Architecture · Computer Science 2026-03-30 Connor Rudy Sullivan, Amin Mamandipoor, Cole Ridge Strickler, Heechul Yun

Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-29 Ayoosh Bansal, Jayati Singh, Yifan Hao, Jen-Yang Wen, Renato Mancuso, Marco Caccamo

The significant resource demands of LLM serving prompt production clusters to fully utilize heterogeneous hardware by partitioning LLM models across a mix of high-end and low-end GPUs. However, existing parallelization approaches often…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-11 Zizhao Mo, Jianxiong Liao, Huanle Xu, Zhi Zhou, Chengzhong Xu

Coflow provides a key application-layer abstraction for capturing communication patterns, enabling the efficient coordination of parallel data flows to reduce job completion times in distributed systems. Modern data center networks (DCNs)…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-10 Xin Wang, Hong Shen, Hui Tian, Dong Wang

Cyber-physical systems (CPS) integrate sensing, computing, communication and actuation capabilities to monitor and control operations in the physical environment. A key requirement of such systems is the need to provide predictable…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-07-29 Hyoseung Kim

Major chip manufacturers have all introduced multithreaded processors. These processors are used for running a variety of workloads. Efficient resource utilization is an important design aspect in such processors. Particularly, it is…

Performance · Computer Science 2019-08-13 Murthy Durbhakula

The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses and performance decline. Inevitably,…

Hardware Architecture · Computer Science 2017-01-09 Milcho Prisagjanec, Pece Mitrevski

For many computer systems, the proper organization of memory is among the most critical issues. Using tiered cache memory (along with branch prediction) is an effective means of increasing modern multi-core processors'…

Networking and Internet Architecture · Computer Science 2021-05-21 Mohamed A. Hamada, Abdelrahman Abdallah