Related papers: 3D Cache Hierarchy Optimization
Power consumption, off-chip memory bandwidth, chip area and Network on Chip (NoC) capacity are among main chip resources limiting the scalability of Chip Multiprocessors (CMP). A closed form analytical solution for optimizing the CMP cache…
With technology scaling, the size of cache systems in chip-multiprocessors (CMPs) has been dramatically increased to efficiently store and manipulate a large amount of data in future applications and decrease the gap between cores and…
In multithreaded applications with high degree of data sharing, the miss rate of private cache is shown to exhibit a compulsory miss component. It manifests because at least some of the shared data originates from other cores and can only…
Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We…
The increasing density of transistors in Integrated Circuits (ICs) has enabled the development of highly integrated Systems-on-Chip (SoCs) and, more recently, Multiprocessor Systems-on-Chip (MPSoCs). To address scalability challenges in…
With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature…
Caching at mobile devices and leveraging cooperative device-to-device (D2D) communications are two promising approaches to support massive content delivery over wireless networks while mitigating the effects of interference. To show the…
This paper investigates the multi-GPU performance of a 3D buoyancy driven cavity solver using MPI and OpenACC directives on different platforms. The paper shows that decomposing the total problem in different dimensions affects the strong…
The task of 3D ICs layout design involves the assembly of millions of components taking into account many different requirements and constraints such as topological, wiring or manufacturability ones. It is a NP-hard problem that requires…
For a system-level design of Networks-on-Chip for 3D heterogeneous System-on-Chip (SoC), the locations of components, routers and vertical links are determined from an application model and technology parameters. In conventional methods,…
Major chip manufacturers have all introduced multicore microprocessors. Multi-socket systems built from these processors are used for running various server applications. Depending on the application, remote cache-to-cache transfers can…
The in-memory cache system is an important component in a cloud for the data access performance. As the tenants may have different performance goals for data access depending on the nature of their tasks, effectively managing the memory…
Major chip manufacturers have all introduced multicore microprocessors. Multi-socket systems built from these processors are used for running various server applications. Depending on the application that is run on the system, remote memory…
Current embedded systems are specifically designed to run multimedia applications. These applications have a big impact on both performance and energy consumption. Both metrics can be optimized selecting the best cache configuration for a…
Most previous 3D IC research focused on stacking traditional 2D silicon layers, so the interconnect reduction is limited to inter-block delays. In this paper, we propose techniques that enable efficient exploration of the 3D design space…
Many computer systems for calculating the proper organization of memory are among the most critical issues. Using a tier cache memory (along with branching prediction) is an effective means of increasing modern multi-core processors'…
Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of…
Three-dimensional integrated circuits promise power, performance, and footprint gains compared to their 2D counterparts, thanks to drastic reductions in the interconnects' length through their smaller form factor. We can leverage the…
The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses and performance decline. Inevitably,…
Cache partitioning techniques have been successfully adopted to mitigate interference among concurrently executing real-time tasks on multi-core processors. Considering that the execution time of a cache-sensitive task strongly depends on…