English
Related papers

Related papers: DLS: Directoryless Shared Last-level Cache

200 papers

Parallel programming is emerging fast and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Accessing L1 caches beside the cores are the fastest after registers but the size of private caches…

Performance · Computer Science 2016-09-27 Diman Zad Tootaghaj , Farshid Farhat

In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-29 Heechul Yun

Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding but suffer from prohibitive quadratic computational complexity and memory overhead during inference. Current caching techniques accelerate…

Computation and Language · Computer Science 2025-11-06 Yuerong Song , Xiaoran Liu , Ruixiao Li , Zhigeng Liu , Zengfeng Huang , Qipeng Guo , Ziwei He , Xipeng Qiu

The disaggregated memory (DM) architecture offers high resource elasticity at the cost of data access performance. While caching frequently accessed data in compute nodes (CNs) reduces access overhead, it requires costly centralized…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-26 Hanze Zhang , Kaiming Wang , Rong Chen , Xingda Wei , Haibo Chen

Modern multicore processors are employing large last-level caches, for example Intel's E7-8800 processor uses 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage…

Hardware Architecture · Computer Science 2013-10-17 Sparsh Mittal

This extended report presents DDS, a novel disaggregated storage architecture enabled by emerging networking hardware, namely DPUs (Data Processing Units). DPUs can optimize the latency and CPU consumption of disaggregated storage servers.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-29 Qizhen Zhang , Philip Bernstein , Badrish Chandramouli , Jiasheng Hu , Yiming Zheng

Disaggregating memory from compute offers the opportunity to better utilize stranded memory in cloud data centers. It is important to cache data in the compute nodes and maintain cache coherence across multiple compute nodes. However, the…

Databases · Computer Science 2026-01-14 Ruihong Wang , Jianguo Wang , Walid G. Aref

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…

Hardware Architecture · Computer Science 2025-12-09 Zhongchun Zhou , Chengtao Lai , Yuhang Gu , Wei Zhang

The persistence diagram, which describes the topological features of a dataset, is a key descriptor in Topological Data Analysis. The "Discrete Morse Sandwich" (DMS) method has been reported to be the most efficient algorithm for computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Eve Le Guillou , Pierre Fortin , Julien Tierny

Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory hierarchies that enhance memory-level parallelism (MLP), which is crucial for high performance. To support high MLP, shared last-level caches (LLCs) are…

Hardware Architecture · Computer Science 2025-07-23 Connor Sullivan , Alex Manley , Mohammad Alian , Heechul Yun

Persistent memory (PM) is an emerging class of storage technology that combines the benefits of DRAM and SSD. This characteristic inspires research on persistent objects in PM with fine-grained concurrency control. Among such objects,…

Programming Languages · Computer Science 2022-03-16 Kyeongmin Cho , Seungmin Jeon , Jeehoon Kang

Current day processors employ multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). An efficient cache replacement policy at LLC is essential for reducing the off-chip memory transfer as…

Hardware Architecture · Computer Science 2013-07-25 Bijay Paikaray

In modern server CPUs, the Last-Level Cache (LLC) serves not only as a victim cache for higher-level private caches but also as a buffer for low-latency DMA transfers between CPU cores and I/O devices through Direct Cache Access (DCA).…

Hardware Architecture · Computer Science 2025-06-16 Haneul Park , Jiaqi Lou , Sangjin Lee , Yifan Yuan , Kyoung Soo Park , Yongseok Son , Ipoom Jeong , Nam Sung Kim

Cache plays a critical role in reducing the performance gap between CPU and main memory. A modern multi-core CPU generally employs a multi-level hierarchy of caches, through which the most recently and frequently used data are maintained in…

Hardware Architecture · Computer Science 2021-06-01 Rui Wang , Chundong Wang , Chongnan Ye

Motivated by emerging applications to the edge computing paradigm, we introduce a two-layer erasure-coded fault-tolerant distributed storage system offering atomic access for read and write operations. In edge computing, clients interact…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-01 Kishori M. Konwar , N. Prakash , Nancy Lynch , Muriel Medard

While (1) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (2) storage dissaggregation in the system infrastructure level and (3) integration of domain-specific accelerators…

Last-level cache (LLC) partitioning is a technique to provide temporal isolation and low worst-case latency (WCL) bounds when cores access the shared LLC in multicore safety-critical systems. A typical approach to cache partitioning…

Hardware Architecture · Computer Science 2022-04-05 Zhuanhao Wu , Hiren Patel

With a growing number of cores in modern high-performance servers, effective sharing of the last level cache (LLC) is more critical than ever. The primary agenda of such systems is to maximize performance by efficiently supporting…

Programming Languages · Computer Science 2022-10-04 Bodhisatwa Chatterjee , Sharjeel Khan , Santosh Pande

Serverless computing offers attractive scalability, elasticity and cost-effectiveness. However, constraints on memory, CPU and function runtime have hindered its adoption for data-intensive applications and machine learning (ML) workloads.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-25 Joe Oakley , Hakan Ferhatosmanoglu

Memory controller scheduling is crucial in multicore processors, where DRAM bandwidth is shared. Since increased number of requests from multiple cores of processors becomes a source of bottleneck, scheduling the requests efficiently is…

Hardware Architecture · Computer Science 2019-07-19 Eduardo Olmedo Sanchez , Xian-He Sun
‹ Prev 1 2 3 10 Next ›