English
Related papers

Related papers: DSPatch: Dual Spatial Pattern Prefetcher

200 papers

The discrepancy between processor speed and memory system performance continues to limit the performance of many workloads. To address the issue, one effective and well studied technique is cache prefetching. Many prefetching designs have…

Hardware Architecture · Computer Science 2026-02-10 Maccoy Merrell , Lei Wang , Stavros Kalafatis , Paul V. Gratz

With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature…

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe , Miguel Avargues , Salvador Petit , Maria E. Gomez , Rui Yang , Guanhao Wang , Julio Sahuquillo

Modern high-performance architectures employ large last-level caches (LLCs). While large LLCs can reduce average memory access latency for workloads with a high degree of locality, they can also increase latency for workloads with irregular…

Hardware Architecture · Computer Science 2025-11-26 Hoa Nguyen , Pongstorn Maidee , Jason Lowe-Power , Alireza Kaviani

We propose an approach to data memory prefetching which augments the standard prefetch buffer with selection criteria based on performance and usage pattern of a given instruction. This approach is built on top of a pattern matching based…

Hardware Architecture · Computer Science 2015-05-18 Jean Sung , Sebastian Krupa , Andrew Fishberg , Josef Spjut

Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…

Hardware Architecture · Computer Science 2017-12-05 Haoyuan Wang , Zhiwei Luo

Data prefetching, i.e., the act of predicting application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely-used approach to hide the long latency of memory accesses. The fruitfulness…

Hardware Architecture · Computer Science 2020-09-03 Mohammad Bakhshalipour , Mehran Shakerinava , Fatemeh Golshan , Ali Ansari , Pejman Lotfi-Karman , Hamid Sarbazi-Azad

High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is…

Hardware Architecture · Computer Science 2021-03-30 Majid Jalili , Mattan Erez

Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic…

Hardware Architecture · Computer Science 2017-12-25 Kevin K. Chang

Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising…

Hardware Architecture · Computer Science 2017-04-11 Xiangyao Yu , Christopher J. Hughes , Nadathur Satish , Onur Mutlu , Srinivas Devadas

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…

Hardware Architecture · Computer Science 2025-12-09 Zhongchun Zhou , Chengtao Lai , Yuhang Gu , Wei Zhang

Hardware based memory pooling enabled by interconnect standards like CXL have been gaining popularity amongst cloud providers and system integrators. While pooling memory resources has cost benefits, it comes at a penalty of increased…

Hardware Architecture · Computer Science 2024-06-24 Chandrahas Tirumalasetty , Narasimha Annapreddy

Modern storage systems intensively utilize data prefetching algorithms while processing sequences of the read requests. Performance of the prefetching algorithm (for instance increase of the cache hit ratio of the cache system - CHR)…

Databases · Computer Science 2024-06-14 Vadim Voevodkin , Andrey Sokolov

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-17 Jens Domke , Emil Vatai , Balazs Gerofi , Yuetsu Kodama , Mohamed Wahib , Artur Podobas , Sparsh Mittal , Miquel Pericàs , Lingqi Zhang , Peng Chen , Aleksandr Drozd , Satoshi Matsuoka

Modern caches are often required to handle a massive amount of data, which exceeds the amount of available memory; thus, hybrid caches, specifically DRAM/SSD combination, become more and more prevalent. In such environments, in addition to…

Operating Systems · Computer Science 2022-06-28 Ohad Eytan , Roy Friedman

Advancement in Processor technology has made it easy to handle data-intensive workloads, but limiting main memory advances has created performance bottlenecks. In DRAM, there have been improvements in DRAM access latency as well as…

Hardware Architecture · Computer Science 2021-05-24 Saurabh Jaiswal , Shailendra Kumar Gupta , Soumya Soubhagya Dandapat

Byte-addressable persistent memory (PM) brings hash tables the potential of low latency, cheap persistence and instant recovery. The recent advent of Intel Optane DC Persistent Memory Modules (DCPMM) further accelerates this trend. Many new…

Databases · Computer Science 2020-10-30 Baotong Lu , Xiangpeng Hao , Tianzheng Wang , Eric Lo

As the High Performance Computing world moves towards the Exa-Scale era, huge amounts of data should be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-07 Yehonatan Fridman , Yaniv Snir , Matan Rusanovsky , Kfir Zvi , Harel Levin , Danny Hendler , Hagit Attiya , Gal Oren

Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. As LLMs scale to hundreds of billions or trillions of parameters, distributed inference across…

Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs.…

Hardware Architecture · Computer Science 2025-09-24 Samuel Dayo , Shuhan Liu , Peijing Li , Philip Levis , Subhasish Mitra , Thierry Tambe , David Tennenhouse , H. -S. Philip Wong
‹ Prev 1 2 3 10 Next ›