Related papers: A Memory Hierarchical Layer Assigning and Prefetch…

Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques

Modern applications process massive data volumes that overwhelm the storage and retrieval capabilities of memory systems, making memory the primary performance and energy-efficiency bottleneck of computing systems. Although many…

Hardware Architecture · Computer Science 2026-03-10 Rahul Bera

Prefetching in Deep Memory Hierarchies with NVRAM as Main Memory

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe, Miguel Avargues, Salvador Petit, Maria E. Gomez, Rui Yang, Guanhao Wang, Julio Sahuquillo

Evolutionary Design of the Memory Subsystem

The memory hierarchy has a high impact on the performance and power consumption in the system. Moreover, current embedded systems, included in mobile devices, are specifically designed to run multimedia applications, which are memory…

Hardware Architecture · Computer Science 2023-03-29 Josefa Díaz Álvarez, José L. Risco-Martín, J. Manuel Colmenar

Reducing Load Latency with Cache Level Prediction

High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is…

Hardware Architecture · Computer Science 2021-03-30 Majid Jalili, Mattan Erez

Memory-Centric Computing: Solving Computing's Memory Problem

Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware…

Hardware Architecture · Computer Science 2025-09-05 Onur Mutlu, Ataberk Olgun, Ismail Emir Yuksel

The Exact Rate-Memory Tradeoff for Caching with Uncoded Prefetching

We consider a basic cache network, in which a single server is connected to multiple users via a shared bottleneck link. The server has a database of files (content). Each user has an isolated memory that can be used to cache content in a…

Information Theory · Computer Science 2019-02-19 Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Memory Aware Load Balance Strategy on a Parallel Branch-and-Bound Application

The latest trends in high-performance computing systems show an increasing demand for using large-scale multicore systems efficiently, so that highly compute-intensive applications can be executed reasonably well. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-25 Juliana M. N. Silva, Cristina Boeres, Lúcia M. A. Drummond, Artur A. Pessoa

On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data

For decades, memory capabilities have scaled up much slower than compute capabilities, leaving memory utilization as a major bottleneck. Prefetching and cache hierarchies mitigate this in applications with easily predictable memory accesses…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-02 Dawson Fox, Jose Monsalve Diaz, Xiaoming Li

MemFly: On-the-Fly Memory Optimization via Information Bottleneck

Long-term memory enables large language model agents to tackle complex tasks through historical interactions. However, existing frameworks encounter a fundamental dilemma between compressing redundant information efficiently and maintaining…

Artificial Intelligence · Computer Science 2026-02-10 Zhenyuan Zhang, Xianzhang Jia, Zhiqin Yang, Zhenbo Song, Wei Xue, Sirui Han, Yike Guo

Learning Memory Access Patterns

The explosion in workload complexity and the recent slow-down in Moore's law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations,…

Machine Learning · Computer Science 2018-03-16 Milad Hashemi, Kevin Swersky, Jamie A. Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan

Recent Advances in Overcoming Bottlenecks in Memory Systems and Managing Memory Resources in GPU Systems

This article features extended summaries and retrospectives of some of the recent research done by our research group, SAFARI, on (1) various critical problems in memory systems and (2) how memory system bottlenecks affect graphics…

Hardware Architecture · Computer Science 2018-05-30 Onur Mutlu, Saugata Ghose, Rachata Ausavarungnirun

Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines

Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with…

Machine Learning · Computer Science 2022-03-28 Alexander Isenko, Ruben Mayer, Jeffrey Jedele, Hans-Arno Jacobsen

Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks

The rapid advancement of embedded multicore and many-core systems has revolutionized computing, enabling the development of high-performance, energy-efficient solutions for a wide range of applications. As models scale up in size, data…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-15 Ruhai Lin, Rui-Jie Zhu, Jason K. Eshraghian

Techniques for Shared Resource Management in Systems with Throughput Processors

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

Understanding Power Consumption Metric on Heterogeneous Memory Systems

Contemporary memory systems contain a variety of memory types, each possessing distinct characteristics. This trend empowers applications to opt for memory types that align with the developer's desired behavior. As a result, developers gain…

Performance · Computer Science 2024-08-14 Andrès Rubio Proaño, Kento Sato

Lightweight ML-based Runtime Prefetcher Selection on Many-core Platforms

Modern computer designs support composite prefetching, where multiple individual prefetcher components are used to target different memory access patterns. However, multiple prefetchers competing for resources can drastically hurt…

Hardware Architecture · Computer Science 2023-07-18 Erika S. Alcorta, Mahesh Madhav, Scott Tetrick, Neeraja J. Yadwadkar, Andreas Gerstlauer

Online Application Guidance for Heterogeneous Memory Systems

Many high-end and next-generation computing systems have incorporated alternative memory technologies to meet performance goals. Since these technologies present distinct advantages and tradeoffs compared to conventional DDR* SDRAM, such as…

Performance · Computer Science 2021-10-06 M. Ben Olson, Brandon Kammerdiener, Kshitij A. Doshi, Terry Jones, Michael R. Jantz

Pretraining with hierarchical memories: separating long-tail and common knowledge

The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a…

Computation and Language · Computer Science 2026-03-24 Hadi Pouransari, David Grangier, C Thomas, Michael Kirchhof, Oncel Tuzel

Memory Layers at Scale

Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated…

Computation and Language · Computer Science 2024-12-23 Vincent-Pierre Berges, Barlas Oğuz, Daniel Haziza, Wen-tau Yih, Luke Zettlemoyer, Gargi Ghosh

Hierarchical Skills for Efficient Exploration

In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control)…

Machine Learning · Computer Science 2021-10-22 Jonas Gehring, Gabriel Synnaeve, Andreas Krause, Nicolas Usunier