English
Related papers

Related papers: A Memory Hierarchical Layer Assigning and Prefetch…

200 papers

Modern applications process massive data volumes that overwhelm the storage and retrieval capabilities of memory systems, making memory the primary performance and energy-efficiency bottleneck of computing systems. Although many…

Hardware Architecture · Computer Science 2026-03-10 Rahul Bera

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe , Miguel Avargues , Salvador Petit , Maria E. Gomez , Rui Yang , Guanhao Wang , Julio Sahuquillo

The memory hierarchy has a high impact on the performance and power consumption in the system. Moreover, current embedded systems, included in mobile devices, are specifically designed to run multimedia applications, which are memory…

Hardware Architecture · Computer Science 2023-03-29 Josefa Díaz Álvarez , José L. Risco-Martín , J. Manuel Colmenar

High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is…

Hardware Architecture · Computer Science 2021-03-30 Majid Jalili , Mattan Erez

Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware…

Hardware Architecture · Computer Science 2025-09-05 Onur Mutlu , Ataberk Olgun , Ismail Emir Yuksel

We consider a basic cache network, in which a single server is connected to multiple users via a shared bottleneck link. The server has a database of files (content). Each user has an isolated memory that can be used to cache content in a…

Information Theory · Computer Science 2019-02-19 Qian Yu , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

The latest trends in high-performance computing systems show an increasing demand on the use of a large scale multicore systems in a efficient way, so that high compute-intensive applications can be executed reasonably well. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-25 Juliana M. N. Silva , Cristina Boeres , Lúcia M. A. Drummond , Artur A. Pessoa

For decades, memory capabilities have scaled up much slower than compute capabilities, leaving memory utilization as a major bottleneck. Prefetching and cache hierarchies mitigate this in applications with easily predictable memory accesses…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-02 Dawson Fox , Jose Monsalve Diaz , Xiaoming Li

Long-term memory enables large language model agents to tackle complex tasks through historical interactions. However, existing frameworks encounter a fundamental dilemma between compressing redundant information efficiently and maintaining…

Artificial Intelligence · Computer Science 2026-02-10 Zhenyuan Zhang , Xianzhang Jia , Zhiqin Yang , Zhenbo Song , Wei Xue , Sirui Han , Yike Guo

The explosion in workload complexity and the recent slow-down in Moore's law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations,…

This article features extended summaries and retrospectives of some of the recent research done by our research group, SAFARI, on (1) various critical problems in memory systems and (2) how memory system bottlenecks affect graphics…

Hardware Architecture · Computer Science 2018-05-30 Onur Mutlu , Saugata Ghose , Rachata Ausavarungnirun

Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with…

Machine Learning · Computer Science 2022-03-28 Alexander Isenko , Ruben Mayer , Jeffrey Jedele , Hans-Arno Jacobsen

The rapid advancement of embedded multicore and many-core systems has revolutionized computing, enabling the development of high-performance, energy-efficient solutions for a wide range of applications. As models scale up in size, data…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-15 Ruhai Lin , Rui-Jie Zhu , Jason K. Eshraghian

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

Contemporary memory systems contain a variety of memory types, each possessing distinct characteristics. This trend empowers applications to opt for memory types aligning with developer's desired behavior. As a result, developers gain…

Performance · Computer Science 2024-08-14 Andrès Rubio Proaño , Kento Sato

Modern computer designs support composite prefetching, where multiple individual prefetcher components are used to target different memory access patterns. However, multiple prefetchers competing for resources can drastically hurt…

Hardware Architecture · Computer Science 2023-07-18 Erika S. Alcorta , Mahesh Madhav , Scott Tetrick , Neeraja J. Yadwadkar , Andreas Gerstlauer

Many high end and next generation computing systems to incorporated alternative memory technologies to meet performance goals. Since these technologies present distinct advantages and tradeoffs compared to conventional DDR* SDRAM, such as…

Performance · Computer Science 2021-10-06 M. Ben Olson , Brandon Kammerdiener , Kshitij A. Doshi , Terry Jones , Michael R. Jantz

The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a…

Computation and Language · Computer Science 2026-03-24 Hadi Pouransari , David Grangier , C Thomas , Michael Kirchhof , Oncel Tuzel

Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated…

Computation and Language · Computer Science 2024-12-23 Vincent-Pierre Berges , Barlas Oğuz , Daniel Haziza , Wen-tau Yih , Luke Zettlemoyer , Gargi Ghosh

In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control)…

Machine Learning · Computer Science 2021-10-22 Jonas Gehring , Gabriel Synnaeve , Andreas Krause , Nicolas Usunier
‹ Prev 1 2 3 10 Next ›