Related papers: On Algorithmic Cache Optimization

Improving the Representativeness of Simulation Intervals for the Cache Memory System

Accurate simulation techniques are indispensable to efficiently propose new memory or architectural organizations. As implementing new hardware concepts in real systems is often not feasible, cycle-accurate simulators employed together with…

Hardware Architecture · Computer Science 2024-02-02 Nicolas Bueno, Fernando Castro, Luis Pinuel, Jose Ignacio Gomez-Perez, Francky Catthoor

An O(1) algorithm for implementing the LFU cache eviction scheme

Cache eviction algorithms are used widely in operating systems, databases and other systems that use caches to speed up execution by caching data that is used by the application. There are many policies such as MRU (Most Recently Used), MFU…

Data Structures and Algorithms · Computer Science 2021-10-25 Dhruv Matani, Ketan Shah, Anirban Mitra

Improving the Space-Time Efficiency of Processor-Oblivious Matrix Multiplication Algorithms

Classic cache-oblivious parallel matrix multiplication algorithms achieve optimality either in time or space, but not both, which promotes lots of research on the best possible balance or tradeoff of such algorithms. We study modern…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-14 Yuan Tang

A Fast Analytical Model of Fully Associative Caches

While the cost of computation is an easy to understand local property, the cost of data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider the…

Performance · Computer Science 2020-01-07 Tobias Gysi, Tobias Grosser, Laurin Brandner, Torsten Hoefler

Comparative Analysis of Distributed Caching Algorithms: Performance Metrics and Implementation Considerations

This paper presents a comprehensive comparison of distributed caching algorithms employed in modern distributed systems. We evaluate various caching strategies including Least Recently Used (LRU), Least Frequently Used (LFU), Adaptive…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-04 Helen Mayer, James Richards

Security Analysis of Cache Replacement Policies

Modern computer architectures share physical resources between different programs in order to increase area-, energy-, and cost-efficiency. Unfortunately, sharing often gives rise to side channels that can be exploited for extracting or…

Cryptography and Security · Computer Science 2017-01-24 Pablo Cañones, Boris Köpf, Jan Reineke

On the complexity of cache analysis for different replacement policies

Modern processors use cache memory: a memory access that "hits" the cache returns early, while a "miss" takes more time. Given a memory access in a program, cache analysis consists in deciding whether this access is always a hit, always a…

Programming Languages · Computer Science 2019-09-24 David Monniaux, Valentin Touzeau

Proficient Pair of Replacement Algorithms on L1 and L2 Cache for Merge Sort

Memory hierarchy is used to compete the processors speed. Cache memory is the fast memory which is used to conduit the speed difference of memory and processor. The access patterns of Level 1 cache (L1) and Level 2 cache (L2) are different,…

Operating Systems · Computer Science 2010-03-23 Richa Gupta, Sanjiv Tokekar

On the best approximation of the hierarchical matrix product

The multiplication of matrices is an important arithmetic operation in computational mathematics. In the context of hierarchical matrices, this operation can be realized by the multiplication of structured block-wise low-rank matrices,…

Numerical Analysis · Mathematics 2018-05-24 Jürgen Dölz, Helmut Harbrecht, Michael D. Multerer

On the Capacity of Secure Distributed Matrix Multiplication

Matrix multiplication is one of the key operations in various engineering applications. Outsourcing large-scale matrix multiplication tasks to multiple distributed servers or cloud is desirable to speed up computation. However, security…

Information Theory · Computer Science 2018-06-04 Wei-Ting Chang, Ravi Tandon

Towards a Theory of Cache-Efficient Algorithms

We describe a model that enables us to analyze the running time of an algorithm in a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our model, an extension of Aggarwal and Vitter's I/O…

Hardware Architecture · Computer Science 2007-05-23 Sandeep Sen, Siddhartha Chatterjee, Neeraj Dumir

Matrix Multiplication in the MPC Model

In this paper, we present algorithms to solve matrix multiplication problems in the MPC model. In particular, we consider the problem under various processor/memory constraints in the MPC model and prove the following results. 1.…

Computational Complexity · Computer Science 2025-09-30 Lakshya Joshi, Arya Deshmukh, Atharv Chhabra, Chetan Gupta

Toward Robust and Efficient ML-Based GPU Caching for Modern Inference

In modern GPU inference, cache efficiency remains a major bottleneck, and heuristic policies such as LRU can perform far worse than the offline optimum. Existing learning-based caching systems improve hit rates mainly through…

Machine Learning · Computer Science 2026-04-27 Peng Chen, Jiaji Zhang, Hailiang Zhao, Yirong Zhang, Shenyao Chen, Jiahong Yu, Xueyan Tang, Yixuan Wang, Hao Li, Jianping Zou, Gang Xiong, Kingsum Chow, Shuibing He, Shuiguang Deng

Computable Compressed Matrices

The biggest cost of computing with large matrices in any modern computer is related to memory latency and bandwidth. The average latency of modern RAM reads is 150 times greater than a clock step of the processor. Throughput is a little…

Data Structures and Algorithms · Computer Science 2013-03-04 Crysttian Arantes Paixão, Flávio Codeço Coelho

Multiplicação de matrizes: uma comparação entre as abordagens sequencial (CPU) e paralela (GPU)

Designing problems using matrices is very important in Computer Science. Fields like graph computer, graphs theory, and machine learning use matrices very often to solve their own problems. The most often matrix operation is the…

Performance · Computer Science 2019-05-10 Andre G. C. Pacheco

Spatial multi-LRU: Distributed Caching for Wireless Networks with Coverage Overlaps

This article introduces a novel family of decentralised caching policies, applicable to wireless networks with finite storage at the edge-nodes (stations). These policies, that are based on the Least-Recently-Used replacement principle, are…

Networking and Internet Architecture · Computer Science 2016-12-14 Anastasios Giovanidis, Apostolos Avranas

On Optimal Caching and Model Multiplexing for Large Model Inference

Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is…

Machine Learning · Computer Science 2023-08-30 Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

Rate-Efficiency and Straggler-Robustness through Partition in Distributed Two-Sided Secure Matrix Computation

Computationally efficient matrix multiplication is a fundamental requirement in various fields, including and particularly in data analytics. To do so, the computation task of a large-scale matrix multiplication is typically outsourced to…

Information Theory · Computer Science 2018-11-01 Jaber Kakar, Seyedhamed Ebadifar, Aydin Sezgin

Fundamentals of Caching Layered Data objects

The effective management of large amounts of data processed or required by today's cloud or edge computing systems remains a fundamental challenge. This paper focuses on cache management for applications where data objects can be stored in…

Networking and Internet Architecture · Computer Science 2025-04-03 Agrim Bari, Gustavo de Veciana, George Kesidis

Superfast CUR Matrix Algorithms, Their Pre-Processing and Extensions

We study superfast algorithms that computes low rank approximation of a matrix (hereafter referred to as LRA) that use much fewer memory cells and arithmetic operations than the input matrix has entries. We first specify a family of 2mn…

Numerical Analysis · Mathematics 2018-06-08 Victor Y. Pan, Qi Luan, John Svadlenka, Liang Zhao