Related papers: A Memory Hierarchical Layer Assigning and Prefetch…
Modern applications process massive data volumes that overwhelm the storage and retrieval capabilities of memory systems, making memory the primary performance and energy-efficiency bottleneck of computing systems. Although many…
Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…
The memory hierarchy has a high impact on the performance and power consumption in the system. Moreover, current embedded systems, included in mobile devices, are specifically designed to run multimedia applications, which are memory…
High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is…
Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware…
We consider a basic cache network, in which a single server is connected to multiple users via a shared bottleneck link. The server has a database of files (content). Each user has an isolated memory that can be used to cache content in a…
The latest trends in high-performance computing systems show an increasing demand on the use of a large scale multicore systems in a efficient way, so that high compute-intensive applications can be executed reasonably well. However, the…
For decades, memory capabilities have scaled up much slower than compute capabilities, leaving memory utilization as a major bottleneck. Prefetching and cache hierarchies mitigate this in applications with easily predictable memory accesses…
Long-term memory enables large language model agents to tackle complex tasks through historical interactions. However, existing frameworks encounter a fundamental dilemma between compressing redundant information efficiently and maintaining…
The explosion in workload complexity and the recent slow-down in Moore's law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations,…
This article features extended summaries and retrospectives of some of the recent research done by our research group, SAFARI, on (1) various critical problems in memory systems and (2) how memory system bottlenecks affect graphics…
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with…
The rapid advancement of embedded multicore and many-core systems has revolutionized computing, enabling the development of high-performance, energy-efficient solutions for a wide range of applications. As models scale up in size, data…
The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…
Contemporary memory systems contain a variety of memory types, each possessing distinct characteristics. This trend empowers applications to opt for memory types aligning with developer's desired behavior. As a result, developers gain…
Modern computer designs support composite prefetching, where multiple individual prefetcher components are used to target different memory access patterns. However, multiple prefetchers competing for resources can drastically hurt…
Many high end and next generation computing systems to incorporated alternative memory technologies to meet performance goals. Since these technologies present distinct advantages and tradeoffs compared to conventional DDR* SDRAM, such as…
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a…
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated…
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control)…