Related papers: Learning Memory Access Patterns
Accurate memory prefetching is paramount for processor performance, and modern processors employ various techniques to identify and prefetch different memory access patterns. While most modern prefetchers target spatio-temporal patterns by…
Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…
Traditional neural networks require enormous amounts of data to build their complex mappings during a slow training procedure that hinders their abilities for relearning and adapting to new data. Memory-augmented neural networks enhance…
For decades, memory capabilities have scaled up much slower than compute capabilities, leaving memory utilization as a major bottleneck. Prefetching and cache hierarchies mitigate this in applications with easily predictable memory accesses…
The present von Neumann computing paradigm involves a significant amount of information transfer between a central processing unit (CPU) and memory, with concomitant limitations in the actual execution speed. However, it has been recently…
Deep neural networks have excelled on a wide range of problems, from vision to language and game playing. Neural networks very gradually incorporate information into weights as they process data, requiring very low learning rates. If the…
In this paper, we propose and investigate a new neural network architecture called Neural Random Access Machine. It can manipulate and dereference pointers to an external variable-size random-access memory. The model is trained from pure…
Deep Learning neural networks are pervasive, but traditional computer architectures are reaching the limits of being able to efficiently execute them for the large workloads of today. They are limited by the von Neumann bottleneck: the high…
Intelligence necessitates memory. Without memory, humans fail to perform various nontrivial tasks such as reading novels, playing games or solving maths. As the ultimate goal of machine learning is to derive intelligent systems that learn…
Unified Virtual Memory (UVM) relieves the developers from the onus of maintaining complex data structures and explicit data migration by enabling on-demand data movement between CPU memory and GPU memory. However, on-demand paging soon…
Modern applications process massive data volumes that overwhelm the storage and retrieval capabilities of memory systems, making memory the primary performance and energy-efficiency bottleneck of computing systems. Although many…
The conventional von Neumann architecture has been revealed as a major performance and energy bottleneck for rising data-intensive applications. %, due to the intensive data movements. The decade-old idea of leveraging in-memory processing…
Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of "one-shot learning." Traditional gradient-based networks require a lot of data to learn, often through…
The memory subsystem has always been a bottleneck in performance as well as significant power contributor in memory intensive applications. Many researchers have presented multi-layered memory hierarchies as a means to design energy and…
Advancement in Processor technology has made it easy to handle data-intensive workloads, but limiting main memory advances has created performance bottlenecks. In DRAM, there have been improvements in DRAM access latency as well as…
Advances in sensor technology and automation have ushered in an era of data abundance, where the ability to identify and extract relevant information in real time has become increasingly critical. Traditional filtering approaches, which…
The rapid advancement of neuromorphic technology aims to address the memory wall challenge inherent in conventional von Neumann architectures. This paper critically examines current digital neuromorphic processors and their strategies to…
We propose a class of interleavers for a novel deep neural network (DNN) architecture that uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements, and speed up training. The…
Neural networks have become dominant computational workloads across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement,…
Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…