Related papers: When In-Memory Computing is Slower than Heavy Disk…
Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…
Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by…
Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware…
In high performance computing, researchers try to optimize the CPU Scheduling algorithms, for faster and efficient working of computers. But a process needs both CPU bound and I/O bound for completion of its execution. With modernization of…
In-memory computing is a promising alternative to traditional computer designs, as it helps overcome performance limits caused by the separation of memory and processing units. However, many current approaches struggle with unreliable…
The speed of modern digital systems is severely limited by memory latency (the ``Memory Wall'' problem). Data exchange between Logic and Memory is also responsible for a large part of the system energy consumption. Logic--In--Memory (LiM)…
Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs.…
Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by…
Sorting is a fundamental operation across numerous computational domains. Traditionally, this process involves transferring data from main memory to a processing unit for sorting, followed by writing the sorted data back to memory. This…
This paper presents an analysis of the fundamental limits on energy efficiency in both digital and analog in-memory computing architectures, and compares their performance to single instruction, single data (scalar) machines specifically in…
High performance networks (e.g. Infiniband) rely on zero-copy operations for performance. Zero-copy operations, as the name implies, avoid copying buffers for sending and receiving data. Instead, hardware devices directly read and write to…
An accepted practice to decrease applications' memory usage is to reduce the amount and frequency of memory allocations. Factors such as (a) the prevalence of out-of-memory (OOM) killers, (b) memory allocations in modern programming…
Memory latencies and bandwidth are major factors, limiting system performance and scalability. Modern CPUs aim at hiding latencies by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based…
Many modern and emerging applications must process increasingly large volumes of data. Unfortunately, prevalent computing paradigms are not designed to efficiently handle such large-scale data: the energy and performance costs to move this…
Software managed byte-addressable hybrid memory systems consisting of DRAMs and NVMMs offer a lot of flexibility to design efficient large scale data processing applications. Operating systems (OS) play an important role in enabling the…
In the future, embedded processors must process more computation-intensive network applications and internet traffic and packet-processing tasks become heavier and sophisticated. Since the processor performance is severely related to the…
Algorithm research focuses primarily on how many operations processors need to do (time complexity). But for many problems, both the runtime and energy used are dominated by memory accesses. In this paper, we present the first broad survey…
Systems that require high-throughput and fault tolerance, such as key-value stores and databases, are looking to persistent memory to combine the performance of in-memory systems with the data-consistent fault-tolerance of nonvolatile…
A processor's memory hierarchy has a major impact on the performance of running code. However, computing platforms, where the actual hardware characteristics are hidden from both the end user and the tools that mediate execution, such as a…
This article features extended summaries and retrospectives of some of the recent research done by our research group, SAFARI, on (1) various critical problems in memory systems and (2) how memory system bottlenecks affect graphics…