Related papers: MARS: Memory Aware Reordered Source
Raw signal genome analysis (RSGA) has emerged as a promising approach to enable real-time genome analysis by directly analyzing raw electrical signals. However, rapid advancements in sequencing technologies make it increasingly difficult…
Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly. As a promising solution achieving high scalability and low manufacturing cost, multi-accelerator systems widely exist in data centers,…
Sequence alignment is a fundamental process in computational biology which identifies regions of similarity in biological sequences. With the exponential growth in the volume of data in bioinformatics databases, the time, processing power,…
When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores.…
This paper investigates hardware-based memory compression designs to increase the memory bandwidth. When lines are compressible, the hardware can store multiple lines in a single memory location, and retrieve all these lines in a single…
In this paper, we introduce MARS, a new scheduling system for HPC-cloud infrastructures based on a cost-aware, flexible reinforcement learning approach, which serves as an intermediate layer for next generation HPC-cloud resource manager.…
As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…
One of the primary sources of unpredictability in modern multi-core embedded systems is contention over shared memory resources, such as caches, interconnects, and DRAM. Despite significant achievements in the design and analysis of…
Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops,…
Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we…
Although a data processing system often works as a batch processing system, many enterprises deploy such a system as a service, which we call the service-oriented data processing system. It has been shown that in-memory data processing…
AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance, but underprovisioned on density and…
Several studies have identified a significant amount of redundancy in the network traffic. For example, it is demonstrated that there is a great amount of redundancy within the content of a server over time. This redundancy can be leveraged…
Deep learning-based recommendation models (DLRMs) are widely deployed in commercial applications to enhance user experience. However, the large and sparse embedding layers in these models impose substantial memory bandwidth bottlenecks due…
The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative new…
Self-adaptive approaches for runtime resource management of manycore computing platforms often require a runtime model of the system that represents the software organization or the architecture of the target platform. The increasing…
Recent advances demonstrate that irregularly wired neural networks from Neural Architecture Search (NAS) and Random Wiring can not only automate the design of deep neural networks but also emit models that outperform previous manual…
Dynamic Random Access Memory (DRAM) is the prevalent memory technology used to build main memory systems of almost all computers. A fundamental shortcoming of DRAM is the need to refresh memory cells to keep stored data intact. DRAM refresh…
Sparse matrix ordering is a vital optimization technique often employed for solving large-scale sparse matrices. Its goal is to minimize the matrix bandwidth by reorganizing its rows and columns, thus enhancing efficiency. Conventional…
Memory system is often the main bottleneck in chipmultiprocessor (CMP) systems in terms of latency, bandwidth and efficiency, and recently additionally facing capacity and power problems in an era of big data. A lot of research works have…