Related papers: MARS: Memory Aware Reordered Source

MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem

Raw signal genome analysis (RSGA) has emerged as a promising approach to enable real-time genome analysis by directly analyzing raw electrical signals. However, rapid advancements in sequencing technologies make it increasingly difficult…

Hardware Architecture · Computer Science · 2025-07-04 · Melina Soysal, Konstantina Koliogeorgi, Can Firtina, Nika Mansouri Ghiasi, Rakesh Nadig, Haiyu Mao, Geraldo F. Oliveira, Yu Liang, Klea Zambaku, Mohammad Sadrosadati, Onur Mutlu

MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems

Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly. As a promising solution achieving high scalability and low manufacturing cost, multi-accelerator systems widely exist in data centers,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-07-25 · Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu, Quan Chen, Minyi Guo

A Customized Memory-aware Architecture for Biological Sequence Alignment

Sequence alignment is a fundamental process in computational biology which identifies regions of similarity in biological sequences. With the exponential growth in the volume of data in bioinformatics databases, the time, processing power,…

Hardware Architecture · Computer Science · 2025-07-31 · Nasrin Akbari, Mehdi Modarressi, Alireza Khadem

High-Performance and Energy-Efficient Memory Scheduler Design for Heterogeneous Systems

When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores.…

Hardware Architecture · Computer Science · 2018-05-01 · Rachata Ausavarungnirun, Gabriel H. Loh, Lavanya Subramanian, Kevin Chang, Onur Mutlu

CRAM: Efficient Hardware-Based Memory Compression for Bandwidth Enhancement

This paper investigates hardware-based memory compression designs to increase the memory bandwidth. When lines are compressible, the hardware can store multiple lines in a single memory location, and retrieve all these lines in a single…

Hardware Architecture · Computer Science · 2018-07-23 · Vinson Young, Sanjay Kariyappa, Moinuddin K. Qureshi

MARS: Malleable Actor-Critic Reinforcement Learning Scheduler

In this paper, we introduce MARS, a new scheduling system for HPC-cloud infrastructures based on a cost-aware, flexible reinforcement learning approach, which serves as an intermediate layer for next generation HPC-cloud resource manager.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-12-26 · Betis Baheri, Jacob Tronge, Bo Fang, Ang Li, Vipin Chaudhary, Qiang Guan

Addressing memory bandwidth scalability in vector processors for streaming applications

As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…

Hardware Architecture · Computer Science · 2025-05-20 · Jordi Altayo, Paul Delestrac, David Novo, Simey Yang, Debjyoti Bhattacharjee, Francky Catthoor

Analysis of Dynamic Memory Bandwidth Regulation in Multi-core Real-Time Systems

One of the primary sources of unpredictability in modern multi-core embedded systems is contention over shared memory resources, such as caches, interconnects, and DRAM. Despite significant achievements in the design and analysis of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-09-18 · Ankit Agrawal, Renato Mancuso, Rodolfo Pellizzoni, Gerhard Fohler

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops,…

Operating Systems · Computer Science · 2026-05-01 · Yifei Wang, Hancheng Ye, Yechen Xu, Cong Guo, Chiyue Wei, Qinsi Wang, Dongting Li, Tingjun Chen, Hai "Helen" Li, Danyang Zhuo, Yiran Chen

Performance Impact of Memory Channels on Sparse and Irregular Algorithms

Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-10-10 · Oded Green, James Fox, Jeffrey Young, Jun Shirako, David Bader

MURS: Mitigating Memory Pressure in Service-oriented Data Processing Systems

Although a data processing system often works as a batch processing system, many enterprises deploy such a system as a service, which we call the service-oriented data processing system. It has been shown that in-memory data processing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-03-30 · Xuanhua Shi, Xiong Zhang, Ligang He, Hai Jin, Zhixiang Ke, Song Wu

Managed-Retention Memory: A New Class of Memory for the AI Era

AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance, but underprovisioned on density and…

Hardware Architecture · Computer Science · 2025-01-17 · Sergey Legtchenko, Ioan Stefanovici, Richard Black, Antony Rowstron, Junyi Liu, Paolo Costa, Burcu Canakci, Dushyanth Narayanan, Xingbo Wu

On the Network-Wide Gain of Memory-Assisted Source Coding

Several studies have identified a significant amount of redundancy in the network traffic. For example, it is demonstrated that there is a great amount of redundancy within the content of a server over time. This redundancy can be leveraged…

Information Theory · Computer Science · 2016-11-18 · Mohsen Sardari, Ahmad Beirami, Faramarz Fekri

ReCross: Efficient Embedding Reduction Scheme for In-Memory Computing using ReRAM-Based Crossbar

Deep learning-based recommendation models (DLRMs) are widely deployed in commercial applications to enhance user experience. However, the large and sparse embedding layers in these models impose substantial memory bandwidth bottlenecks due…

Hardware Architecture · Computer Science · 2025-09-16 · Yu-Hong Lai, Chieh-Lin Tsai, Wen Sheng Lim, Han-Wen Hu, Tei-Wei Kuo, Yuan-Hao Chang

Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks

The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative new…

Networking and Internet Architecture · Computer Science · 2022-12-29 · Vamsi Addanki, Chen Avin, Stefan Schmid

MARS: Middleware for Adaptive Reflective Computer Systems

Self-adaptive approaches for runtime resource management of manycore computing platforms often require a runtime model of the system that represents the software organization or the architecture of the target platform. The increasing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-07-27 · Tiago Mück, Bryan Donyanavard, Biswadip Maity, Kasra Moazzemi, Nikil Dutt

Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices

Recent advances demonstrate that irregularly wired neural networks from Neural Architecture Search (NAS) and Random Wiring can not only automate the design of deep neural networks but also emit models that outperform previous manual…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-03-06 · Byung Hoon Ahn, Jinwon Lee, Jamie Menjay Lin, Hsin-Pai Cheng, Jilei Hou, Hadi Esmaeilzadeh

Retrospective: RAIDR: Retention-Aware Intelligent DRAM Refresh

Dynamic Random Access Memory (DRAM) is the prevalent memory technology used to build main memory systems of almost all computers. A fundamental shortcoming of DRAM is the need to refresh memory cells to keep stored data intact. DRAM refresh…

Hardware Architecture · Computer Science · 2023-06-29 · Onur Mutlu

Selection of Supervised Learning-based Sparse Matrix Reordering Algorithms

Sparse matrix ordering is a vital optimization technique often employed for solving large-scale sparse matrices. Its goal is to minimize the matrix bandwidth by reorganizing its rows and columns, thus enhancing efficiency. Conventional…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-11-14 · Tao Tang, Youfu Jiang, Yingbo Cui, Jianbin Fang, Peng Zhang, Lin Peng, Chun Huang

MIMS: Towards a Message Interface based Memory System

Memory system is often the main bottleneck in chip multiprocessor (CMP) systems in terms of latency, bandwidth and efficiency, and recently additionally facing capacity and power problems in an era of big data. A lot of research works have…

Hardware Architecture · Computer Science · 2014-04-10 · Licheng Chen, Tianyue Lu, Yanan Wang, Mingyu Chen, Yuan Ruan, Zehan Cui, Yongbing Huang, Mingyang Chen, Jiutian Zhang, Yungang Bao