English
Related papers

Related papers: MARS: Memory Aware Reordered Source

200 papers

Raw signal genome analysis (RSGA) has emerged as a promising approach to enable real-time genome analysis by directly analyzing raw electrical signals. However, rapid advancements in sequencing technologies make it increasingly difficult…

Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly. As a promising solution achieving high scalability and low manufacturing cost, multi-accelerator systems widely exist in data centers,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-25 Guan Shen , Jieru Zhao , Zeke Wang , Zhe Lin , Wenchao Ding , Chentao Wu , Quan Chen , Minyi Guo

Sequence alignment is a fundamental process in computational biology which identifies regions of similarity in biological sequences. With the exponential growth in the volume of data in bioinformatics databases, the time, processing power,…

Hardware Architecture · Computer Science 2025-07-31 Nasrin Akbari , Mehdi Modarressi , Alireza Khadem

When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores.…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun , Gabriel H. Loh , Lavanya Subramanian , Kevin Chang , Onur Mutlu

This paper investigates hardware-based memory compression designs to increase the memory bandwidth. When lines are compressible, the hardware can store multiple lines in a single memory location, and retrieve all these lines in a single…

Hardware Architecture · Computer Science 2018-07-23 Vinson Young , Sanjay Kariyappa , Moinuddin K. Qureshi

In this paper, we introduce MARS, a new scheduling system for HPC-cloud infrastructures based on a cost-aware, flexible reinforcement learning approach, which serves as an intermediate layer for next generation HPC-cloud resource manager.…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-26 Betis Baheri , Jacob Tronge , Bo Fang , Ang Li , Vipin Chaudhary , Qiang Guan

As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…

Hardware Architecture · Computer Science 2025-05-20 Jordi Altayo , Paul Delestrac , David Novo , Simey Yang , Debjyoti Bhattacharjee , Francky Catthoor

One of the primary sources of unpredictability in modern multi-core embedded systems is contention over shared memory resources, such as caches, interconnects, and DRAM. Despite significant achievements in the design and analysis of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-18 Ankit Agrawal , Renato Mancuso , Rodolfo Pellizzoni , Gerhard Fohler

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops,…

Operating Systems · Computer Science 2026-05-01 Yifei Wang , Hancheng Ye , Yechen Xu , Cong Guo , Chiyue Wei , Qinsi Wang , Dongting Li , Tingjun Chen , Hai "Helen" Li , Danyang Zhuo , Yiran Chen

Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-10 Oded Green , James Fox , Jeffrey Young , Jun Shirako , David Bader

Although a data processing system often works as a batch processing system, many enterprises deploy such a system as a service, which we call the service-oriented data processing system. It has been shown that in-memory data processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-30 Xuanhua Shi , Xiong Zhang , Ligang He , Hai Jin , Zhixiang Ke , Song Wu

AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance, but underprovisioned on density and…

Several studies have identified a significant amount of redundancy in the network traffic. For example, it is demonstrated that there is a great amount of redundancy within the content of a server over time. This redundancy can be leveraged…

Information Theory · Computer Science 2016-11-18 Mohsen Sardari , Ahmad Beirami , Faramarz Fekri

Deep learning-based recommendation models (DLRMs) are widely deployed in commercial applications to enhance user experience. However, the large and sparse embedding layers in these models impose substantial memory bandwidth bottlenecks due…

Hardware Architecture · Computer Science 2025-09-16 Yu-Hong Lai , Chieh-Lin Tsai , Wen Sheng Lim , Han-Wen Hu , Tei-Wei Kuo , Yuan-Hao Chang

The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative new…

Networking and Internet Architecture · Computer Science 2022-12-29 Vamsi Addanki , Chen Avin , Stefan Schmid

Self-adaptive approaches for runtime resource management of manycore computing platforms often require a runtime model of the system that represents the software organization or the architecture of the target platform. The increasing…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-27 Tiago Mück , Bryan Donyanavard , Biswadip Maity , Kasra Moazzemi , Nikil Dutt

Recent advances demonstrate that irregularly wired neural networks from Neural Architecture Search (NAS) and Random Wiring can not only automate the design of deep neural networks but also emit models that outperform previous manual…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-06 Byung Hoon Ahn , Jinwon Lee , Jamie Menjay Lin , Hsin-Pai Cheng , Jilei Hou , Hadi Esmaeilzadeh

Dynamic Random Access Memory (DRAM) is the prevalent memory technology used to build main memory systems of almost all computers. A fundamental shortcoming of DRAM is the need to refresh memory cells to keep stored data intact. DRAM refresh…

Hardware Architecture · Computer Science 2023-06-29 Onur Mutlu

Sparse matrix ordering is a vital optimization technique often employed for solving large-scale sparse matrices. Its goal is to minimize the matrix bandwidth by reorganizing its rows and columns, thus enhancing efficiency. Conventional…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-14 Tao Tang , Youfu Jiang , Yingbo Cui , Jianbin Fang , Peng Zhang , Lin Peng , Chun Huang

Memory system is often the main bottleneck in chipmultiprocessor (CMP) systems in terms of latency, bandwidth and efficiency, and recently additionally facing capacity and power problems in an era of big data. A lot of research works have…

Hardware Architecture · Computer Science 2014-04-10 Licheng Chen , Tianyue Lu , Yanan Wang , Mingyu Chen , Yuan Ruan , Zehan Cui , Yongbing Huang , Mingyang Chen , Jiutian Zhang , Yungang Bao
‹ Prev 1 2 3 10 Next ›