Related papers: Memory Controller Design Under Cloud Workloads

Workload Behavior Driven Memory Subsystem Design for Hyperscale

Hyperscalars run services across a large fleet of servers, serving billions of users worldwide. These services, however, behave differently than commonly available benchmark suites, resulting in server architectures that are not optimized…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-03 Suyash Mahar , Hao Wang , Wei Shu , Abhishek Dhanotia

Programmable FPGA-based Memory Controller

Even with generational improvements in DRAM technology, memory access latency still remains the major bottleneck for application accelerators, primarily due to limitations in memory interface IPs which cannot fully account for variations in…

Hardware Architecture · Computer Science 2021-08-24 Sasindu Wijeratne , Sanket Pattnaik , Zhiyu Chen , Rajgopal Kannan , Viktor Prasanna

Centralized Congestion Control and Scheduling in a Datacenter

We consider the problem of designing a packet-level congestion control and scheduling policy for datacenter networks. Current datacenter networks primarily inherit the principles that went into the design of Internet, where congestion…

Networking and Internet Architecture · Computer Science 2017-10-10 Devavrat Shah , Qiaomin Xie

Design Guidelines for High-Performance SCM Hierarchies

With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature…

Hardware Architecture · Computer Science 2019-03-11 Dmitrii Ustiugov , Alexandros Daglis , Javier Picorel , Mark Sutherland , Edouard Bugnion , Babak Falsafi , Dionisios Pnevmatikatos

Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers

The ever-increasing gap between compute and I/O performance in HPC platforms, together with the development of novel NVMe storage devices (NVRAM), led to the emergence of the burst buffer concept - an intermediate persistent storage layer…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-11 Jan Kopanski , Krzysztof Rzadca

Memory Aware Load Balance Strategy on a Parallel Branch-and-Bound Application

The latest trends in high-performance computing systems show an increasing demand on the use of a large scale multicore systems in a efficient way, so that high compute-intensive applications can be executed reasonably well. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-25 Juliana M. N. Silva , Cristina Boeres , Lúcia M. A. Drummond , Artur A. Pessoa

Sharp Waiting-Time Bounds for Multiserver Jobs

Multiserver jobs, which are jobs that occupy multiple servers simultaneously during service, are prevalent in today's computing clusters. But little is known about the delay performance of systems with multiserver jobs. We consider queueing…

Performance · Computer Science 2023-04-17 Yige Hong , Weina Wang

Fine-Grained Scheduling for Containerized HPC Workloads in Kubernetes Clusters

Containerization technology offers lightweight OS-level virtualization, and enables portability, reproducibility, and flexibility by packing applications with low performance overhead and low effort to maintain and scale them. Moreover,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-22 Peini Liu , Jordi Guitart

High-Performance and Energy-Effcient Memory Scheduler Design for Heterogeneous Systems

When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores.…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun , Gabriel H. Loh , Lavanya Subramanian , Kevin Chang , Onur Mutlu

Task Admission Control and Boundary Analysis of Cognitive Cloud Data Centers

A novel cloud data center (DC) model is studied here with cognitive capabilities for real-time (or online) flow compared to the batch tasks. Here, a DC can determine the cost of using resources and an online user or the user with batch…

Systems and Control · Electrical Eng. & Systems 2020-10-07 Wenlong Ni , Yuhong Zhang , Wei Li

Memory-Centric Computing: Solving Computing's Memory Problem

Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware…

Hardware Architecture · Computer Science 2025-09-05 Onur Mutlu , Ataberk Olgun , Ismail Emir Yuksel

Understanding the Interactions of Workloads and DRAM Types: A Comprehensive Experimental Study

It has become increasingly difficult to understand the complex interaction between modern applications and main memory, composed of DRAM chips. Manufacturers are now selling and proposing many different types of DRAM, with each DRAM type…

Hardware Architecture · Computer Science 2019-10-21 Saugata Ghose , Tianshi Li , Nastaran Hajinazar , Damla Senol Cali , Onur Mutlu

The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

Supported by their high power efficiency and recent advancements in High Level Synthesis (HLS), FPGAs are quickly finding their way into HPC and cloud systems. Large amounts of work have been done so far on loop and area optimizations for…

Hardware Architecture · Computer Science 2020-02-17 Hamid Reza Zohouri , Satoshi Matsuoka

DReAM: Dynamic Re-arrangement of Address Mapping to Improve the Performance of DRAMs

The initial location of data in DRAMs is determined and controlled by the 'address-mapping' and even modern memory controllers use a fixed and run-time-agnostic address mapping. On the other hand, the memory access pattern seen at the…

Hardware Architecture · Computer Science 2015-09-15 Mohsen Ghasempour , Jim Garside , Aamer Jaleel , Mikel Luján

Achieving Multi-Port Memory Performance on Single-Port Memory with Coding Techniques

Many performance critical systems today must rely on performance enhancements, such as multi-port memories, to keep up with the increasing demand of memory-access capacity. However, the large area footprints and complexity of existing…

Hardware Architecture · Computer Science 2020-01-28 Hardik Jain , Matthew Edwards , Ethan Elenberg , Ankit Singh Rawat , Sriram Vishwanath

Comparison of Flow Scheduling Policies for Mix of Regular and Deadline Traffic in Datacenter Environments

Datacenters are the main infrastructure on top of which cloud computing services are offered. Such infrastructure may be shared by a large number of tenants and applications generating a spectrum of datacenter traffic. Delay sensitive…

Networking and Internet Architecture · Computer Science 2017-07-21 Mohammad Noormohammadpour , Cauligi S. Raghavendra

Co-Optimizing Cache Partitioning and Multi-Core Task Scheduling: Exploit Cache Sensitivity or Not?

Cache partitioning techniques have been successfully adopted to mitigate interference among concurrently executing real-time tasks on multi-core processors. Considering that the execution time of a cache-sensitive task strongly depends on…

Hardware Architecture · Computer Science 2023-10-05 Binqi Sun , Debayan Roy , Tomasz Kloda , Andrea Bastoni , Rodolfo Pellizzoni , Marco Caccamo

DFModel: Design Space Optimization of Large-Scale Systems Exploiting Dataflow Mappings

We propose DFModel, a modeling framework for mapping dataflow computation graphs onto large-scale systems. Mapping a workload to a system requires optimizing dataflow mappings at various levels, including the inter-chip (between chips)…

Hardware Architecture · Computer Science 2024-12-24 Sho Ko , Nathan Zhang , Olivia Hsu , Ardavan Pedram , Kunle Olukotun

Learning to Remember More with Less Memorization

Memory-augmented neural networks consisting of a neural controller and an external memory have shown potentials in long-term sequential learning. Current RAM-like memory models maintain memory accessing every timesteps, thus they do not…

Machine Learning · Computer Science 2019-03-21 Hung Le , Truyen Tran , Svetha Venkatesh

Storage and Memory Characterization of Data Intensive Workloads for Bare Metal Cloud

As the cost-per-byte of storage systems dramatically decreases, SSDs are finding their ways in emerging cloud infrastructure. Similar trend is happening for main memory subsystem, as advanced DRAM technologies with higher capacity,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-16 Hosein Mohammadi Makrani