English
Related papers

Related papers: Heterogeneous-Reliability Memory: Exploiting Appli…

200 papers

Many high end and next generation computing systems to incorporated alternative memory technologies to meet performance goals. Since these technologies present distinct advantages and tradeoffs compared to conventional DDR* SDRAM, such as…

Performance · Computer Science 2021-10-06 M. Ben Olson , Brandon Kammerdiener , Kshitij A. Doshi , Terry Jones , Michael R. Jantz

The current mobile applications have rapidly growing memory footprints, posing a great challenge for memory system design. Insufficient DRAM main memory will incur frequent data swaps between memory and storage, a process that hurts…

Hardware Architecture · Computer Science 2024-03-19 Fei Wen , Mian Qin , Paul Gratz , Narasimha Reddy

Reliability has emerged as a key topic of interest for researchers around the world to detect and/or mitigate the side effects of decreasing transistor sizes, such as soft errors. Traditional solutions, like DMR and TMR, incur significant…

Hardware Architecture · Computer Science 2019-10-22 Bharath Srinivas Prabakaran , Mihika Dave , Florian Kriebel , Semeen Rehman , Muhammad Shafique

Quantum Random Access Memory (QRAM) holds the promise of enabling several large scale applications of quantum computers. However, designing fault tolerant QRAMs for large scale applications is still an open problem due to the poor error and…

Quantum Physics · Physics 2025-12-09 Ansh Singal , Kaitlin N. Smith

With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature…

Remote Memory Access (RMA) is an emerging mechanism for programming high-performance computers and datacenters. However, little work exists on resilience schemes for RMA-based applications and systems. In this paper we analyze fault…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-20 Maciej Besta , Torsten Hoefler

Raw bit errors are common in NAND flash memory and will increase in the future. These errors reduce flash reliability and limit the lifetime of a flash memory device. We aim to improve flash reliability with a multitude of low-cost…

Hardware Architecture · Computer Science 2018-08-16 Yixin Luo

In modern systems, DRAM-based main memory is significantly slower than the processor. Consequently, processors spend a long time waiting to access data from main memory, making the long main memory access latency one of the most critical…

Hardware Architecture · Computer Science 2016-11-01 Donghyuk Lee

AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance, but underprovisioned on density and…

In recent years, high availability and reliability of Data Storage Systems (DSS) have been significantly threatened by soft errors occurring in storage controllers. Due to their specific functionality and hardware-software stack, error…

Performance · Computer Science 2021-12-24 Mostafa Kishani , Mehdi Tahoori , Hossein Asadi

The continuing advancement of memory technology has not only fueled a surge in performance, but also substantially exacerbate reliability challenges. Traditional solutions have primarily focused on improving the efficiency of protection…

Hardware Architecture · Computer Science 2025-09-09 Fan Li , Mimi Xie , Yanan Guo , Huize Li , Xin Xin

Modern DRAM modules are often equipped with hardware error correction capabilities, especially for DRAM deployed in large-scale data centers, as process technology scaling has increased the susceptibility of these devices to errors. To…

Hardware Architecture · Computer Science 2017-06-29 Yixin Luo , Saugata Ghose , Tianshi Li , Sriram Govindan , Bikash Sharma , Bryan Kelly , Amirali Boroumand , Onur Mutlu

A large language model (LLM) is one of the most important emerging machine learning applications nowadays. However, due to its huge model size and runtime increase of the memory footprint, LLM inferences suffer from the lack of memory…

Hardware Architecture · Computer Science 2025-04-22 Soojin Hwang , Jungwoo Kim , Sanghyeon Lee , Hongbeen Kim , Jaehyuk Huh

Due to the diversity and implicit redundancy in terms of processing units and compute kernels, off-the-shelf heterogeneous systems offer the opportunity to detect and tolerate faults during task execution in hardware as well as in software.…

Operating Systems · Computer Science 2014-05-14 Mario Kicherer , Wolfgang Karl

Caching is crucial for enabling high-throughput networks for data intensive applications. Traditional caching technology relies on DRAM, as it can transfer data at a high rate. However, DRAM capacity is subject to contention by most system…

Networking and Internet Architecture · Computer Science 2023-10-12 Faruk Volkan Mutlu , Edmund Yeh

Cloud computing has become inevitable for every digital service which has exponentially increased its usage. However, a tremendous surge in cloud resource demand stave off service availability resulting into outages, performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-08 Deepika Saxena , Ishu Gupta , Ashutosh Kumar Singh , Chung-Nan Lee

The aggressive scaling of technology may have helped to meet the growing demand for higher memory capacity and density, but has also made DRAM cells more prone to errors. Such a reality triggered a lot of interest in modeling DRAM behavior…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-30 Lev Mukhanov , Konstantinos Tovletoglou , Hans Vandierendonck , Dimitrios S. Nikolopoulos , Georgios Karakonstantis

In-memory key-value stores provide consistent low-latency access to all objects which is important for interactive large-scale applications like social media networks or online graph analytics and also opens up new application areas. But,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-17 Kevin Beineke , Stefan Nothaas , Michael Schoettner

High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is…

Hardware Architecture · Computer Science 2019-09-04 Prashant J. Nair

This paper summarizes our work on experimentally characterizing, mitigating, and recovering data retention errors in multi-level cell (MLC) NAND flash memory, which was published in HPCA 2015, and examines the work's significance and future…

Hardware Architecture · Computer Science 2018-05-09 Yu Cai , Yixin Luo , Erich F. Haratsch , Ken Mai , Saugata Ghose , Onur Mutlu
‹ Prev 1 2 3 10 Next ›