
Related papers: Workload-Aware DRAM Error Prediction using Machine…

200 papers

It has become increasingly difficult to understand the complex interaction between modern applications and main memory, composed of DRAM chips. Manufacturers are now selling and proposing many different types of DRAM, with each DRAM type…

Hardware Architecture · Computer Science 2019-10-21 Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, Onur Mutlu

Main memory's rising energy consumption has emerged as a critical challenge in modern computing architectures, particularly in large-scale systems, driven by frequent access patterns, growing data volumes, and insufficient power management…

To protect multicores from soft-error perturbations, resiliency schemes have been developed with high coverage but high power and performance overheads. Emerging safety-critical machine learning applications are increasingly being deployed…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-07-11 Qingchuan Shi, Hamza Omar, Omer Khan

This dissertation rigorously characterizes many modern commodity DRAM devices and shows that by exploiting DRAM access timing margins within manufacturer-recommended DRAM timing specifications, we can significantly improve system…

Hardware Architecture · Computer Science 2021-09-30 Jeremie S. Kim

The energy consumption of DRAM is a critical concern in modern computing systems. Improvements in manufacturing process technology have allowed DRAM vendors to lower the DRAM supply voltage conservatively, which reduces some of the DRAM…

Dynamic Random Access Memory (DRAM) is the de-facto choice for main memory devices due to its cost-effectiveness. It offers a larger capacity and higher bandwidth compared to SRAM but is slower than the latter. With each passing generation,…

Hardware Architecture · Computer Science 2022-01-19 Kaustav Goswami, Hemanta Kumar Mondal, Shirshendu Das, Dip Sankar Banerjee

The pivotal issue of reliability is one of colossal concern for circuit designers. The driving force is transistor aging, dependent on operating voltage and workload. At the design time, it is difficult to estimate close-to-the-edge…

Machine Learning · Computer Science 2024-02-07 Paul R. Genssler, Hamza E. Barkam, Karthik Pandaram, Mohsen Imani, Hussam Amrouch

The demand for accurate information about the internal structure and characteristics of dynamic random-access memory (DRAM) has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing…

Cryptography and Security · Computer Science 2023-08-15 Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Variation has been shown to exist across the cells within a modern DRAM chip. We empirically demonstrate a new form of variation that exists within a real DRAM chip, induced by the design and placement of different components in the DRAM…

RowHammer is a DRAM vulnerability that can cause bit errors in a victim DRAM row solely by accessing its neighboring DRAM rows at a high-enough rate. Recent studies demonstrate that new DRAM devices are becoming increasingly vulnerable to…

Cryptography and Security · Computer Science 2024-06-04 Lois Orosa, Ulrich Rührmair, A. Giray Yaglikci, Haocong Luo, Ataberk Olgun, Patrick Jattke, Minesh Patel, Jeremie Kim, Kaveh Razavi, Onur Mutlu

This paper summarizes our work on characterizing application memory error vulnerability to optimize datacenter cost via Heterogeneous-Reliability Memory (HRM), which was published in DSN 2014, and examines the work's significance and future…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-11 Yixin Luo, Sriram Govindan, Bikash Sharma, Mark Santaniello, Justin Meza, Aman Kansal, Jie Liu, Badriddine Khessib, Kushagra Vaid, Onur Mutlu

Failed workloads that consumed significant computational resources in time and space affect the efficiency of data centers significantly and thus limit the amount of scientific work that can be achieved. While the computational power has…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Jie Li, Rui Wang, Ghazanfar Ali, Tommy Dang, Alan Sill, Yong Chen

Ageing detection and failure prediction are essential in many Internet of Things (IoT) deployments, which operate huge quantities of embedded devices unattended in the field for years. In this paper, we present a large-scale empirical…

Hardware Architecture · Computer Science 2023-07-14 Leandro Lanzieri, Peter Kietzmann, Goerschwin Fey, Holger Schlarb, Thomas C. Schmidt

The rise of transient faults in modern hardware requires system designers to consider errors occurring at runtime. Both hardware- and software-based error handling must be deployed to meet application reliability requirements. The level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-23 Björn Bönninghoff, Horst Schirmeier

This paper summarizes our work on experimental characterization and analysis of reduced-voltage operation in modern DRAM chips, which was published in SIGMETRICS 2017, and examines the work's significance and future potential. We take a…

Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunctions in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs,…

Hardware Architecture · Computer Science 2024-12-17 Qiao Yu, Wengui Zhang, Min Zhou, Jialiang Yu, Zhenli Sheng, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

Modern DRAM modules are often equipped with hardware error correction capabilities, especially for DRAM deployed in large-scale data centers, as process technology scaling has increased the susceptibility of these devices to errors. To…

Hardware Architecture · Computer Science 2017-06-29 Yixin Luo, Saugata Ghose, Tianshi Li, Sriram Govindan, Bikash Sharma, Bryan Kelly, Amirali Boroumand, Onur Mutlu

Dynamic Random Access Memory (DRAM) is pervasive in computer systems. Cell vulnerabilities caused by unintended phenomena (forced retention failure, latency alteration, rowhammer and rowpress) lead to unintended bit flips in memory. These…

Cryptography and Security · Computer Science 2026-03-20 Zilong Hu, Hongming Fei, Prosanta Gope, Jack Miskelly, Owen Millwood, Biplab Sikdar

Graphics processing units (GPUs) are the de facto standard for processing deep learning (DL) tasks. Meanwhile, GPU failures, which are inevitable, cause severe consequences in DL tasks: they disrupt distributed trainings, crash inference…

Machine Learning · Computer Science 2022-01-31 Heting Liu, Zhichao Li, Cheng Tan, Rongqiu Yang, Guohong Cao, Zherui Liu, Chuanxiong Guo

In this paper, we present a comprehensive analysis investigating the reliability of SSD-based I/O caching architectures used in enterprise storage systems under power failure and high operating temperature. We explore a variety of SSDs from…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-04 Saba Ahmadian, Farhad Taheri, Hossein Asadi