Related papers: Chip Guard ECC: An Efficient, Low Latency Method
Efficient low complexity error correcting code(ECC) is considered as an effective technique for mitigation of multi-bit upset (MBU) in the configuration memory(CM)of static random access memory (SRAM) based Field Programmable Gate Array…
Modern DRAM modules are often equipped with hardware error correction capabilities, especially for DRAM deployed in large-scale data centers, as process technology scaling has increased the susceptibility of these devices to errors. To…
Modern Deep Learning (DL) workloads are increasingly deployed in safety-critical domains, such as automotive systems and hyperscale data centers, where transient hardware faults pose a serious threat to system reliability. These workloads…
Error Detection and Correction Codes (ECCs) are often used in digital designs to protect data integrity. Especially in safety-critical systems such as automotive electronics, ECCs are widely used and the verification of such complex logic…
Improvements in main memory storage density are primarily driven by process technology scaling, which negatively impacts reliability by exacerbating various circuit-level error mechanisms. To compensate for growing error rates, both memory…
State-of-the-art techniques for addressing scaling-related main memory errors identify and repair bits that are at risk of error from within the memory controller. Unfortunately, modern main memory chips internally use on-die error…
When neural networks (NeuralNets) are implemented in hardware, their weights need to be stored in memory devices. As noise accumulates in the stored weights, the NeuralNet's performance will degrade. This paper studies how to use error…
Coherent parity check (CPC) codes are a new framework for the construction of quantum error correction codes that encode multiple qubits per logical block. CPC codes have a canonical structure involving successive rounds of bit and phase…
In order to achieve fault tolerance, highly reliable system often require the ability to detect errors as soon as they occur and prevent the speared of erroneous information throughout the system. Thus, the need for codes capable of…
Field Programmable Gate Arrays (FPGAs) are more prone to be affected by transient faults in presence of radiation and other environmental hazards compared to Application Specific Integrated Circuits (ASICs). Hence, error mitigation and…
Using Error Detection Code (EDC) and Error Correction Code (ECC) is a noteworthy way to increase cache memories robustness against soft errors. EDC enables detecting errors in cache memory while ECC is used to correct erroneous cache…
Inefficient data transfer between computation and memory inspired emerging processing-in-memory (PIM) technologies. Many PIM solutions enable storage and processing using memristors in a crossbar-array structure, with techniques such as…
The reliability of memory devices is affected by radiation induced soft errors. Multiple cell upsets (MCUs) caused by radiation corrupt data stored in multiple cells within memories. Error correction codes (ECCs) are typically used to…
Synthetic DNA can in principle be used for the archival storage of arbitrary data. Because errors are introduced during DNA synthesis, storage, and sequencing, an error-correcting code (ECC) is necessary for error-free recovery of the data.…
As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL based schemes to tolerate up-to one/two symbol errors per…
The growing demand for highly reliable communication systems drives the research and development of algorithms that identify and correct errors during data transmission and storage. This need becomes even more critical in hard-to-access or…
As DRAM scales to higher density and I/O speeds, ensuring data correctness becomes increasingly difficult. Industry has responded with a three-layer stack: on-die ECC (O-ECC), link ECC (L-ECC), and system ECC (S-ECC). However, these layers…
Computational storage, known as a solution to significantly reduce the latency by moving data-processing down to the data storage, has received wide attention because of its potential to accelerate data-driven devices at the edge. To meet…
We consider analog error-correcting codes (analog ECCs) that are designed to correct/detect outlying errors arising in analog implementations of vector-matrix multiplication. The error-correction/detection capability of an analog ECC can be…
Voltage underscaling below the nominal level is an effective solution for improving energy efficiency in digital circuits, e.g., Field Programmable Gate Arrays (FPGAs). However, further undervolting below a safe voltage level and without…