Related papers: An Efficient Approach towards Mitigating Soft Erro…
Importance of addressing soft errors in both safety critical applications and commercial consumer products is increasing, mainly due to ever shrinking geometries, higher-density circuits, and employment of power-saving techniques such as…
Fault tolerance is a critical aspect of modern computing systems, ensuring correct functionality in the presence of faults. This paper presents a comprehensive survey of fault tolerance methods and software-based mitigation techniques in…
Many safety-critical real-time systems operate under harsh environment and are subject to soft errors caused by transient or intermittent faults. It is critical and yet often very challenging to apply fault tolerance techniques in these…
It is now widely accepted that the CMOS technology implementing irreversible logic will hit a scaling limit beyond 2016, and that the increased power dissipation is a major limiting factor. Reversible computing can potentially require…
Three-Dimensional Networks-on-Chips (3D-NoCs) have been proposed as an auspicious solution, merging the high parallelism of the Network-on-Chip (NoC) paradigm with the high-performance and low-power cost of 3D-ICs. However, as technology…
Nanometer circuits are becoming increasingly susceptible to soft-errors due to alpha-particle and atmospheric neutron strikes as device scaling reduces node capacitances and supply/threshold voltage scaling reduces noise margins. It is…
Alpha-particles and cosmic rays cause bit flips in chips. Protection circuits ease the problem, but cost chip area and power, and so designers try hard to optimize them. This leads to bugs: an undetected fault can bring miscalculations, the…
Many real-world data mining applications need varying cost for different types of classification errors and thus call for cost-sensitive classification algorithms. Existing algorithms for cost-sensitive classification are successful in…
In contemporary times, the increasing complexity of the system poses significant challenges to the reliability, trustworthiness, and security of the SACRES. Key issues include the susceptibility to phenomena such as instantaneous voltage…
The Network-on-Chip (NoC) paradigm has been proposed as a favorable solution to handle the strict communication requirements between the increasingly large number of cores on a single chip. However, NoC systems are exposed to the aggressive…
The effects of soft errors in processor cores have been widely studied. However, little has been published about soft errors in uncore components, such as memory subsystem and I/O controllers, of a System-on-a-Chip (SoC). In this work, we…
Soft errors in large VLSI circuits pose dramatic influence on computing- and memory-intensive neural network (NN) processing. Understanding the influence of soft errors on NNs is critical to protect against soft errors for reliable NN…
Efficient low complexity error correcting code(ECC) is considered as an effective technique for mitigation of multi-bit upset (MBU) in the configuration memory(CM)of static random access memory (SRAM) based Field Programmable Gate Array…
Network-on-Chip (NoC) paradigm has been proposed as an auspicious solution to handle the strict communication requirements between the increasingly large number of cores on a single multi and many-core chips. However, NoC systems are…
As technology scales, nano-scale digital circuits face heightened susceptibility to single event upsets (SEUs) and transients (SETs) due to shrinking feature sizes and reduced operating voltages. While logical, electrical, and timing…
Many aerospace and automotive applications use FPGAs in their designs due to their low power and reconfigurability requirements. Meanwhile, such applications also pose a high standard on system reliability, which makes the early-stage…
Quantum error correction codes are usually designed to correct errors regardless of their physical origins. In large-scale devices, this is an essential feature. In smaller-scale devices, however, the main error sources are often…
NAND flash memory is ubiquitous in everyday life today because its capacity has continuously increased and cost has continuously decreased over decades. This positive growth is a result of two key trends: (1) effective process technology…
Fault tolerance in multi-core architecture has attracted attention of research community for the past 20 years. Rapid improvements in the CMOS technology resulted in exponential growth of transistor density. It resulted in increased…
Resilient algorithms in high-performance computing are subject to rigorous non-functional constraints. Resiliency must not increase the runtime, memory footprint or I/O demands too significantly. We propose a task-based soft error detection…