
Related papers: Application-layer Fault-Tolerance Protocols

200 papers

The structures for the expression of fault-tolerance provisions in the application software are the central topic of this paper. Structuring techniques answer the questions "How to incorporate fault-tolerance in the application layer of a…

Software Engineering · Computer Science · 2015-04-14 · Vincenzo De Florio, Chris Blondia

The embedding of fault tolerance provisions into the application layer of a programming language is a non-trivial task that has not found a satisfactory solution yet. Such a solution is very important, and the lack of a simple, coherent and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-01-16 · Vincenzo De Florio, G. Deconinck

This book consists of chapters describing novel approaches to integrating fault tolerance into the software development process. They cover a wide range of topics focusing on fault tolerance during the different phases of the software…

Software Engineering · Computer Science · 2010-11-09 · Patrizio Pelliccione, Henry Muccini, Nicolas Guelfi, Alexander Romanovsky

Understanding application resilience in the presence of faults is critical to addressing the HPC resilience challenge. Currently, we largely rely on random fault injection (RFI) to quantify application resilience. However, RFI provides…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-05-02 · Luanzheng Guo, Hanlin He, Dong Li

The structures for the expression of fault-tolerance provisions in the application software are the central topic of this dissertation. Structuring techniques provide means to control complexity, the latter being a relevant factor for the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-11-08 · Vincenzo De Florio

Fault tolerance is a critical aspect of modern computing systems, ensuring correct functionality in the presence of faults. This paper presents a comprehensive survey of fault tolerance methods and software-based mitigation techniques in…

Systems and Control · Electrical Eng. & Systems · 2024-04-17 · Mohammadreza Amel Solouki, Shaahin Angizi, Massimo Violante

Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability,…

Software Engineering · Computer Science · 2022-10-18 · Lalli Myllyaho, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen, Tommi Mikkonen

Fault tolerance is a key factor in the design of industrial computing systems. In practical terms, however, these systems, like every commercial product, are under great financial constraints and they have to remain in an operational state as long as…

Systems and Control · Computer Science · 2015-03-31 · Andrey A. Shchurov

With the rapid advancements of deep learning in the past decade, it can be foreseen that deep learning will be continuously deployed in more and more safety-critical applications such as autonomous driving and robotics. In this context,…

Hardware Architecture · Computer Science · 2022-04-06 · Cheng Liu, Zhen Gao, Siting Liu, Xuefei Ning, Huawei Li, Xiaowei Li

Application partitioning and code offloading have been researched extensively over the past few years. Several frameworks for code offloading have been proposed. However, fewer works have attempted to address issues that occur with its…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-09-21 · Nevin Vunka Jungum, Nawaz Mohamudally, Nimal Nissanke

Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components…

Distributed, Parallel, and Cluster Computing · Computer Science · 2007-05-23 · Michael Treaster

With the increasing complexity of computing systems, complete hardware reliability can no longer be guaranteed. We need, however, to ensure overall system reliability. One of the most important features of artificial neural networks is…

Neural and Evolutionary Computing · Computer Science · 2015-10-07 · Anton Kulakov, Mark Zwolinski, Jeff Reeve

The idle computers on a local area, campus area, or even wide area network represent a significant computational resource, one that is, however, also unreliable, heterogeneous, and opportunistic. This type of resource has been used…

Distributed, Parallel, and Cluster Computing · Computer Science · 2007-05-23 · Adriana Iamnitchi, Ian Foster

At our behest or otherwise, while our software is being executed, a huge variety of design assumptions is continuously matched with the truth of the current condition. While standards and tools exist to express and verify some of these…

Software Engineering · Computer Science · 2016-05-09 · Vincenzo De Florio

This paper introduces different views for understanding problems and faults with the goal of defining a method for the formal specification of systems. The idea of Layered Fault Tolerant Specification (LFTS) is proposed to make the method…

Software Engineering · Computer Science · 2012-07-12 · Manuel Mazzara

I will give an overview of what I see as some of the most important future directions in the theory of fault-tolerant quantum computation. In particular, I will give a brief summary of the major problems that need to be solved in fault…

Quantum Physics · Physics · 2022-10-31 · Daniel Gottesman

Environmental noise (e.g., heat, ionized particles) causes transient faults in hardware, which lead to corruption of stored values. Mission-critical devices require such faults to be mitigated by fault-tolerance: a combination of…

Cryptography and Security · Computer Science · 2014-10-28 · Filippo Del Tedesco, David Sands, Alejandro Russo

With the rapid evolution of Large Language Models (LLMs) and their large-scale experimentation in cloud-computing spaces, the challenge of guaranteeing their security and efficiency in a failure scenario has become a major issue. To ensure…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-18 · Yihong Jin, Ze Yang, Xinhe Xu, Yihan Zhang, Shuyang Ji

Data storage systems serve as the foundation of digital society. The enormous amount of data generated by people on a daily basis makes the fault tolerance of data storage systems increasingly important. Unfortunately, modern storage systems consist…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-07-08 · Mai Zheng, Duo Zhang, Ahmed Dajani

This short paper describes early experiments to validate the capabilities of a component-based platform to observe and control a software architecture in the small. This is part of a whole process for resilient computing, i.e. targeting the…

Software Engineering · Computer Science · 2012-04-06 · Miruna Stoicescu, Jean-Charles Fabre, Matthieu Roy