Related papers: Distributed Analysis for Diagnosability in Concurr…
Distributed Systems involve two or more computer systems which may be situated at geographically distinct locations and are connected by a communication network. Due to failures in the communication link, faults arise which may make the…
In this paper we review algorithms for checking diagnosability of discrete-event systems and timed automata. We point out that the diagnosability problems in both cases reduce to the emptiness problem for (timed) B\"uchi automata. Moreover,…
Complex systems are naturally hybrid: their dynamic behavior is both continuous and discrete. For these systems, maintenance and repair are an increasing part of the total cost of final product. Efficient diagnosis and prognosis techniques…
Diagnosability is a system theoretical property characterizing whether fault occurrences in a system can always be detected within a finite time. In this paper, we investigate the verification of diagnosability for cyber-physical systems…
The log-based analysis and trouble-shooting has remained prevalent and commonly used approach for centralized and time-haring systems. However, for parallel and distributed systems where happen-before relations are not directly available…
In this paper an efficient model based diagnostic process is described for systems whose components possess a causal relation between their inputs and their outputs. In this diagnostic process, firstly, a set of focuses on likely broken…
Reliable systems require effective monitoring techniques for fault identification. System-level diagnosis was originally proposed in the 1960s as a test-based approach to monitor and identify faulty components of a general system. Over the…
In the modern world, we are permanently using, leveraging, interacting with, and relying upon systems of ever higher sophistication, ranging from our cars, recommender systems in e-commerce, and networks when we go online, to integrated…
The reliability of a system of components depends on reliability of each component. Thus, the initial statistical work should be the estimation of the reliability of each component of the system. This is not an easy task because when the…
This paper deals with diagnosability of discrete-time nonlinear systems with unknown inputs and quantized outputs. We propose a novel notion of diagnosability that we term approximate diagnosability, corresponding to the possibility of…
Serverless applications can be particularly difficult to troubleshoot, as these applications are often composed of various managed and partly managed services. Faults are often unpredictable and can occur at multiple points, even in simple…
Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components…
Single fault sequential change point problems have become important in modeling for various phenomena in large distributed systems, such as sensor networks. But such systems in many situations present multiple interacting faults. For…
This paper deals with the problem of enforcing modular diagnosability for discrete-event systems that don't satisfy this property by their natural modularity. We introduce an approach to achieve this property combining existing modules into…
In this paper, we review some recent results about the use of dynamic observers for fault diagnosis of discrete event systems. Fault diagnosis consists in synthesizing a diagnoser that observes a given plant and identifies faults in the…
A parallel computer system is a collection of processing elements that communicate and cooperate to solve large computational problems efficiently. To achieve this, at first the large computational problem is partitioned into several tasks…
In this paper, we study a fault-tolerant control for systems consisting of multiple homogeneous components such as parallel processing machines. This type of system is often more robust to uncertainty compared to those with a single…
Large-scale decentralized systems of autonomous agents interacting via asynchronous communication often experience the following self-healing dilemma: fault detection inherits network uncertainties making a remote faulty process…
Motivated by the development and deployment of large-scale dynamical systems, often composed of geographically distributed smaller subsystems, we address the problem of verifying their controllability in a distributed manner. In this work…
Usually, methods evaluating system reliability require engineers to quantify the reliability of each of the system components. For series and parallel systems, there are some options to handle the estimation of each component's reliability.…