Related papers: Distributed Analysis for Diagnosability in Concurr…

Fault Diagnosis for Distributed Systems using Accuracy Technique

Distributed Systems involve two or more computer systems which may be situated at geographically distinct locations and are connected by a communication network. Due to failures in the communication link, faults arise which may make the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-20 Poorva Kulkarni, Varsha Deshpande, Latika Sarna, Sumedha Shenolikar, Supriya Kelkar

A Note on Fault Diagnosis Algorithms

In this paper we review algorithms for checking diagnosability of discrete-event systems and timed automata. We point out that the diagnosability problems in both cases reduce to the emptiness problem for (timed) Büchi automata. Moreover,…

Logic in Computer Science · Computer Science 2016-11-17 Franck Cassez

An Integrated Framework for Diagnosis and Prognosis of Hybrid Systems

Complex systems are naturally hybrid: their dynamic behavior is both continuous and discrete. For these systems, maintenance and repair are an increasing part of the total cost of final product. Efficient diagnosis and prognosis techniques…

Systems and Control · Computer Science 2013-08-27 Elodie Chanthery, Pauline Ribot

Verification of Diagnosability for Cyber-Physical Systems: A Hybrid Barrier Certificate Approach

Diagnosability is a system theoretical property characterizing whether fault occurrences in a system can always be detected within a finite time. In this paper, we investigate the verification of diagnosability for cyber-physical systems…

Systems and Control · Electrical Eng. & Systems 2024-08-14 Bingzhuo Zhong, Weijie Dong, Xiang Yin, Majid Zamani

Diagnosing Distributed Systems through Log Data Analysis

The log-based analysis and trouble-shooting has remained a prevalent and commonly used approach for centralized and time-sharing systems. However, for parallel and distributed systems where happen-before relations are not directly available…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-09 K. R. Chowdhary, Rajendra Purohit

Efficient Model Based Diagnosis

In this paper an efficient model based diagnostic process is described for systems whose components possess a causal relation between their inputs and their outputs. In this diagnostic process, firstly, a set of focuses on likely broken…

Artificial Intelligence · Computer Science 2022-09-21 Nico Roos

A Distributed System-level Diagnosis Model for the Implementation of Unreliable Failure Detectors

Reliable systems require effective monitoring techniques for fault identification. System-level diagnosis was originally proposed in the 1960s as a test-based approach to monitor and identify faulty components of a general system. Over the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-07 Elias P. Duarte, Luiz A. Rodrigues, Edson T. Camargo, Rogerio Turchetti

Don't Treat the Symptom, Find the Cause! Efficient Artificial-Intelligence Methods for (Interactive) Debugging

In the modern world, we are permanently using, leveraging, interacting with, and relying upon systems of ever higher sophistication, ranging from our cars, recommender systems in e-commerce, and networks when we go online, to integrated…

Artificial Intelligence · Computer Science 2023-06-23 Patrick Rodler

Reliability of components of coherent systems: estimates in presence of masked data

The reliability of a system of components depends on reliability of each component. Thus, the initial statistical work should be the estimation of the reliability of each component of the system. This is not an easy task because when the…

Methodology · Statistics 2018-01-09 Agatha Sacramento Rodrigues, Carlos Alberto de Braganca Pereira, Adriano Polpo

On Approximate Diagnosability of Nonlinear Systems

This paper deals with diagnosability of discrete-time nonlinear systems with unknown inputs and quantized outputs. We propose a novel notion of diagnosability that we term approximate diagnosability, corresponding to the possibility of…

Optimization and Control · Mathematics 2017-04-10 Elena De Santis, Giordano Pola, Maria Domenica Di Benedetto

FaaSter Troubleshooting -- Evaluating Distributed Tracing Approaches for Serverless Applications

Serverless applications can be particularly difficult to troubleshoot, as these applications are often composed of various managed and partly managed services. Faults are often unpredictable and can occur at multiple points, even in simple…

Software Engineering · Computer Science 2024-07-16 Maria C. Borges, Sebastian Werner, Ahmet Kilic

A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Michael Treaster

Simultaneous Sequential Detection of Multiple Interacting Faults

Single fault sequential change point problems have become important in modeling for various phenomena in large distributed systems, such as sensor networks. But such systems in many situations present multiple interacting faults. For…

Information Theory · Computer Science 2015-03-17 Ram Rajagopal, XuanLong Nguyen, Sinem Coleri Ergen, Pravin Varaiya

Virtual Modules in Discrete-Event Systems: Achieving Modular Diagnosability

This paper deals with the problem of enforcing modular diagnosability for discrete-event systems that don't satisfy this property by their natural modularity. We introduce an approach to achieve this property combining existing modules into…

Systems and Control · Computer Science 2013-11-13 Dmitry Myadzelets, Andrea Paoli

Fault Diagnosis with Dynamic Observers

In this paper, we review some recent results about the use of dynamic observers for fault diagnosis of discrete event systems. Fault diagnosis consists in synthesizing a diagnoser that observes a given plant and identifies faults in the…

Formal Languages and Automata Theory · Computer Science 2010-04-19 Franck Cassez, Stavros Tripakis

An Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems

A parallel computer system is a collection of processing elements that communicate and cooperate to solve large computational problems efficiently. To achieve this, at first the large computational problem is partitioned into several tasks…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-09 Ardhendu Mandal, Subhas Chandra Pal

Near-Optimal Design for Fault-Tolerant Systems with Homogeneous Components under Incomplete Information

In this paper, we study a fault-tolerant control for systems consisting of multiple homogeneous components such as parallel processing machines. This type of system is often more robust to uncertainty compared to those with a single…

Optimization and Control · Mathematics 2020-12-03 Jalal Arabneydi, Amir G. Aghdam

Self-healing Dilemmas in Distributed Systems: Fault Correction vs. Fault Tolerance

Large-scale decentralized systems of autonomous agents interacting via asynchronous communication often experience the following self-healing dilemma: fault detection inherits network uncertainties making a remote faulty process…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-25 Jovan Nikolic, Nursultan Jubatyrov, Evangelos Pournaras

Distributed Verification of Structural Controllability for Linear Time-Invariant Systems

Motivated by the development and deployment of large-scale dynamical systems, often composed of geographically distributed smaller subsystems, we address the problem of verifying their controllability in a distributed manner. In this work…

Optimization and Control · Mathematics 2015-06-19 Joao Carvalho, Sergio Pequito, A. Pedro Aguiar, Soummya Kar, Karl H. Johansson

Reliability Estimation in Coherent Systems

Usually, methods evaluating system reliability require engineers to quantify the reliability of each of the system components. For series and parallel systems, there are some options to handle the estimation of each component's reliability.…

Methodology · Statistics 2018-05-29 Agatha Rodrigues, Carlos Alberto Pereira, Adriano Polpo