Related papers: Fault Tolerance for Service Function Chains
In the current scenario, several commercial and social organizations use computer networks for their business and management purposes. To meet business requirements, networks are also growing. The growth of networks also promotes…
Published in IEEE/ASME Transactions on Mechatronics, DOI: 10.1109/TMECH.2021.3100150. Ideally, accurate sensor measurements are needed to achieve good performance in the closed-loop control of mechatronic systems. As a consequence, sensor…
Middleboxes have become a vital part of modern networks by providing service functions such as content filtering, load balancing and optimization of network traffic. An ordered sequence of middleboxes composing a logical service is called…
This paper addresses Fault Tolerant Control (FTC) of Large Power Systems (LPS) subject to sensor failure. Hiding the fault from the controller allows the nominal controller to remain in the loop. We assume specific faults that violate…
Network functions virtualization (NFV) allows operators to employ NF chains to realize custom policies, and dynamically add instances to meet demand or for failover. NFs maintain detailed per- and cross-flow state which needs careful…
We conducted a feasibility analysis of utilizing a highly available multi-stage architecture for TSN switches used for sending high-priority, mission-critical traffic within a bounded latency instead of traditional single-stage…
Coordination services are a fundamental building block of modern cloud systems, providing critical functionalities like configuration management and distributed locking. The major challenge is to achieve low latency and high throughput…
The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed applications, using the leader-follower replication technique. The LLFT system provides application-transparent replication, with strong replica…
Although model-based fault tolerant control (FTC) has become prevalent in various engineering fields, its application to air-conditioning systems is limited due to the lack of control-oriented models to characterize the phase change of…
We study the fault-tolerant variant of the online bin packing problem. Similar to the classic bin packing problem, an online sequence of items of various sizes should be packed into a minimum number of bins of uniform capacity. For…
Fault-tolerant routing (FTR) in Networks-on-Chip (NoCs) has become a common practice to sustain the performance of multi-core systems with an increasing number of faults on a chip. On the other hand, usage of third-party intellectual…
This paper presents an adaptive fault-tolerant control (FTC) scheme for a class of nonlinear uncertain multi-agent systems. A local FTC scheme is designed for each agent using local measurements and suitable information exchanged between…
Queues allow network operators to control traffic: where queues build, they can enforce scheduling and shaping policies. In the Internet today, however, there is a mismatch between where queues build and where control is most effectively…
Fault tolerance overhead of high performance computing (HPC) applications is becoming critical to the efficient utilization of HPC systems at large scale. HPC applications typically tolerate fail-stop failures by checkpointing. Another…
Targeted attacks against network infrastructure are notoriously difficult to guard against. In the case of communication networks, such attacks can leave users vulnerable to censorship and surveillance, even when cryptography is used. Much…
Replicating data across multiple data centers not only allows moving the data closer to the user and, thus, reduces latency for applications, but also increases the availability in the event of a data center failure. Therefore, it is not…
Modern data stores achieve scalability by partitioning data into shards and fault-tolerance by replicating each shard across several servers. A key component of such systems is a Transaction Certification Service (TCS), which atomically…
Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components…
This paper addresses poor cluster formation and frequent Cluster Head (CH) failure issues of underwater sensor networks by proposing an energy-efficient hierarchical topology-aware clustering routing (EEHTAC) protocol. In this paper,…
Transformer models rely on High-Performance Computing (HPC) resources for inference, where soft errors are inevitable in large-scale systems, making the reliability of the model particularly critical. Existing fault tolerance frameworks for…