Related papers: Distributed asynchronous convergence detection wit…
Convergence of classical parallel iterations is detected by performing a reduction operation at each iteration in order to compute a residual error relative to a potential solution vector. To efficiently run asynchronous iterations,…
This work considers the problem of detecting signals from multiple sequentially observed data streams, where only one stream can be observed at every time instant. The goal is to detect signals as quickly as possible while controlling the…
We consider sequential change-point detection in parallel data streams, where each stream has its own change point. Once a change is detected in a data stream, this stream is deactivated permanently. The goal is to maximize the normal…
The problem of sequential anomaly detection is considered, where multiple data sources are monitored in real time and the goal is to identify the "anomalous" ones among them, when it is not possible to sample all sources at all times. A…
This paper addresses the distributed convergence detection problem in asynchronous iterations. A modified recursive doubling algorithm is investigated in order to adapt to the non-power-of-two case. Some convergence detection algorithms are…
We develop a mixture procedure to monitor parallel streams of data for a change-point that affects only a subset of them, without assuming a spatial structure relating the data streams to one another. Observations are assumed initially to…
The problem of sequential anomaly detection and identification is considered, where multiple data sources are simultaneously monitored and the goal is to identify in real time those, if any, that exhibit ``anomalous" statistical behavior.…
This paper introduces a novel, fast atomic-snapshot protocol for asynchronous message-passing systems. In the process of defining what ``fast'' means exactly, we spot a few interesting issues that arise when conventional time metrics are…
The problem of minimizing a sum of local convex objective functions over a networked system captures many important applications and has received much attention in the distributed optimization field. Most of existing work focuses on…
Contrary to the sequential world, the processes involved in a distributed system do not necessarily know when a computation is globally finished. This paper investigates the problem of the detection of the termination of local computations.…
A distributed system consisting of a huge number of computational entities is prone to faults, because faults in a few nodes cause the entire system to fail. Consequently, fault tolerance of distributed systems is a critical issue.…
We focus on the problem of checkpointing (or taking a snapshot) in fully replicated eventually consistent distributed databases. In particular, we consider the problem of taking Distributed Transaction-Consistent Snapshots (DTCS). A typical…
Temporal point processes are powerful generative models for event sequences that capture complex dependencies in time-series data. They are commonly specified using autoregressive models that learn the distribution of the next event from…
In many applications one is interested to detect certain (known) patterns in the mean of a process with smallest delay. Using an asymptotic framework which allows to capture that feature, we study a class of appropriate sequential…
We investigate the problem of jointly testing two hypotheses and estimating a random parameter based on data that is observed sequentially by sensors in a distributed network. In particular, we assume the data to be drawn from a Gaussian…
We consider a sequential problem in decentralized detection. Two observers can make repeated noisy observations of a binary hypothesis on the state of the environment. At any time, any of the two observers can stop and send a final message…
This paper considers the detection of change points in parallel data streams, a problem widely encountered when analyzing large-scale real-time streaming data. Each stream may have its own change point, at which its data has a…
In this paper, we revisit traditional checkpointing and rollback recovery strategies, with a focus on silent data corruption errors. Contrarily to fail-stop failures, such latent errors cannot be detected immediately, and a mechanism to…
Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed…
Asynchronous iterations are more and more investigated for both scaling and fault-resilience purpose on high performance computing platforms. While so far, they have been exclusively applied within space domain decomposition frameworks,…