Related papers: Optimal Checkpoint Interval with Availability as a…

A Utilization Model for Optimization of Checkpoint Intervals in Distributed Stream Processing Systems

State-of-the-art distributed stream processing systems such as Apache Flink and Storm have recently included checkpointing to provide fault-tolerance for stateful applications. This is a necessary eventuality as these systems head into the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-21 Sachini Jayasekara, Aaron Harwood, Shanika Karunasekera

On the Combination of Silent Error Detection and Checkpointing

In this paper, we revisit traditional checkpointing and rollback recovery strategies, with a focus on silent data corruption errors. Contrary to fail-stop failures, such latent errors cannot be detected immediately, and a mechanism to…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-01 Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni

Adaptive, Distribution-Free Prediction Intervals for Deep Networks

The machine learning literature contains several constructions for prediction intervals that are intuitively reasonable but ultimately ad-hoc in that they do not come with provable performance guarantees. We present methods from the…

Machine Learning · Statistics 2020-02-25 Danijel Kivaranovic, Kory D. Johnson, Hannes Leeb

Determination of Checkpointing Intervals for Malleable Applications

Selecting optimal intervals of checkpointing an application is important for minimizing the run time of the application in the presence of system failures. Most of the existing efforts on checkpointing interval selection were developed for…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-02 K. Raghavendra, Sathish S Vadhiyar

Available observation time regulates optimal balance between sensitivity and confidence

Tasks that require information about the world imply a trade-off between the time spent on observation and the variance of the response. In particular, fast decisions need to rely on uncertain information. However, standard estimates of…

Neurons and Cognition · Quantitative Biology 2023-07-18 Sahel Azizpour, Viola Priesemann, Johannes Zierenberg, Anna Levina

Improving Performance of Iterative Methods by Lossy Checkpointing

Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fundamental operations for many modern scientific simulations. When the large-scale iterative methods are running with a large number of ranks…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-30 Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, Franck Cappello

Optimal Multi-Level Interval-based Checkpointing for Exascale Stream Processing Systems

State-of-the-art stream processing platforms make use of checkpointing to support fault tolerance, where a "checkpoint tuple" flows through the topology to all operators, indicating a checkpoint and triggering a checkpoint operation. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-17 Sachini Jayasekara, Aaron Harwood, Shanika Karunasekera

Asymptotic efficiency of restart and checkpointing

Many tasks are subject to failure before completion. Two of the most common failure recovery strategies are restart and checkpointing. Under restart, once a failure occurs, it is restarted from the beginning. Under checkpointing, the task…

Probability · Mathematics 2018-05-15 Antonio Sodre

Accelerating Optimization via Differentiable Stopping Time

Optimization is an important module of modern machine learning applications. Tremendous efforts have been made to accelerate optimization algorithms. A common formulation is achieving a lower loss at a given time. This enables a…

Machine Learning · Computer Science 2025-05-29 Zhonglin Xie, Yiman Fong, Haoran Yuan, Zaiwen Wen

On rescheduling due to machine disruption while to minimize the total weighted completion time

We investigate a single machine rescheduling problem that arises from an unexpected machine unavailability, after the given set of jobs has already been scheduled to minimize the total weighted completion time. Such a disruption is…

Data Structures and Algorithms · Computer Science 2017-01-27 Wenchang Luo, Taibo Luo, Randy Goebel, Guohui Lin

Checkpointing algorithms and fault prediction

This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly in the presence of a fault prediction system, characterized by its recall and its…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-05 Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni

Online Conformal Prediction with Efficiency Guarantees

We study the problem of conformal prediction in a novel online framework that directly optimizes efficiency. In our problem, we are given a target miscoverage rate $\alpha > 0$, and a time horizon $T$. On each day $t \le T$ an algorithm…

Machine Learning · Computer Science 2025-10-23 Vaidehi Srinivas

Intelligent sampling for multiple change-points in exceedingly long time series with rate guarantees

Change point estimation in its offline version is traditionally performed by optimizing over the data set of interest, by considering each data point as the true location parameter and computing a data fit criterion. Subsequently, the data…

Methodology · Statistics 2020-04-10 Zhiyuan Lu, Moulinath Banerjee, George Michailidis

Interval peak-to-peak observers for continuous- and discrete-time systems with persistent inputs and delays

While the design of optimal peak-to-peak controllers/observers for linear systems is known to be a difficult problem, this problem becomes interestingly much easier in the context of interval observers because of the positive nature of the…

Optimization and Control · Mathematics 2016-08-01 Corentin Briat, Mustafa Khammash

Impact of fault prediction on checkpointing strategies

This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical analysis of Young and Daly in the presence of a fault prediction system, which is characterized by its recall and its…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-10-10 Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni

Optimal Change-Point Detection with Training Sequences in the Large and Moderate Deviations Regimes

This paper investigates a novel offline change-point detection problem from an information-theoretic perspective. In contrast to most related works, we assume that the knowledge of the underlying pre- and post-change distributions are not…

Information Theory · Computer Science 2021-10-05 Haiyun He, Qiaosheng Zhang, Vincent Y. F. Tan

Inference in high-dimensional online changepoint detection

We introduce and study two new inferential challenges associated with the sequential detection of change in a high-dimensional mean vector. First, we seek a confidence interval for the changepoint, and second, we estimate the set of indices…

Methodology · Statistics 2023-03-03 Yudong Chen, Tengyao Wang, Richard J. Samworth

Online Convex Optimization with Time-Varying Constraints

This paper considers online convex optimization with time-varying constraint functions. Specifically, we have a sequence of convex objective functions $\{f_t(x)\}_{t=0}^{\infty}$ and convex constraint functions…

Optimization and Control · Mathematics 2017-02-20 Michael J. Neely, Hao Yu

High-level python abstractions for optimal checkpointing in inversion problems

Inversion and PDE-constrained optimization problems often rely on solving the adjoint problem to calculate the gradient of the objective function. This requires storing large amounts of intermediate data, setting a limit to the largest…

Mathematical Software · Computer Science 2018-02-08 Navjot Kukreja, Jan Hückelheim, Michael Lange, Mathias Louboutin, Andrea Walther, Simon W. Funke, Gerard Gorman

Robustness to missing data: breakdown point analysis

Missing data is pervasive in econometric applications, and rarely is it plausible that the data are missing (completely) at random. This paper proposes a methodology for studying the robustness of results drawn from incomplete datasets.…

Econometrics · Economics 2025-12-29 Daniel Ober-Reynolds