Related papers: Optimal Checkpoint Interval with Availability as a…

200 papers

State-of-the-art distributed stream processing systems such as Apache Flink and Storm have recently included checkpointing to provide fault-tolerance for stateful applications. This is a necessary eventuality as these systems head into the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-21 Sachini Jayasekara, Aaron Harwood, Shanika Karunasekera

In this paper, we revisit traditional checkpointing and rollback recovery strategies, with a focus on silent data corruption errors. Contrary to fail-stop failures, such latent errors cannot be detected immediately, and a mechanism to…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-01 Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni

The machine learning literature contains several constructions for prediction intervals that are intuitively reasonable but ultimately ad-hoc in that they do not come with provable performance guarantees. We present methods from the…

Machine Learning · Statistics 2020-02-25 Danijel Kivaranovic, Kory D. Johnson, Hannes Leeb

Selecting optimal intervals of checkpointing an application is important for minimizing the run time of the application in the presence of system failures. Most of the existing efforts on checkpointing interval selection were developed for…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-02 K. Raghavendra, Sathish S Vadhiyar

Tasks that require information about the world imply a trade-off between the time spent on observation and the variance of the response. In particular, fast decisions need to rely on uncertain information. However, standard estimates of…

Neurons and Cognition · Quantitative Biology 2023-07-18 Sahel Azizpour, Viola Priesemann, Johannes Zierenberg, Anna Levina

Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fundamental operations for many modern scientific simulations. When the large-scale iterative methods are running with a large number of ranks…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-30 Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, Franck Cappello

State-of-the-art stream processing platforms make use of checkpointing to support fault tolerance, where a "checkpoint tuple" flows through the topology to all operators, indicating a checkpoint and triggering a checkpoint operation. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-17 Sachini Jayasekara, Aaron Harwood, Shanika Karunasekera

Many tasks are subject to failure before completion. Two of the most common failure recovery strategies are restart and checkpointing. Under restart, once a failure occurs, it is restarted from the beginning. Under checkpointing, the task…

Probability · Mathematics 2018-05-15 Antonio Sodre

Optimization is an important module of modern machine learning applications. Tremendous efforts have been made to accelerate optimization algorithms. A common formulation is achieving a lower loss at a given time. This enables a…

Machine Learning · Computer Science 2025-05-29 Zhonglin Xie, Yiman Fong, Haoran Yuan, Zaiwen Wen

We investigate a single machine rescheduling problem that arises from an unexpected machine unavailability, after the given set of jobs has already been scheduled to minimize the total weighted completion time. Such a disruption is…

Data Structures and Algorithms · Computer Science 2017-01-27 Wenchang Luo, Taibo Luo, Randy Goebel, Guohui Lin

This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly in the presence of a fault prediction system, characterized by its recall and its…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-05 Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni

We study the problem of conformal prediction in a novel online framework that directly optimizes efficiency. In our problem, we are given a target miscoverage rate $\alpha > 0$, and a time horizon $T$. On each day $t \le T$ an algorithm…

Machine Learning · Computer Science 2025-10-23 Vaidehi Srinivas

Change point estimation in its offline version is traditionally performed by optimizing over the data set of interest, by considering each data point as the true location parameter and computing a data fit criterion. Subsequently, the data…

Methodology · Statistics 2020-04-10 Zhiyuan Lu, Moulinath Banerjee, George Michailidis

While the design of optimal peak-to-peak controllers/observers for linear systems is known to be a difficult problem, this problem becomes interestingly much easier in the context of interval observers because of the positive nature of the…

Optimization and Control · Mathematics 2016-08-01 Corentin Briat, Mustafa Khammash

This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical analysis of Young and Daly in the presence of a fault prediction system, which is characterized by its recall and its…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-10-10 Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni

This paper investigates a novel offline change-point detection problem from an information-theoretic perspective. In contrast to most related works, we assume that the knowledge of the underlying pre- and post-change distributions are not…

Information Theory · Computer Science 2021-10-05 Haiyun He, Qiaosheng Zhang, Vincent Y. F. Tan

We introduce and study two new inferential challenges associated with the sequential detection of change in a high-dimensional mean vector. First, we seek a confidence interval for the changepoint, and second, we estimate the set of indices…

Methodology · Statistics 2023-03-03 Yudong Chen, Tengyao Wang, Richard J. Samworth

This paper considers online convex optimization with time-varying constraint functions. Specifically, we have a sequence of convex objective functions $\{f_t(x)\}_{t=0}^{\infty}$ and convex constraint functions…

Optimization and Control · Mathematics 2017-02-20 Michael J. Neely, Hao Yu

Inversion and PDE-constrained optimization problems often rely on solving the adjoint problem to calculate the gradient of the objective function. This requires storing large amounts of intermediate data, setting a limit to the largest…

Mathematical Software · Computer Science 2018-02-08 Navjot Kukreja, Jan Hückelheim, Michael Lange, Mathias Louboutin, Andrea Walther, Simon W. Funke, Gerard Gorman

Missing data is pervasive in econometric applications, and rarely is it plausible that the data are missing (completely) at random. This paper proposes a methodology for studying the robustness of results drawn from incomplete datasets.…

Econometrics · Economics 2025-12-29 Daniel Ober-Reynolds