Related papers: High-level python abstractions for optimal checkpo…
Seismic inversion and imaging are adjoint-based optimization problems that process up to terabytes of data, regularly exceeding the memory capacity of available computers. Data compression is an effective strategy to reduce this memory…
We consider checkpointing strategies that minimize the number of recomputations needed when performing discrete adjoint computations using multistage time-stepping schemes, which requires computing several substeps within one complete time…
Checkpointing is a cornerstone of data-flow reversal in adjoint algorithmic differentiation. Checkpointing is a storage/recomputation trade-off that can be applied at different levels, one of which being the call tree. We are looking for…
PDE-constrained inverse problems are some of the most challenging and computationally demanding problems in computational science today. Fine meshes that are required to accurately compute the PDE solution introduce an enormous number of…
This paper presents a new functionality of the Automatic Differentiation (AD) tool Tapenade. Tapenade generates adjoint codes which are widely used for optimization or inverse problems. Unfortunately, for large applications the adjoint code…
Automated code generation allows for a separation between the development of a model, expressed via a domain specific language, and lower level implementation details. Algorithmic differentiation can be applied symbolically at the level of…
Iterative algorithms aimed at solving some problems are discussed. For certain problems, such as finding a common point in the intersection of a finite number of convex sets, there often exist iterative algorithms that impose very little…
Resilience is a major design goal for HPC. Checkpoint is the most common method to enable resilient HPC. Checkpoint periodically saves critical data objects to non-volatile storage to enable data persistence. However, using checkpoint, we…
This paper introduces a new activation checkpointing method which allows to significantly decrease memory usage when training Deep Neural Networks with the back-propagation algorithm. Similarly to checkpoint-ing techniques coming from the…
Recovery from transient failures is one of the prime issues in the context of distributed systems. These systems demand to have transparent yet efficient techniques to achieve the same. Checkpoint is defined as a designated place in a…
The reliability of concurrent and distributed systems often depends on some well-known techniques for fault tolerance. One such technique is based on checkpointing and rollback recovery. Checkpointing involves processes to take snapshots of…
We consider the problem of minimizing a convex function over the intersection of finitely many simple sets which are easy to project onto. This is an important problem arising in various domains such as machine learning. The main difficulty…
Optimal control problems including partial differential equation (PDE) as well as integer constraints merge the combinatorial difficulties of integer programming and the challenges related to large-scale systems resulting from discretized…
We propose algorithms and software for computing projections onto the intersection of multiple convex and non-convex constraint sets. The software package, called SetIntersectionProjection, is intended for the regularization of inverse…
We present a simplified derivation of the optimal checkpoint interval in Young_1974 [1]. The optimal checkpoint interval derivation in [1] is based on minimizing the total lost time as an objective-function. Lost time is a function of…
Neural ordinary differential equations (neural ODEs) have emerged as a novel network architecture that bridges dynamical systems and deep learning. However, the gradient obtained with the continuous adjoint method in the vanilla neural ODE…
Changepoints are a very common feature of Big Data that arrive in the form of a data stream. In this paper, we study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the…
One approach for reducing run time and improving efficiency of machine learning is to reduce the convergence rate of the optimization algorithm used. Shuffling is an algorithm technique that is widely used in machine learning, but it only…
This paper describes a method for scheduling the events of a switched system to achieve an optimal performance. The approach has guarantees on convergence and computational complexity that parallel derivative-based iterative optimization…
The Performance Estimation Problem (PEP) approach consists in computing worst-case performance bounds on optimization algorithms by solving an optimization problem: one maximizes an error criterion over all initial conditions allowed and…