Related papers: Differentiable Integer Linear Programming is not D…
When samples have internal structure, we often see a mismatch between the objective optimized during training and the model's goal during inference. For example, in sequence-to-sequence modeling we are interested in high-quality translated…
Structured prediction involves learning to predict complex structures rather than simple scalar values. The main challenge arises from the non-Euclidean nature of the output space, which generally requires relaxing the problem formulation.…
Research efforts of the past fifty years have led to a development of linear integer programming as a mature discipline of mathematical optimization. Such a level of maturity has not been reached when one considers nonlinear systems subject…
Differentiable programming is a new programming paradigm which enables large-scale optimization through the automatic calculation of gradients, also known as auto-differentiation. This concept emerges from deep learning and has also been…
Validation is a major challenge in differentiable programming. The state of the art is based on algorithmic differentiation. Consistency of first-order tangent and adjoint programs is defined by a well-known first-order differential…
Differentiable programming is a fresh programming paradigm which composes parameterized algorithmic components and trains them using automatic differentiation (AD). The concept emerges from deep learning but is not limited to training…
Many important computer vision tasks are naturally formulated to have a non-differentiable objective. Therefore, the standard, dominant training procedure of a neural network is not applicable since back-propagation requires the gradients…
Various software efforts embrace the idea that object-oriented programming enables a convenient implementation of the chain rule, facilitating so-called automatic differentiation via backpropagation. Such frameworks have no mechanism for…
We consider a class of stochastic programming problems where the implicitly decision-dependent random variable follows a nonparametric regression model with heteroscedastic error. The Clarke subdifferential and surrogate functions are not…
A line of recent works established that when training linear predictors over separable data, using gradient methods and exponentially-tailed losses, the predictors asymptotically converge in direction to the max-margin predictor. As a…
We study the problem of detecting infeasibility of large-scale linear programming problems using the primal-dual hybrid gradient method (PDHG) of Chambolle and Pock (2011). The literature on PDHG has mostly focused on settings where the…
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The…
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming…
Iterative optimization algorithms depend on access to information about the objective function. In a differentiable programming framework, this information, such as gradients, can be automatically derived from the computational graph. We…
Iterative refinement (IR) is a popular scheme for solving a linear system of equations based on gradually improving the accuracy of an initial approximation. Originally developed to improve upon the accuracy of Gaussian elimination,…
In Computational Science, Engineering and Finance (CSEF), scripts typically serve as the "glue" between potentially highly complex and computationally expensive external subprograms. Differentiability of the resulting programs turns out to…
A method for estimating the theoretical predictability of time series is presented, based on information-theoretic functionals (redundancies) and the surrogate data technique. The redundancy, designed for a chosen model and a prediction horizon,…
In this work we present a model for computation of random processes in digital computers which solves the problem of periodic sequences and hidden errors produced by correlations. We show that systems with non-invertible non-linearities can…
A wide variety of transition-based algorithms are currently used for dependency parsers. Empirical studies have shown that performance varies across different treebanks in such a way that one algorithm outperforms another on one treebank…
Linearizability is the gold standard among algorithm designers for deducing the correctness of a distributed algorithm using implemented shared objects from the correctness of the corresponding algorithm using atomic versions of the same…