Related papers: Zero Coordinate Shift: Whetted Automatic Different…
Automatic differentiation (AD) is a set of techniques that systematically applies the chain rule to compute the gradients of functions without requiring human intervention. Although the fundamentals of this technology were established…
Training Artificial Neural Networks poses a challenging and critical problem in machine learning. Despite the effectiveness of gradient-based learning methods, such as Stochastic Gradient Descent (SGD), in training neural networks, they do…
Coordinate descent (CD) algorithms have become the method of choice for solving a number of optimization problems in machine learning. They are particularly popular for training linear models, including linear support vector machine…
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more…
Zeroth-Order (ZO) optimization is pivotal for scenarios where backpropagation is unavailable, such as memory-constrained on-device learning and black-box optimization. However, existing methods face a stark trade-off: they are either…
Automatic differentiation (AD) is an ensemble of techniques that allow to evaluate accurate numerical derivatives of a mathematical function expressed in a computer programming language. In this paper we use AD for stating and solving solid…
Derivative computation is a key component of optimization, sensitivity analysis, uncertainty quantification, and nonlinear solvers. Automatic differentiation (AD) is a powerful technique for evaluating such derivatives, and in recent years,…
It has been reported that the communication cost for synchronizing gradients can be a bottleneck, which limits the scalability of distributed deep learning. Using low-precision gradients is a promising technique for reducing the bandwidth…
The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD…
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how…
A resistive memory device-based computing architecture is one of the promising platforms for energy-efficient Deep Neural Network (DNN) training accelerators. The key technical challenge in realizing such accelerators is to accumulate the…
Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering, especially in scenarios featuring complex domains or incorporation of empirical data. One…
Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep…
Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural…
Change-point detection (CPD) is crucial for identifying abrupt shifts in data, which influence decision-making and efficient resource allocation across various domains. To address the challenges posed by the costly and time-intensive data…
Algorithmic differentiation (AD) tools allow to obtain gradient information of a continuously differentiable objective function in a computationally cheap way using the so-called backward mode. It is common practice to use the same tools…
Semantic segmentation requires a holistic understanding of the physical world, as it assigns semantic labels to spatially continuous and structurally coherent objects rather than to isolated pixels. However, existing data-free knowledge…
Many engineering problems involve learning hidden dynamics from indirect observations, where the physical processes are described by systems of partial differential equations (PDE). Gradient-based optimization methods are considered…
Although distributed machine learning has opened up many new and exciting research frontiers, fragmentation of models and data across different machines, nodes, and sites still results in considerable communication overhead, impeding…
Zero-shot classification is a generalization task where no instance from the target classes is seen during training. To allow for test-time transfer, each class is annotated with semantic information, commonly in the form of attributes or…