Related papers: Optimization and Learning With Nonlocal Calculus
It seems that in the current age, computers, computation, and data have an increasingly important role to play in scientific research and discovery. This is reflected in part by the rise of machine learning and artificial intelligence,…
Extrapolation is a well-known technique for solving convex optimization and variational inequalities and recently attracts some attention for non-convex optimization. Several recent works have empirically shown its success in some machine…
Nonlocal operators of fractional type are a popular modeling choice for applications that do not adhere to classical diffusive behavior; however, one major challenge in nonlocal simulations is the selection of model parameters. In this work…
We consider the problem of approximating a function by an element of a nonlinear manifold which admits a differentiable parametrization, typical examples being neural networks with differentiable activation functions or tensor networks.…
Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose…
This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient…
Increasing effort is put into the development of methods for learning mechanistic models from data. This task entails not only the accurate estimation of parameters but also a suitable model structure. Recent work on the discovery of…
We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in saddle points.…
This paper investigates asymptotic behaviors of gradient descent algorithms (particularly accelerated gradient descent and stochastic gradient descent) in the context of stochastic optimization arising in statistics and machine learning…
Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods…
We use differential equations based approaches to provide some {\it \textbf{physics}} insights into analyzing the dynamics of popular optimization algorithms in machine learning. In particular, we study gradient descent, proximal gradient…
Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. In this paper, we study the nature of…
It is generally thought that the use of stochastic activation functions in deep learning architectures yield models with superior generalization abilities. However, a sufficiently rigorous statement and theoretical proof of this heuristic…
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…
Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods…
Nonlocal and fractional-order models capture effects that classical partial differential equations cannot describe; for this reason, they are suitable for a broad class of engineering and scientific applications that feature multiscale or…
Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of…
We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we have only access to realisations of the gradient of the loss. This is a classical task in statistics,…
Many modern learning tasks involve fitting nonlinear models to data which are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Due to this overparameterization, the training…
A nonlocal vector calculus was introduced in [2] that has proved useful for the analysis of the peridynamics model of nonlocal mechanics and nonlocal diffusion models. A generalization is developed that provides a more general setting for…