Related papers: Qualitatively characterizing neural network optimi…
Training a classifier under non-convex constraints has gotten increasing attention in the machine learning community thanks to its wide range of applications such as algorithmic fairness and class-imbalanced classification. However, several…
The optimization problem behind neural networks is highly non-convex. Training with stochastic gradient descent and variants requires careful parameter tuning and provides no guarantee to achieve the global optimum. In contrast we show…
Numerous empirical evidence has corroborated that the noise plays a crucial rule in effective and efficient training of neural networks. The theory behind, however, is still largely unknown. This paper studies this fundamental problem…
Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the…
Optimization problems are ubiquitous in our societies and are present in almost every segment of the economy. Most of these optimization problems are NP-hard and computationally demanding, often requiring approximate solutions for…
While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is…
Machine Learning models incorporating multiple layered learning networks have been seen to provide effective models for various classification problems. The resulting optimization problem to solve for the optimal vector minimizing the…
Training deep neural networks for solving machine learning problems is one great challenge in the field, mainly due to its associated optimisation problem being highly non-convex. Recent developments have suggested that many training…
This article reviews modern optimization methods for training neural networks with an emphasis on efficiency and scale. We present state-of-the-art optimization algorithms under a unified algorithmic template that highlights the importance…
This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA)…
Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of…
Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex which makes their performance analysis challenging. In this paper,…
Neural networks have become ubiquitous tools for solving signal and image processing problems, and they often outperform standard approaches. Nevertheless, training neural networks is a challenging task in many applications. The prevalent…
Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the networks parameters, the main loss function generally only depends on the…
Despite the recent development in machine learning, most learning systems are still under the concept of "black box", where the performance cannot be understood and derived. With the rise of safety and privacy concerns in public, designing…
The integration of optimization problems within neural network architectures represents a fundamental shift from traditional approaches to handling constraints in deep learning. While it is long known that neural networks can incorporate…
Optimization for deep networks is currently a very active area of research. As neural networks become deeper, the ability in manually optimizing the network becomes harder. Mini-batch normalization, identification of effective respective…
A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low…
Linear networks provide valuable insights into the workings of neural networks in general. This paper identifies conditions under which the gradient flow provably trains a linear network, in spite of the non-strict saddle points present in…
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more…