Related papers: Reverse Engineering Deep ReLU Networks An Optimiza…
Understanding the fundamental mechanism behind the success of deep neural networks is one of the key challenges in the modern machine learning literature. Despite numerous attempts, a solid theoretical analysis is yet to be developed. In…
Neural networks have shown tremendous potential for reconstructing high-resolution images in inverse problems. The non-convex and opaque nature of neural networks, however, hinders their utility in sensitive applications such as medical…
The theory of deep learning focuses almost exclusively on supervised learning, non-convex optimization using stochastic gradient descent, and overparametrized neural networks. It is common belief that the optimizer dynamics, network…
We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affects the prediction…
The ongoing decarbonisation of power systems is driving an increasing reliance on distributed energy resources, which introduces complex and nonlinear interactions that are difficult to capture in conventional optimisation models. As a…
It has been widely assumed that a neural network cannot be recovered from its outputs, as the network depends on its parameters in a highly nonlinear way. Here, we prove that in fact it is often possible to identify the architecture,…
Solving non-convex, NP-hard optimization problems is crucial for training machine learning models, including neural networks. However, non-convexity often leads to black-box machine learning models with unclear inner workings. While convex…
Understanding the fundamental principles behind the success of deep neural networks is one of the most important open questions in the current literature. To this end, we study the training problem of deep neural networks and introduce an…
When solving decision-making problems with mathematical optimization, some constraints or objectives may lack analytic expressions but can be approximated from data. When an approximation is made by neural networks, the underlying problem…
Deep neural networks (DNNs), particularly those using Rectified Linear Unit (ReLU) activation functions, have achieved remarkable success across diverse machine learning tasks, including image recognition, audio processing, and language…
Recent work has shown that the training of a one-hidden-layer, scalar-output fully-connected ReLU neural network can be reformulated as a finite-dimensional convex program. Unfortunately, the scale of such a convex program grows…
We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex reformulation of the standard weight-decay penalized training problem as a set…
We develop a convex analytic approach to analyze finite width two-layer ReLU networks. We first prove that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set, where simple…
Deep learning-based models have demonstrated remarkable success in solving illposed inverse problems; however, many fail to strictly adhere to the physical constraints imposed by the measurement process. In this work, we introduce a…
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints. Our analysis is novel, characterizes all optimal solutions, and does not leverage…
Deep neural networks, particularly those employing Rectified Linear Units (ReLU), are often perceived as complex, high-dimensional, non-linear systems. This complexity poses a significant challenge to understanding their internal learning…
Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open…
The success of deep neural networks hinges on our ability to accurately and efficiently optimize high-dimensional, non-convex functions. In this paper, we empirically investigate the loss functions of state-of-the-art networks, and how…
This theoretical paper is devoted to developing a rigorous theory for demystifying the global convergence phenomenon in a challenging scenario: learning over-parameterized Rectified Linear Unit (ReLU) nets for very high dimensional dataset…
The non-convexity of the artificial neural network (ANN) training landscape brings inherent optimization difficulties. While the traditional back-propagation stochastic gradient descent (SGD) algorithm and its variants are effective in…