Related papers: Revisiting "Qualitatively Characterizing Neural Ne…

Mollifying Networks

The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function, e.g. it can involve pathological landscapes such as saddle-surfaces…

Machine Learning · Computer Science 2016-08-18 Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio

On the Quality of the Initial Basin in Overspecified Neural Networks

Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open…

Machine Learning · Computer Science 2016-06-15 Itay Safran, Ohad Shamir

Essentially No Barriers in Neural Network Energy Landscape

Training neural networks involves finding minima of a high-dimensional non-convex loss function. Knowledge of the structure of this energy landscape is sparse. Relaxing from linear interpolations, we construct continuous paths between…

Machine Learning · Statistics 2019-02-25 Felix Draxler, Kambis Veschgini, Manfred Salmhofer, Fred A. Hamprecht

Learning to Optimize Neural Nets

Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning. In this paper, we explore learning an optimization algorithm for training shallow neural nets. Such high-dimensional…

Machine Learning · Computer Science 2017-12-01 Ke Li, Jitendra Malik

Introspection: Accelerating Neural Network Training By Learning Weight Evolution

Neural Networks are function approximators that have achieved state-of-the-art accuracy in numerous machine learning tasks. In spite of their great success in terms of accuracy, their large training time makes it difficult to use them for…

Machine Learning · Computer Science 2017-04-18 Abhishek Sinha, Mausoom Sarkar, Aahitagni Mukherjee, Balaji Krishnamurthy

A framework for measuring the training efficiency of a neural architecture

Measuring Efficiency in neural network system development is an open research problem. This paper presents an experimental framework to measure the training efficiency of a neural architecture. To demonstrate our approach, we analyze the…

Machine Learning · Computer Science 2024-09-13 Eduardo Cueto-Mendoza, John D. Kelleher

Stiffness: A New Perspective on Generalization in Neural Networks

In this paper we develop a new perspective on generalization of neural networks by proposing and investigating the concept of a neural network stiffness. We measure how stiff a network is by looking at how a small gradient step in the…

Machine Learning · Computer Science 2020-03-17 Stanislav Fort, Paweł Krzysztof Nowak, Stanislaw Jastrzebski, Srini Narayanan

Convergence and Implicit Bias of Gradient Flow on Overparametrized Linear Networks

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science 2022-05-17 Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

Qualitatively characterizing neural network optimization problems

Training neural networks involves solving large-scale non-convex optimization problems. This task has long been believed to be extremely difficult, with fear of local minima and other obstacles motivating a variety of schemes to improve…

Neural and Evolutionary Computing · Computer Science 2015-05-25 Ian J. Goodfellow, Oriol Vinyals, Andrew M. Saxe

Preprint: Norm Loss: An efficient yet effective regularization method for deep neural networks

Convolutional neural network training can suffer from diverse issues like exploding or vanishing gradients, scaling-based weight space symmetry and covariant-shift. In order to address these issues, researchers develop weight regularization…

Computer Vision and Pattern Recognition · Computer Science 2021-03-12 Theodoros Georgiou, Sebastian Schmitt, Thomas Bäck, Wei Chen, Michael Lew

Meta-Learning with Differentiable Convex Optimization

Many meta-learning approaches for few-shot learning rely on simple base learners such as nearest-neighbor classifiers. However, even in the few-shot regime, discriminatively trained linear predictors can offer better generalization. We…

Computer Vision and Pattern Recognition · Computer Science 2019-04-24 Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, Stefano Soatto

Manifold Regularization for Memory-Efficient Training of Deep Neural Networks

One of the prevailing trends in the machine- and deep-learning community is to gravitate towards the use of increasingly larger models in order to keep pushing the state-of-the-art performance envelope. This tendency makes access to the…

Machine Learning · Computer Science 2023-05-29 Shadi Sartipi, Edgar A. Bernal

Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data have not only identical loss curves but also…

Machine Learning · Computer Science 2023-12-07 Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan

Deep neural networks are robust to weight binarization and other non-linear distortions

Recent results show that deep neural networks achieve excellent performance even when, during training, weights are quantized and projected to a binary representation. Here, we show that this is just the tip of the iceberg: these same…

Neural and Evolutionary Computing · Computer Science 2016-06-08 Paul Merolla, Rathinakumar Appuswamy, John Arthur, Steve K. Esser, Dharmendra Modha

The role of optimization geometry in single neuron learning

Recent numerical experiments have demonstrated that the choice of optimization geometry used during training can impact generalization performance when learning expressive nonlinear model classes such as deep neural networks. These…

Machine Learning · Computer Science 2022-04-25 Nicholas M. Boffi, Stephen Tu, Jean-Jacques E. Slotine

Loss Patterns of Neural Networks

We present multi-point optimization: an optimization technique that allows to train several models simultaneously without the need to keep the parameters of each one individually. The proposed method is used for a thorough empirical…

Machine Learning · Computer Science 2025-11-18 Ivan Skorokhodov, Mikhail Burtsev

Convolutional neural networks with extra-classical receptive fields

Convolutional neural networks (CNNs) have had great success in many real-world applications and have also been used to model visual processing in the brain. However, these networks are quite brittle - small changes in the input image can…

Neurons and Cognition · Quantitative Biology 2018-10-30 Brian Hu, Stefan Mihalas

TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels

State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions. For neural networks, even when centralized SGD easily finds a solution that is…

Machine Learning · Computer Science 2022-10-06 Yaodong Yu, Alexander Wei, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan

Towards an Understanding of Neural Networks in Natural-Image Spaces

Two major uncertainties, dataset bias and adversarial examples, prevail in state-of-the-art AI algorithms with deep neural networks. In this paper, we present an intuitive explanation for these issues as well as an interpretation of the…

Computer Vision and Pattern Recognition · Computer Science 2019-02-12 Yifei Fan, Anthony Yezzi

No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths

Understanding the optimization dynamics of neural networks is necessary for closing the gap between theory and practice. Stochastic first-order optimization algorithms are known to efficiently locate favorable minima in deep neural…

Machine Learning · Computer Science 2023-06-22 Charles Guille-Escuret, Hiroki Naganuma, Kilian Fatras, Ioannis Mitliagkas