Related papers: Data-driven effective model shows a liquid-like de…
Deep neural networks are workhorse models in machine learning with multiple layers of non-linear functions composed in series. Their loss function is highly non-convex, yet empirically even gradient descent minimisation is sufficient to…
Deep generative models such as flow and diffusion models have proven to be effective in modeling high-dimensional and complex data types such as videos or proteins, and this has motivated their use in different data modalities, such as…
There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. High…
We develop a statistical mechanical approach based on the replica method to study the design space of deep and wide neural networks constrained to meet a large number of training data. Specifically, we analyze the configuration space of the…
Deep learning models are often considered black boxes due to their complex hierarchical transformations. Identifying suitable architectures is crucial for maximizing predictive performance with limited data. Understanding the geometric…
Next generation deep neural networks for classification hosted on embedded platforms will rely on fast, efficient, and accurate learning algorithms. Initialization of weights in learning networks has a great impact on the classification…
Neural networks are powerful models that solve a variety of complex real-world problems. However, the stochastic nature of training and large number of parameters in a typical neural model makes them difficult to evaluate via inspection.…
Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, speech recognition, and so on. It has outperformed conventional methods in various fields and achieved great…
We develop a geometric framework to study the structure and function of complex networks. We assume that hyperbolic geometry underlies these networks, and we show that with this assumption, heterogeneous degree distributions and strong…
We scrutinize the structural and operational aspects of deep learning models, particularly focusing on the nuances of learnable parameters (weight) statistics, distribution, node interaction, and visualization. By establishing correlations…
Machine learning problems have an intrinsic geometric structure as central objects including a neural network's weight space and the loss function associated with a particular task can be viewed as encoding the intrinsic geometry of a given…
In comparison to classical shallow representation learning techniques, deep neural networks have achieved superior performance in nearly every application benchmark. But despite their clear empirical advantages, it is still not well…
Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging,…
As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with…
This paper presents an empirical study on the weights of neural networks, where we interpret each model as a point in a high-dimensional space -- the neural weight space. To explore the complex structure of this space, we sample from a…
The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of high-dimensional non-convex problem. Some insights were recently gained using spin glass…
Why do deep neural networks (DNNs) benefit from very high dimensional parameter spaces? Their huge parameter complexities vs stunning performance in practice is all the more intriguing and not explainable using the standard theory of model…
Recent work on mode connectivity in the loss landscape of deep neural networks has demonstrated that the locus of (sub-)optimal weight vectors lies on continuous paths. In this work, we train a neural network that serves as a hypernetwork,…
Understanding the reasons for the success of deep neural networks trained using stochastic gradient-based methods is a key open problem for the nascent theory of deep learning. The types of data where these networks are most successful,…
In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime where the number of observations are fewer than the number of…