Related papers: A Neural Network Based on First Principles
In most data-scientific approaches, the principle of Maximum Entropy (MaxEnt) is used to a posteriori justify some parametric model which has been already chosen based on experience, prior knowledge or computational simplicity. In a…
The results of training a neural network are heavily dependent on the architecture chosen; and even a modification of only its size, however small, typically involves restarting the training process. In contrast to this, we begin training…
We propose a new attribution method for neural networks developed using first principles of causality (to the best of our knowledge, the first such). The neural network architecture is viewed as a Structural Causal Model, and a methodology…
Artificial networks have been studied through the prism of statistical mechanics as disordered systems since the 80s, starting from the simple models of Hopfield's associative memory and the single-neuron perceptron classifier. Assuming…
We introduce a class of neural networks derived from probabilistic models in the form of Bayesian belief networks. By imposing additional assumptions about the nature of the probabilistic models represented in the belief networks, we derive…
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of…
A residual network (or ResNet) is a standard deep neural net architecture, with state-of-the-art performance across numerous applications. The main premise of ResNets is that they allow the training of each layer to focus on fitting just…
Optimization results are one method for understanding neural computation from Nature's perspective and for defining the physical limits on neuron-like engineering. Earlier work looks at individual properties or performance criteria and…
The outputs of non-linear feed-forward neural network are positive, which could be treated as probability when they are normalized to one. If we take Entropy-Based Principle into consideration, the outputs for each sample could be…
How to develop slim and accurate deep neural networks has become crucial for real- world applications, especially for those employed in embedded systems. Though previous work along this research line has shown some promising results, most…
Recently, neural networks have demonstrated remarkable capabilities in mapping two arbitrary sets to two linearly separable sets. The prospect of achieving this with randomly initialized neural networks is particularly appealing due to the…
We propose a novel method for network inference from partially observed edges using a node-specific degree prior. The degree prior is derived from observed edges in the network to be inferred, and its hyper-parameters are determined by…
In an attempt to better understand structural benefits and generalization power of deep neural networks, we firstly present a novel graph theoretical formulation of neural network models, including fully connected, residual network (ResNet)…
Research on neural networks has gained significant momentum over the past few years. Because training is a resource-intensive process and training data cannot always be made available to everyone, there has been a trend to reuse pre-trained…
We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer…
We define a neural network as a septuple consisting of (1) a state vector, (2) an input projection, (3) an output projection, (4) a weight matrix, (5) a bias vector, (6) an activation map and (7) a loss function. We argue that the loss…
An important characteristic of neural networks is their ability to learn representations of the input data with effective features for prediction, which is believed to be a key factor to their superior empirical performance. To better…
Today, the dominant paradigm for training neural networks involves minimizing task loss on a large dataset. Using world knowledge to inform a model, and yet retain the ability to perform end-to-end training remains an open question. In this…
Neural networks are complex functions of both their inputs and parameters. Much prior work in deep learning theory analyzes the distribution of network outputs at a fixed a set of inputs (e.g. a training dataset) over random initializations…
We introduce network with sub-networks, a neural network which its weight layers could be detached into sub-neural networks during inference. To develop weights and biases which could be inserted in both base and sub-neural networks,…