Related papers: A Neural Network Based on First Principles

Entropy-based Characterization of Modeling Constraints

In most data-scientific approaches, the principle of Maximum Entropy (MaxEnt) is used to a posteriori justify some parametric model which has been already chosen based on experience, prior knowledge or computational simplicity. In a…

Methodology · Statistics 2022-06-29 Orestis Loukas, Ho Ryun Chung

Self-Expanding Neural Networks

The results of training a neural network are heavily dependent on the architecture chosen; and even a modification of only its size, however small, typically involves restarting the training process. In contrast to this, we begin training…

Machine Learning · Computer Science 2024-02-12 Rupert Mitchell, Robin Menzenbach, Kristian Kersting, Martin Mundt

Neural Network Attributions: A Causal Perspective

We propose a new attribution method for neural networks developed using first principles of causality (to the best of our knowledge, the first such). The neural network architecture is viewed as a Structural Causal Model, and a methodology…

Machine Learning · Computer Science 2019-07-04 Aditya Chattopadhyay, Piyushi Manupriya, Anirban Sarkar, Vineeth N Balasubramanian

Neural networks: from the perceptron to deep nets

Artificial networks have been studied through the prism of statistical mechanics as disordered systems since the 80s, starting from the simple models of Hopfield's associative memory and the single-neuron perceptron classifier. Assuming…

Disordered Systems and Neural Networks · Physics 2023-04-14 Marylou Gabrié, Surya Ganguli, Carlo Lucibello, Riccardo Zecchina

Neural Networks Processing Mean Values of Random Variables

We introduce a class of neural networks derived from probabilistic models in the form of Bayesian belief networks. By imposing additional assumptions about the nature of the probabilistic models represented in the belief networks, we derive…

Disordered Systems and Neural Networks · Physics 2007-05-23 M. J. Barber, J. W. Clark, C. H. Anderson

Deep Networks from the Principle of Rate Reduction

This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of…

Machine Learning · Computer Science 2020-10-30 Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma

Are ResNets Provably Better than Linear Predictors?

A residual network (or ResNet) is a standard deep neural net architecture, with state-of-the-art performance across numerous applications. The main premise of ResNets is that they allow the training of each layer to focus on fitting just…

Machine Learning · Computer Science 2018-09-28 Ohad Shamir

Neural computation from first principles: Using the maximum entropy method to obtain an optimal bits-per-joule neuron

Optimization results are one method for understanding neural computation from Nature's perspective and for defining the physical limits on neuron-like engineering. Earlier work looks at individual properties or performance criteria and…

Neurons and Cognition · Quantitative Biology 2017-12-21 William B Levy, Toby Berger, Mustafa Sungkar

Max-Entropy Feed-Forward Clustering Neural Network

The outputs of a non-linear feed-forward neural network are positive and can be treated as probabilities when they are normalized to one. If we take Entropy-Based Principle into consideration, the outputs for each sample could be…

Machine Learning · Computer Science 2015-06-12 Han Xiao, Xiaoyan Zhu

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

How to develop slim and accurate deep neural networks has become crucial for real-world applications, especially for those employed in embedded systems. Though previous work along this research line has shown some promising results, most…

Neural and Evolutionary Computing · Computer Science 2019-10-02 Xin Dong, Shangyu Chen, Sinno Jialin Pan

Randomly Initialized One-Layer Neural Networks Make Data Linearly Separable

Recently, neural networks have demonstrated remarkable capabilities in mapping two arbitrary sets to two linearly separable sets. The prospect of achieving this with randomly initialized neural networks is particularly appealing due to the…

Machine Learning · Computer Science 2023-10-10 Promit Ghosal, Srinath Mahankali, Yihang Sun

Network Inference by Learned Node-Specific Degree Prior

We propose a novel method for network inference from partially observed edges using a node-specific degree prior. The degree prior is derived from observed edges in the network to be inferred, and its hyper-parameters are determined by…

Machine Learning · Statistics 2016-02-09 Qingming Tang, Lifu Tu, Weiran Wang, Jinbo Xu

Nonlinear Weighted Directed Acyclic Graph and A Priori Estimates for Neural Networks

In an attempt to better understand structural benefits and generalization power of deep neural networks, we firstly present a novel graph theoretical formulation of neural network models, including fully connected, residual network (ResNet)…

Machine Learning · Computer Science 2023-05-29 Yuqing Li, Tao Luo, Chao Ma

Making Neural Networks FAIR

Research on neural networks has gained significant momentum over the past few years. Because training is a resource-intensive process and training data cannot always be made available to everyone, there has been a trend to reuse pre-trained…

Machine Learning · Computer Science 2020-12-02 Anna Nguyen, Tobias Weller, Michael Färber, York Sure-Vetter

Learning to Learn with Generative Models of Neural Network Checkpoints

We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer…

Machine Learning · Computer Science 2022-09-27 William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik

Towards a theory of machine learning

We define a neural network as a septuple consisting of (1) a state vector, (2) an input projection, (3) an output projection, (4) a weight matrix, (5) a bias vector, (6) an activation map and (7) a loss function. We argue that the loss…

Machine Learning · Computer Science 2021-02-15 Vitaly Vanchurin

A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features

An important characteristic of neural networks is their ability to learn representations of the input data with effective features for prediction, which is believed to be a key factor to their superior empirical performance. To better…

Machine Learning · Computer Science 2022-06-06 Zhenmei Shi, Junyi Wei, Yingyu Liang

Augmenting Neural Networks with First-order Logic

Today, the dominant paradigm for training neural networks involves minimizing task loss on a large dataset. Using world knowledge to inform a model, and yet retain the ability to perform end-to-end training remains an open question. In this…

Machine Learning · Computer Science 2020-08-21 Tao Li, Vivek Srikumar

Deep Neural Nets as Hamiltonians

Neural networks are complex functions of both their inputs and parameters. Much prior work in deep learning theory analyzes the distribution of network outputs at a fixed set of inputs (e.g. a training dataset) over random initializations…

Disordered Systems and Neural Networks · Physics 2025-04-08 Mike Winer, Boris Hanin

Network with Sub-Networks

We introduce network with sub-networks, a neural network whose weight layers can be detached into sub-neural networks during inference. To develop weights and biases which could be inserted in both base and sub-neural networks,…

Machine Learning · Computer Science 2021-10-20 Ninnart Fuengfusin, Hakaru Tamukoh