Related papers: Towards Theoretically Inspired Neural Initializati…

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often result in challenging hyper-parameter choices and training instability if the…

Machine Learning · Computer Science 2021-11-25 Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, W. Ronny Huang, Tom Goldstein

Faster Predictive Coding Networks via Better Initialization

Research aimed at scaling up neuroscience inspired learning algorithms for neural networks is accelerating. Recently, a key research area has been the study of energy-based learning algorithms such as predictive coding, due to their…

Machine Learning · Computer Science 2026-01-30 Luca Pinchetti, Simon Frieder, Thomas Lukasiewicz, Tommaso Salvatori

GIO: Gradient Information Optimization for Training Dataset Selection

It is often advantageous to train models on a subset of the available train examples, because the examples are of variable quality or because one would like to train with fewer examples, without sacrificing performance. We present Gradient…

Machine Learning · Computer Science 2024-07-30 Dante Everaert, Christopher Potts

Target noise: A pre-training based neural network initialization for efficient high resolution learning

Weight initialization plays a crucial role in the optimization behavior and convergence efficiency of neural networks. Most existing initialization methods, such as Xavier and Kaiming initializations, rely on random sampling and do not…

Machine Learning · Computer Science 2026-02-09 Shaowen Wang, Tariq Alkhalifah

KO: Kinetics-inspired Neural Optimizer with PDE Simulation Approaches

The design of optimization algorithms for neural networks remains a critical challenge, with most existing methods relying on heuristic adaptations of gradient-based approaches. This paper introduces KO (Kinetics-inspired Optimizer), a…

Machine Learning · Computer Science 2025-05-22 Mingquan Feng, Yixin Huang, Yifan Fu, Shaobo Wang, Junchi Yan

GradMax: Growing Neural Networks using Gradient Information

The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture…

Machine Learning · Computer Science 2022-06-08 Utku Evci, Bart van Merriënboer, Thomas Unterthiner, Max Vladymyrov, Fabian Pedregosa

A Unified Perspective on Optimization in Machine Learning and Neuroscience: From Gradient Descent to Neural Adaptation

Iterative optimization is central to modern artificial intelligence (AI) and provides a crucial framework for understanding adaptive systems. This review provides a unified perspective on this subject, bridging classic theory with neural…

Machine Learning · Computer Science 2025-10-22 Jesús García Fernández, Nasir Ahmad, Marcel van Gerven

Neural-Initialized Newton: Accelerating Nonlinear Finite Elements via Operator Learning

We propose a Newton-based scheme, initialized by neural operator predictions, to accelerate the parametric solution of nonlinear problems in computational solid mechanics. First, a physics informed conditional neural field is trained to…

Machine Learning · Computer Science 2025-11-11 Kianoosh Taghikhani, Yusuke Yamazaki, Jerry Paul Varghese, Markus Apel, Reza Najian Asl, Shahed Rezaei

A new initialisation to Control Gradients in Sinusoidal Neural network

Proper initialisation strategy is of primary importance to mitigate gradient explosion or vanishing when training neural networks. Yet, the impact of initialisation parameters still lacks a precise theoretical understanding for several…

Machine Learning · Computer Science 2026-05-12 Andrea Combette, Antoine Venaille, Nelly Pustelnik

Initialization Using Perlin Noise for Training Networks with a Limited Amount of Data

We propose a novel network initialization method using Perlin noise for training image classification networks with a limited amount of data. Our main idea is to initialize the network parameters by solving an artificial noise…

Computer Vision and Pattern Recognition · Computer Science 2021-01-20 Nakamasa Inoue, Eisuke Yamagata, Hirokatsu Kataoka

Neural Architecture Optimization

Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, no matter based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a…

Machine Learning · Computer Science 2019-09-05 Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu

Initialization-enhanced Physics-Informed Neural Network with Domain Decomposition (IDPINN)

We propose a new physics-informed neural network framework, IDPINN, based on the enhancement of initialization and domain decomposition to improve prediction accuracy. We train a PINN using a small dataset to obtain an initial network…

Machine Learning · Computer Science 2024-06-06 Chenhao Si, Ming Yan

Learning to Optimize Quasi-Newton Methods

Fast gradient-based optimization algorithms have become increasingly essential for the computationally efficient training of machine learning models. One technique is to multiply the gradient by a preconditioner matrix to produce a step,…

Machine Learning · Computer Science 2023-09-12 Isaac Liao, Rumen R. Dangovski, Jakob N. Foerster, Marin Soljačić

Rethinking ImageNet Pre-training

We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when using…

Computer Vision and Pattern Recognition · Computer Science 2018-11-22 Kaiming He, Ross Girshick, Piotr Dollár

Robust Pruning at Initialization

Overparameterized Neural Networks (NN) display state-of-the-art performance. However, there is a growing need for smaller, energy-efficient, neural networks to be able to use machine learning applications on devices with limited…

Machine Learning · Statistics 2021-05-21 Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh

Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy

Despite the recent success of stochastic gradient descent in deep learning, it is often difficult to train a deep neural network with an inappropriate choice of its initial parameters. Even if training is successful, it has been known that…

Machine Learning · Computer Science 2023-02-10 Cheolhyoung Lee, Kyunghyun Cho

ZerO Initialization: Initializing Neural Networks with only Zeros and Ones

Deep neural networks are usually initialized with random weights, with adequately selected initial variance to ensure stable signal propagation during training. However, selecting the appropriate variance becomes challenging especially as…

Machine Learning · Computer Science 2022-11-07 Jiawei Zhao, Florian Schäfer, Anima Anandkumar

MSE-Optimal Neural Network Initialization via Layer Fusion

Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. However, the use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problems renders…

Machine Learning · Computer Science 2020-01-29 Ramina Ghods, Andrew S. Lan, Tom Goldstein, Christoph Studer

Initialization for Network Embedding: A Graph Partition Approach

Network embedding has been intensively studied in the literature and widely used in various applications, such as link prediction and node classification. While previous work focuses on the design of new algorithms or is tailored for various…

Social and Information Networks · Computer Science 2019-11-12 Wenqing Lin, Feng He, Faqiang Zhang, Xu Cheng, Hongyun Cai

RSO: A Gradient Free Sampling Based Approach For Training Deep Neural Networks

We propose RSO (random search optimization), a gradient free Markov Chain Monte Carlo search based approach for training deep neural networks. To this end, RSO adds a perturbation to a weight in a deep neural network and tests if it reduces…

Machine Learning · Computer Science 2020-05-13 Rohun Tripathi, Bharat Singh