
Second-Order Neural ODE Optimizer

Machine Learning; Systems and Control; Optimization and Control (v2, 2021-11-09)

Abstract

We propose a novel second-order optimization framework for training emerging deep continuous-time models, specifically Neural Ordinary Differential Equations (Neural ODEs). Because their training already requires expensive gradient computation by solving a backward ODE, deriving efficient second-order methods is highly nontrivial. Nevertheless, inspired by the recent Optimal Control (OC) interpretation of training deep networks, we show that a specific continuous-time OC methodology, called Differential Programming, can be adopted to derive backward ODEs for higher-order derivatives at the same O(1) memory cost. We further explore a low-rank representation of the second-order derivatives and show that it leads to efficient preconditioned updates with the aid of Kronecker-based factorization. The resulting method, named SNOpt, converges much faster than first-order baselines in wall-clock time, and the improvement remains consistent across various applications, e.g., image classification, generative flow, and time-series prediction. Our framework also enables direct architecture optimization, such as optimizing the integration time of Neural ODEs, with second-order feedback policies, strengthening the OC perspective as a principled tool for analyzing optimization in deep learning. Our code is available at https://github.com/ghliu/snopt.
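The claim that Kronecker-based factorization makes second-order preconditioning cheap can be illustrated with a generic K-FAC-style sketch. This is not SNOpt's update rule. It assumes a single linear layer y = W a whose curvature block is approximated by kron(A, S), with A the second moment of the layer inputs and S that of the backpropagated output gradients. The function name `kron_precondition`, the per-factor damping, and the synthetic statistics are hypothetical choices made for illustration only.

```python
import numpy as np


def kron_precondition(grad, a_cov, g_cov, damping=1e-3):
    """Precondition a weight gradient with a Kronecker-factored curvature.

    Hypothetical setup: the curvature of a linear layer y = W a is
    approximated by kron(A, S), where A = E[a a^T] (d_in x d_in) and
    S = E[g g^T] (d_out x d_out). Damping is added to each factor
    separately, so the matrix actually inverted is
    kron(A + damping*I, S + damping*I), not kron(A, S) + damping*I.
    Returns S_d^{-1} @ grad @ A_d^{-1}, which matches the dense solve in
    column-major vec form without ever building the Kronecker product.
    """
    d_out, d_in = grad.shape
    a_d = a_cov + damping * np.eye(d_in)
    s_d = g_cov + damping * np.eye(d_out)
    # Left factor: X = S_d^{-1} G (one d_out x d_out solve).
    x = np.linalg.solve(s_d, grad)
    # Right factor: X A_d^{-1} = (A_d^{-T} X^T)^T (one d_in x d_in solve).
    return np.linalg.solve(a_d.T, x.T).T


# Usage with synthetic statistics for one layer (random data, illustration only).
rng = np.random.default_rng(0)
n, d_in, d_out = 256, 4, 3
acts = rng.standard_normal((n, d_in))    # layer inputs a
grads = rng.standard_normal((n, d_out))  # backpropagated output gradients g
A = acts.T @ acts / n                    # input second moment
S = grads.T @ grads / n                  # gradient second moment
G = grads.T @ acts / n                   # averaged weight gradient dL/dW
step = kron_precondition(G, A, S)
```

The factored version never forms the (d_in·d_out) × (d_in·d_out) matrix; it solves one d_out × d_out and one d_in × d_in system instead, which is what keeps this kind of preconditioning affordable for wide layers.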

Cite

@article{arxiv.2109.14158,
  title  = {Second-Order Neural ODE Optimizer},
  author = {Guan-Horng Liu and Tianrong Chen and Evangelos A. Theodorou},
  journal= {arXiv preprint arXiv:2109.14158},
  year   = {2021}
}

Comments

Accepted to Advances in Neural Information Processing Systems (NeurIPS) 2021 as a Spotlight paper.