Structured and Fast Optimization: The Kronecker SGD Algorithm

Zhao Song; Song Yue

Machine Learning 2026-01-27 v2

Abstract

Stochastic gradient descent (SGD) has become a cornerstone of optimization in modern machine learning. Meanwhile, deep learning architectures have shown outstanding performance across a wide range of fields, including natural language processing, bioinformatics, and computer vision. However, as the dimension $d$ grows, these models face serious efficiency challenges: previous studies show that the per-step computational cost scales linearly with the input dimension $d$. To mitigate this, our paper exploits inherent structure in the training examples, such as Kronecker products. We consider input data points that can be represented as tensor products of lower-dimensional vectors. We introduce a novel stochastic optimization method whose per-update computational cost scales sublinearly with $d$, under mild structural assumptions on the inputs. To our knowledge, this is the first work to achieve this result, and it represents a significant step toward efficient deep learning optimization. Our main theoretical result is a formal theorem showing that the proposed algorithm can train a two-layer fully connected neural network with a per-iteration cost independent of $d$.
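The following minimal sketch illustrates why Kronecker-structured inputs can be cheaper to work with. It is not the proposed algorithm. It only checks two standard identities for inputs of the form $x = u \otimes v$ with $u \in \mathbb{R}^{d_1}$, $v \in \mathbb{R}^{d_2}$, and $d = d_1 d_2$. First, $\langle u_1 \otimes v_1, u_2 \otimes v_2 \rangle = \langle u_1, u_2 \rangle \langle v_1, v_2 \rangle$, which costs $O(d_1 + d_2)$ rather than $O(d)$. Second, $\langle \mathrm{vec}(W), u \otimes v \rangle = u^\top W v$ under row-major vectorization. The helper names `kron_inner` and `bilinear_inner` are hypothetical, and the sketch assumes NumPy is available.

```python
import numpy as np


def kron_inner(u1, v1, u2, v2):
    """Inner product <u1 (x) v1, u2 (x) v2> computed from the factors only.

    Uses <a (x) b, c (x) d> = <a, c> * <b, d>, so the cost is O(d1 + d2)
    instead of O(d1 * d2) for the explicitly formed Kronecker vectors.
    """
    return float(np.dot(u1, u2) * np.dot(v1, v2))


def bilinear_inner(W, u, v):
    """Inner product <vec(W), u (x) v> for a d1 x d2 matrix W.

    With NumPy's row-major flattening, np.kron(u, v)[i * d2 + j] == u[i] * v[j],
    so <W.ravel(), np.kron(u, v)> equals u^T W v.
    """
    return float(u @ W @ v)


rng = np.random.default_rng(0)
d1, d2 = 32, 32  # full input dimension d = d1 * d2 = 1024
u1, v1 = rng.standard_normal(d1), rng.standard_normal(d2)
u2, v2 = rng.standard_normal(d1), rng.standard_normal(d2)
W = rng.standard_normal((d1, d2))

# Compare the factored computations against the explicit d-dimensional ones.
explicit_xx = float(np.kron(u1, v1) @ np.kron(u2, v2))
explicit_wx = float(W.ravel() @ np.kron(u1, v1))
print(np.isclose(kron_inner(u1, v1, u2, v2), explicit_xx))  # True
print(np.isclose(bilinear_inner(W, u1, v1), explicit_wx))   # True
```

For a dense $W$, the bilinear form still costs $O(d_1 d_2)$. Reaching sublinear per-step cost requires exploiting further structure, which is what the paper's method addresses and this sketch does not reproduce.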

Keywords

stochastic gradient descent; distributed training; neural network training

Cite

@article{arxiv.2305.08001,
  title  = {Structured and Fast Optimization: The Kronecker SGD Algorithm},
  author = {Zhao Song and Song Yue},
  journal= {arXiv preprint arXiv:2305.08001},
  year   = {2026}
}
