Structured and Fast Optimization: The Kronecker SGD Algorithm
Abstract
Stochastic gradient descent (SGD) is a fundamental optimization tool in modern machine learning. Deep learning architectures have shown outstanding performance in a wide range of fields, such as natural language processing, bioinformatics, and computer vision. However, as the number of parameters grows, these models face serious efficiency challenges. Previous studies show that the per-step computational cost scales linearly with the input size. To mitigate this, our paper exploits inherent structure in the training examples, such as Kronecker products. We consider input data points that can be represented as tensor products of lower-dimensional vectors. We introduce a novel stochastic optimization method whose computational cost per update scales sublinearly with the input size, assuming moderate structural properties of the inputs. To the best of our knowledge, this is the first work to achieve this result, and it represents a significant step toward efficient deep learning optimization. Our theoretical findings are supported by a formal theorem showing that the proposed algorithm can train a two-layer fully connected neural network with a per-iteration cost independent of the input size.
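To give some intuition for why Kronecker structure can lower the cost of a single step, the sketch below checks a standard identity: the inner product of two Kronecker-structured vectors equals the product of the inner products of their factors. This is only an illustration and not the algorithm from the paper. The helper names (`kron`, `dot`, `kron_dot`) and the factor sizes `p` and `q` are assumptions chosen for the example, and it assumes both vectors are given in factored form.

```python
import random


def kron(u, v):
    """Kronecker product of two vectors as a flat list of length len(u) * len(v)."""
    return [ui * vj for ui in u for vj in v]


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def kron_dot(u1, v1, u2, v2):
    """Inner product of kron(u1, v1) and kron(u2, v2), computed from the factors only.

    Costs O(p + q) multiplications instead of O(p * q) for the expanded vectors.
    """
    return dot(u1, u2) * dot(v1, v2)


# Hypothetical factor sizes; the expanded vectors have dimension p * q = 1200.
random.seed(0)
p, q = 30, 40
u1 = [random.gauss(0.0, 1.0) for _ in range(p)]
u2 = [random.gauss(0.0, 1.0) for _ in range(p)]
v1 = [random.gauss(0.0, 1.0) for _ in range(q)]
v2 = [random.gauss(0.0, 1.0) for _ in range(q)]

dense = dot(kron(u1, v1), kron(u2, v2))  # touches all p * q coordinates
fast = kron_dot(u1, v1, u2, v2)          # touches only p + q coordinates
```

With these sizes the factored computation reads 70 coordinates per vector rather than 1200, which is the kind of gap between linear and sublinear cost that the abstract refers to. How the paper handles general weight vectors in a two-layer network is not stated in the abstract and is not modeled here.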
Cite
@article{arxiv.2305.08001,
title = {Structured and Fast Optimization: The Kronecker SGD Algorithm},
author = {Zhao Song and Song Yue},
journal = {arXiv preprint arXiv:2305.08001},
year = {2023}
}