Hardware-Efficient Mixed-Precision CP Tensor Decomposition

Optimization and Control 2022-09-12 v1

Abstract

Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often incurs high memory and computing costs. Meanwhile, modern computing hardware such as tensor processing units (TPUs) and Tensor Core GPUs has opened new opportunities for hardware-efficient computing via mixed- or low-precision arithmetic representations. In this paper, we exploit low-precision representations in tensor factorization and propose a mixed-precision block stochastic gradient descent (SGD) method to reduce the cost of CP tensor decomposition. Our method achieves robust and fast convergence via a two-stage optimization, i.e., SignSGD followed by mixed-precision SGD. A detailed theoretical analysis is provided to prove the convergence of the proposed mixed-precision algorithm. Numerical experiments on both synthetic and realistic tensor data sets show the superior efficiency of our mixed-precision algorithm compared to full-precision CP decomposition. This work can substantially reduce the memory, computing, and energy costs on resource-constrained edge computing devices. We demonstrate this benefit via an FPGA prototype.
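The two-stage idea in the abstract (SignSGD first, then mixed-precision SGD, applied block by block to the CP factors) can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the abstract does not specify the precision scheme, sampling, or step sizes, so everything below is an assumption made for illustration. In particular, the sketch assumes a 3-way tensor, the least-squares objective 0.5 * ||[[A, B, C]] - T||^2, full (non-stochastic) block gradients, float32 master copies of the factors, and float16 gradient evaluation in the second stage. The names `cp_reconstruct`, `block_gradients`, `update_block`, and `two_stage_cp` are hypothetical.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: T[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def block_gradients(T, A, B, C, dtype=np.float32):
    """Gradients of 0.5 * ||[[A,B,C]] - T||^2 w.r.t. each factor, computed in `dtype`."""
    A, B, C, T = (x.astype(dtype) for x in (A, B, C, T))
    R = cp_reconstruct(A, B, C) - T  # residual tensor in the chosen precision
    gA = np.einsum("ijk,jr,kr->ir", R, B, C)
    gB = np.einsum("ijk,ir,kr->jr", R, A, C)
    gC = np.einsum("ijk,ir,jr->kr", R, A, B)
    return gA, gB, gC


def update_block(T, factors, b, lr, stage):
    """One in-place update of the float32 master copy factors[b].

    stage == "sign":  float32 gradient, step along sign(gradient) (SignSGD-style).
    stage == "mixed": float16 gradient, plain gradient step (assumed scheme).
    """
    dtype = np.float32 if stage == "sign" else np.float16
    g = block_gradients(T, *factors, dtype=dtype)[b].astype(np.float32)
    factors[b] -= lr * (np.sign(g) if stage == "sign" else g)


def two_stage_cp(T, rank, sign_steps=50, mixed_steps=200,
                 sign_lr=1e-2, mixed_lr=1e-2, seed=0):
    """Run the sign stage, then the mixed-precision stage, cycling over the three blocks."""
    rng = np.random.default_rng(seed)
    # Illustrative random initialization of the float32 master copies.
    factors = [0.5 * rng.standard_normal((n, rank)).astype(np.float32)
               for n in T.shape]
    for step in range(sign_steps + mixed_steps):
        stage, lr = ("sign", sign_lr) if step < sign_steps else ("mixed", mixed_lr)
        for b in range(3):
            update_block(T, factors, b, lr, stage)
    return factors
```

A call such as `A, B, C = two_stage_cp(T, rank=3)` returns float32 factors for a 3-way array `T`. The step sizes and iteration counts are placeholders and would need tuning for a given tensor; float16 gradients can also overflow for large tensors unless the data is rescaled, which this sketch does not attempt.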

Cite

@article{arxiv.2209.04003,
  title  = {Hardware-Efficient Mixed-Precision CP Tensor Decomposition},
  author = {Zi Yang and Junnan Shan and Zheng Zhang},
  journal= {arXiv preprint arXiv:2209.04003},
  year   = {2022}
}