Solving Tensor Low Cycle Rank Approximation

Data Structures and Algorithms 2023-04-14 v1 Machine Learning

Abstract

Large language models have become ubiquitous in modern life, finding applications in domains such as natural language processing, language translation, and speech recognition. Recently, a breakthrough work [Zhao, Panigrahi, Ge, and Arora, arXiv 2023] explained the attention model through the lens of probabilistic context-free grammars (PCFGs). One of the central computational tasks in computing probabilities in a PCFG can be formulated as a particular tensor low-rank approximation problem, which we call tensor cycle rank. Given an $n \times n \times n$ third-order tensor $A$, we say that $A$ has cycle rank-$k$ if there exist three $n \times k^2$ matrices $U$, $V$, and $W$ such that
\begin{align*} A_{a,b,c} = \sum_{i=1}^k \sum_{j=1}^k \sum_{l=1}^k U_{a,i+k(j-1)} \otimes V_{b, j + k(l-1)} \otimes W_{c, l + k(i-1) } \end{align*}
for all $a \in [n]$, $b \in [n]$, $c \in [n]$. Classical tensor rank, Tucker rank, and train rank have been well studied in [Song, Woodruff, Zhong SODA 2019]. In this paper, we generalize the "rotation and sketch" technique from page 186 of [Song, Woodruff, Zhong SODA 2019] and give an input-sparsity-time algorithm for cycle rank.
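To make the cycle rank definition concrete, the following minimal sketch builds the tensor $A$ entry by entry from given factor matrices. It is an illustration only, not the paper's algorithm: the function name `cycle_rank_tensor` is hypothetical, the matrices are plain Python lists of rows, the $\otimes$ between entries is assumed to be ordinary scalar multiplication, and indices are shifted to 0-based, so column $i + k(j-1)$ becomes `i + k*j`.

```python
def cycle_rank_tensor(U, V, W, k):
    """Build the n x n x n tensor given by the cycle-rank-k formula.

    U, V, W: n x k^2 matrices as lists of rows (illustrative representation).
    Assumption: the product between entries is scalar multiplication.
    """
    n = len(U)
    A = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for a in range(n):
        for b in range(n):
            for c in range(n):
                total = 0.0
                # The column indices chain in a cycle: (i, j) -> (j, l) -> (l, i).
                for i in range(k):
                    for j in range(k):
                        for l in range(k):
                            total += (U[a][i + k * j]
                                      * V[b][j + k * l]
                                      * W[c][l + k * i])
                A[a][b][c] = total
    return A


# With k = 1 the formula reduces to an ordinary rank-1 tensor.
A = cycle_rank_tensor([[1], [2]], [[3], [4]], [[5], [6]], 1)
print(A[1][0][1])  # 2 * 3 * 6 = 36.0
```

This direct evaluation takes on the order of $n^3 k^3$ operations and only serves to check the index pattern on small inputs; it says nothing about how the approximation problem itself is solved.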

Cite

@article{arxiv.2304.06594,
  title  = {Solving Tensor Low Cycle Rank Approximation},
  author = {Yichuan Deng and Yeqi Gao and Zhao Song},
  journal= {arXiv preprint arXiv:2304.06594},
  year   = {2023}
}