Deep Tensor Network

Yifan Zhang

Subjects: Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition; Quantum Physics
Version: v3 (2025-09-03)

Abstract

The quadratic complexity of the dot-product attention introduced in the Transformer remains a fundamental bottleneck that keeps foundation models from scaling toward unbounded context lengths. To address this challenge, we introduce the Deep Tensor Network, a new architectural framework that reformulates attention by unifying the expressive power of tensor algebra with neural network design. Our approach goes beyond both conventional dot-product attention and subsequent linear-time approximations by capturing higher-order statistical dependencies. We derive two core operators from this framework: Tensor Attention, which models complex token mixing via data-dependent polynomial kernels, and Tensor Interaction, a novel mechanism for adaptive channel mixing. We show that both operators are built on second-order summaries that entirely bypass the formation of $n \times n$ matrices (where $n$ is the sequence length). This enables a causality-preserving streaming implementation with $O(d^2)$ per-token updates and $O(d^2)$ state, where $d$ is the feature dimension. This efficiency rivals that of modern State Space Models while retaining an attention-like formulation. The Deep Tensor Network thus provides a principled and powerful new class of building blocks for next-generation sequence models, bridging the gap between scalable computation and rich, expressive interaction modeling.
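To make the streaming claim concrete, the sketch below shows the general idea of replacing an explicit causal $n \times n$ score matrix with a running second-order summary. It is a minimal illustration under stated assumptions, not the paper's Tensor Attention or Tensor Interaction operator. It assumes the summary is the plain running sum of outer products $S_t = \sum_{s \le t} k_s v_s^\top$, read out as $o_t = S_t^\top q_t$, with an identity feature map and no normalization, as in unnormalized linear attention. The function names `streaming_outer_product_attention` and `quadratic_reference` are hypothetical. With $d_k = d_v = d$, each token costs $O(d^2)$ work and the state occupies $O(d^2)$ memory, and the result matches the explicit causal $O(n^2)$ computation.

```python
from typing import List

Vector = List[float]


def streaming_outer_product_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Hypothetical sketch: causal token mixing with a running d_k x d_v state.

    Assumes S_t = sum_{s<=t} k_s v_s^T and o_t = S_t^T q_t (identity feature map,
    no normalization). No n x n score matrix is ever formed.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # O(d_k * d_v) memory, independent of n
    outputs: List[Vector] = []
    for q, k, v in zip(queries, keys, values):
        # Rank-1 update with the current token only, so causality is preserved.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # Read-out: o_t[j] = sum_i q[i] * S_t[i][j], also O(d_k * d_v) work.
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


def quadratic_reference(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Same quantity via explicit causal scores q_t . k_s for s <= t (O(n^2) work)."""
    d_v = len(values[0])
    out: List[Vector] = []
    for t, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, keys[s])) for s in range(t + 1)]
        out.append([sum(scores[s] * values[s][j] for s in range(t + 1)) for j in range(d_v)])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
    V = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
    print(streaming_outer_product_attention(Q, K, V))  # [[1.0, 0.0], [2.0, 1.0], [5.0, 4.0]]
    print(quadratic_reference(Q, K, V))                # same values
```

The two functions agree because $S_t^\top q_t = \sum_{s \le t} (q_t^\top k_s)\, v_s$. The summary therefore carries all the information the causal score matrix would have contributed, while its size depends only on the feature dimensions. The paper's data-dependent polynomial kernels would change what is accumulated in the state, and that is not reproduced here.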

Keywords

attention mechanism; tensor decomposition; encoder-decoder architecture

Cite

@article{arxiv.2311.11091,
  title  = {Deep Tensor Network},
  author = {Yifan Zhang},
  journal = {arXiv preprint arXiv:2311.11091},
  year   = {2025}
}
