Deep Tensor Network

Machine Learning | 2025-09-03 | v3 | Artificial Intelligence | Computer Vision and Pattern Recognition | Quantum Physics

Abstract

The quadratic complexity of dot-product attention, introduced in the Transformer, remains a fundamental bottleneck impeding the progress of foundation models toward unbounded context lengths. To address this challenge, we introduce the Deep Tensor Network, a new architectural framework that reformulates attention by unifying the expressive power of tensor algebra with neural network design. Our approach moves beyond both conventional dot-product attention and subsequent linear-time approximations to capture higher-order statistical dependencies. We introduce two core operators derived from this framework: Tensor Attention, which models complex token-mixing via data-dependent polynomial kernels, and Tensor Interaction, a novel mechanism for adaptive channel-mixing. We demonstrate that these operators are powered by second-order summaries that entirely bypass the formation of $n \times n$ matrices, enabling a causality-preserving streaming implementation with $O(d^2)$ per-token updates and $O(d^2)$ state. This efficiency rivals that of modern State Space Models while retaining an attention-like formulation. The Deep Tensor Network thus provides a principled and powerful new class of building blocks for next-generation sequence models, bridging the gap between scalable computation and rich, expressive interaction modeling.
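The streaming claim can be illustrated with a minimal sketch. The code below is not the Tensor Attention operator itself, whose definition the abstract does not give. It is the generic degree-1, unnormalized case, assumed here only for illustration: a $d \times d$ running summary $S_t = \sum_{s \le t} k_s v_s^\top$ is updated once per token and read out as $q_t^\top S_t$. All function names (`outer_add`, `read`, `streaming_mix`, `quadratic_mix`) are hypothetical. The sketch checks that the streaming form matches a causal reference that does build the $n \times n$ score matrix.

```python
import random


def outer_add(S, k, v):
    """In-place rank-one update S += k v^T. Costs O(d^2)."""
    for i, ki in enumerate(k):
        row = S[i]
        for j, vj in enumerate(v):
            row[j] += ki * vj


def read(S, q):
    """Return the length-d vector q^T S. Costs O(d^2)."""
    d = len(q)
    return [sum(q[i] * S[i][j] for i in range(d)) for j in range(d)]


def streaming_mix(Q, K, V):
    """Causal token mixing with a d x d state; no n x n matrix is formed."""
    d = len(Q[0])
    S = [[0.0] * d for _ in range(d)]  # second-order summary, O(d^2) memory
    out = []
    for q, k, v in zip(Q, K, V):
        outer_add(S, k, v)       # fold the current token into the state
        out.append(read(S, q))   # output depends only on tokens s <= t
    return out


def quadratic_mix(Q, K, V):
    """Reference: explicit causally masked n x n scores, O(n^2 d) time."""
    n, d = len(Q), len(Q[0])
    scores = [
        [sum(a * b for a, b in zip(Q[t], K[s])) if s <= t else 0.0
         for s in range(n)]
        for t in range(n)
    ]
    return [
        [sum(scores[t][s] * V[s][j] for s in range(n)) for j in range(d)]
        for t in range(n)
    ]


# Small random example (assumed sizes, for illustration only).
random.seed(0)
n, d = 6, 4
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

fast = streaming_mix(Q, K, V)
slow = quadratic_mix(Q, K, V)
max_err = max(abs(a - b) for fr, sr in zip(fast, slow) for a, b in zip(fr, sr))
```

The two paths agree up to floating-point rounding because $q_t^\top \sum_{s \le t} k_s v_s^\top = \sum_{s \le t} (q_t^\top k_s)\, v_s^\top$. Per-token cost and state size depend on $d$ only, not on the sequence length $n$.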

Cite

@article{arxiv.2311.11091,
  title  = {Deep Tensor Network},
  author = {Yifan Zhang},
  journal= {arXiv preprint arXiv:2311.11091},
  year   = {2025}
}