Structured Multidimensional Representation Learning for Large Language Models

Alaa El Ichi; Khalide Jbilou; Mohamed El Guide; Franck Dufrenois

Computation and Language; Numerical Analysis; 2026-03-09; v1

Abstract

Transformer architectures achieve state-of-the-art performance across a wide range of pattern recognition and natural language processing tasks, but their scaling is accompanied by substantial parameter growth and redundancy in the embedding dimension. In this work, we introduce a structured spectral factorization of the embedding space based on the L-product for third-order tensors. By reshaping token representations into spectral tensor slices and performing attention and feed-forward operations in the transform domain, we obtain a Tensor Transformer architecture that decomposes the encoder into p independent spectral sub-transformers while preserving standard Transformer semantics. We prove that the proposed L-Transformer is spectrally equivalent to p parallel Transformers operating on reduced-dimensional embeddings, which reduces encoder parameters to approximately 1/p of the baseline count (up to lower-order terms such as biases and normalization parameters) under fixed total embedding size. When instantiated with a real-valued Discrete Cosine Transform (DCT), the method remains fully differentiable and compatible with existing training pipelines. Beyond compression, the spectral decomposition introduces an inductive bias over embedding frequencies, enabling slice-dependent frequency scaling that improves generalization. Experiments on IMDB and AG News show that the proposed model can substantially reduce encoder parameters (by up to 75% for p=4) while maintaining competitive accuracy. On IMDB, the tensorized encoder matches or improves upon the standard baseline under compression, whereas on AG News at moderate width we observe a small accuracy decrease in exchange for a 4x reduction in encoder parameters; at BERT-base width (d=768), performance returns to parity.
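
The two mechanisms named in the abstract, the DCT-based spectral slicing and the roughly 1/p parameter count, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the (n_tokens, d/p, p) tensor layout, and the choice to divide the feed-forward width by p alongside the embedding width are all assumptions made for illustration.

```python
import numpy as np

def dct_matrix(p: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (p x p). Real-valued, so its inverse is its transpose."""
    k = np.arange(p)[:, None]  # frequency index (rows)
    n = np.arange(p)[None, :]  # sample index (columns)
    L = np.sqrt(2.0 / p) * np.cos(np.pi * (2 * n + 1) * k / (2 * p))
    L[0, :] /= np.sqrt(2.0)    # rescale the constant row for orthonormality
    return L

def to_spectral_slices(X: np.ndarray, p: int) -> np.ndarray:
    """Reshape (n_tokens, d) embeddings to (n_tokens, d // p, p), then apply the DCT along the last mode.

    Assumed layout: slice [:, :, i] would feed the i-th sub-transformer.
    """
    n_tokens, d = X.shape
    assert d % p == 0, "embedding size must be divisible by p"
    T = X.reshape(n_tokens, d // p, p)
    return T @ dct_matrix(p).T  # transforms every length-p tube

def from_spectral_slices(T_hat: np.ndarray, p: int) -> np.ndarray:
    """Invert to_spectral_slices using the transpose of the DCT matrix."""
    n_tokens, m, _ = T_hat.shape
    return (T_hat @ dct_matrix(p)).reshape(n_tokens, m * p)

def encoder_layer_params(d: int, d_ff: int) -> int:
    """Parameter count of one standard encoder layer of width d (weights, biases, LayerNorms)."""
    attention = 4 * (d * d + d)                  # Q, K, V and output projections
    feed_forward = (d * d_ff + d_ff) + (d_ff * d + d)
    layer_norms = 2 * 2 * d                      # two LayerNorms, scale and shift each
    return attention + feed_forward + layer_norms

d, d_ff, p = 256, 1024, 4                        # illustrative sizes, not taken from the paper
standard = encoder_layer_params(d, d_ff)
# Assumption: each of the p sub-transformers has width d / p and feed-forward width d_ff / p.
tensorized = p * encoder_layer_params(d // p, d_ff // p)
ratio = tensorized / standard
print(f"standard={standard}, tensorized={tensorized}, ratio={ratio:.4f}")

X = np.random.default_rng(0).standard_normal((5, d))
X_roundtrip = from_spectral_slices(to_spectral_slices(X, p), p)
print("round-trip exact:", np.allclose(X_roundtrip, X))
```

Under these assumptions the weight matrices shrink by exactly a factor of p, while the bias and normalization terms do not, which is why the overall ratio lands slightly above 1/p.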

Keywords

transformer; tensor decomposition; encoder-decoder architecture

Cite

@article{arxiv.2603.05727,
  title  = {Structured Multidimensional Representation Learning for Large Language Models},
  author = {Alaa El Ichi and Khalide Jbilou and Mohamed El Guide and Franck Dufrenois},
  journal= {arXiv preprint arXiv:2603.05727},
  year   = {2026}
}

Comments

25 pages, 6 figures. Preprint of a journal submission
