Related papers: Deep Tensor Network

TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors

Attention matrices are fundamental to transformer research, supporting a broad range of applications including interpretability, visualization, manipulation, and distillation. Yet, most existing analyses focus on individual attention heads…

Machine Learning · Computer Science 2026-01-27 Ido Andrew Atad, Itamar Zimerman, Shahar Katz, Lior Wolf

Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models

Transformer models typically calculate attention matrices using dot products, which have limitations when capturing nonlinear relationships between embedding vectors. We propose Neural Attention, a technique that replaces dot products with…

Machine Learning · Computer Science 2025-11-10 Andrew DiGiugno, Ausif Mahmood

TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling

The Transformer has been widely used in many Natural Language Processing (NLP) tasks, and the scaled dot-product attention between tokens is a core module of the Transformer. This attention is a token-wise design and its complexity is quadratic to…

Computation and Language · Computer Science 2020-08-13 Shuai Zhang, Peng Zhang, Xindian Ma, Junqiu Wei, Ningning Wang, Qun Liu

Nexus: Higher-Order Attention Mechanisms in Transformers

Transformers have achieved significant success across various domains, relying on self-attention to capture dependencies. However, the standard first-order attention mechanism is often limited by a low-rank bottleneck, struggling to capture…

Computation and Language · Computer Science 2025-12-05 Hanting Chen, Chong Zhu, Kai Han, Yuchuan Tian, Yuchen Liang, Tianyu Guo, Xinghao Chen, Dacheng Tao, Yunhe Wang

DCT-Former: Efficient Self-Attention with Discrete Cosine Transform

Since their introduction, the Transformer architectures have emerged as the dominating architectures for both natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of "fully-attentive"…

Machine Learning · Computer Science 2023-03-16 Carmelo Scribano, Giorgia Franchini, Marco Prato, Marko Bertogna

Deconstructing Attention: Investigating Design Principles for Effective Language Modeling

The success of Transformer language models is widely credited to their dot-product attention mechanism, which interweaves a set of key design principles: mixing information across positions (enabling multi-token interactions),…

Computation and Language · Computer Science 2025-10-14 Huiyin Xue, Nafise Sadat Moosavi, Nikolaos Aletras

Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention

The Transformer architecture has become a cornerstone of modern artificial intelligence, but its core self-attention mechanism suffers from a complexity bottleneck that scales quadratically with sequence length, severely limiting its…

Machine Learning · Computer Science 2025-08-29 Zhongpan Tang

Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving

We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the…

Machine Learning · Computer Science 2020-11-05 Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao

Convolution, attention and structure embedding

Deep neural networks are composed of layers of parametrised linear operations intertwined with nonlinear activations. In basic models, such as the multi-layer perceptron, a linear layer operates on a simple input vector embedding of the…

Machine Learning · Computer Science 2020-03-06 Jean-Marc Andreoli

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-15 Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai

Tensorformer: Normalized Matrix Attention Transformer for High-quality Point Cloud Reconstruction

Surface reconstruction from raw point clouds has been studied for decades in the computer graphics community, which is highly demanded by modeling and rendering applications nowadays. Classic solutions, such as Poisson surface…

Graphics · Computer Science 2023-10-11 Hui Tian, Zheng Qin, Renjiao Yi, Chenyang Zhu, Kai Xu

Higher-Order Transformers With Kronecker-Structured Attention

Modern datasets are increasingly high-dimensional and multiway, often represented as tensor-valued data with multi-indexed variables. While Transformers excel in sequence modeling and high-dimensional tasks, their direct application to…

Machine Learning · Computer Science 2025-11-19 Soroush Omranpour, Guillaume Rabusseau, Reihaneh Rabbany

Tensor Product Attention Is All You Need

Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we propose Tensor Product Attention (TPA), a novel…

Computation and Language · Computer Science 2026-01-13 Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao

Treeformer: Dense Gradient Trees for Efficient Attention Computation

Standard inference and training with transformer-based architectures scale quadratically with input sequence length. This is prohibitively large for a variety of applications, especially in web-page translation, query-answering etc.…

Computation and Language · Computer Science 2023-03-20 Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain

NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function

The attention mechanism is the primary component of the transformer architecture; it has led to significant advancements in deep learning spanning many domains and covering multiple tasks. In computer vision, the attention mechanism was…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Abdullah Nazhat Abdullah, Tarkan Aydin

Variational Structured Attention Networks for Deep Visual Representation Learning

Convolutional neural networks have enabled major progress in addressing pixel-level prediction tasks such as semantic segmentation, depth estimation, surface normal prediction and so on, benefiting from their powerful capabilities in…

Computer Vision and Pattern Recognition · Computer Science 2021-12-16 Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci

Energy Transformer

Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory. Attention is the power-house driving modern deep learning successes, but it lacks clear…

Machine Learning · Computer Science 2023-11-02 Benjamin Hoover, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik Strobelt, Duen Horng Chau, Mohammed J. Zaki, Dmitry Krotov

Agglomerative Attention

Neural networks using transformer-based architectures have recently demonstrated great power and flexibility in modeling sequences of many types. One of the core components of transformer networks is the attention layer, which allows…

Machine Learning · Computer Science 2019-07-16 Matthew Spellings

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set,…

Machine Learning · Computer Science 2019-05-28 Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh

Transformer with Fourier Integral Attentions

Multi-head attention empowers the recent success of transformers, the state-of-the-art models that have achieved remarkable success in sequence modeling and beyond. These attention mechanisms compute the pairwise dot products between the…

Machine Learning · Computer Science 2022-06-02 Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho