Related papers: Transformer for Partial Differential Equations' Op…

Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning

Operator learning for Partial Differential Equations (PDEs) is rapidly emerging as a promising approach for surrogate modeling of intricate systems. Transformers with the self-attention mechanism, a powerful tool originally…

Machine Learning · Computer Science 2024-05-17 Junfeng Chen, Kailiang Wu

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set,…

Machine Learning · Computer Science 2019-05-28 Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh

DepthFormer: Multimodal Positional Encodings and Cross-Input Attention for Transformer-Based Segmentation Networks

Most approaches for semantic segmentation use only information from color cameras to parse the scenes, yet recent advancements show that using depth data can further improve performance. In this work, we focus on transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Francesco Barbato, Giulia Rizzoli, Pietro Zanuttigh

Channelformer: Attention based Neural Solution for Wireless Channel Estimation and Effective Online Training

In this paper, we propose an encoder-decoder neural architecture (called Channelformer) to achieve improved channel estimation for orthogonal frequency-division multiplexing (OFDM) waveforms in downlink scenarios. The self-attention…

Signal Processing · Electrical Eng. & Systems 2023-02-10 Dianxin Luan, John Thompson

Deriving Transformer Architectures as Implicit Multinomial Regression

While attention has been empirically shown to improve model performance, it lacks a rigorous mathematical justification. This short paper establishes a novel connection between attention mechanisms and multinomial regression. Specifically,…

Machine Learning · Computer Science 2025-10-28 Jonas A. Actor, Anthony Gruber, Eric C. Cyr

Evolving Attention with Residual Convolutions

Transformer is a ubiquitous model for natural language processing and has attracted wide attention in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

Integrating Locality-Aware Attention with Transformers for General Geometry PDEs

Neural operators have emerged as promising frameworks for learning mappings governed by partial differential equations (PDEs), serving as data-driven alternatives to traditional numerical methods. While methods such as the Fourier neural…

Machine Learning · Computer Science 2025-04-21 Minsu Koh, Beom-Chul Park, Heejo Kong, Seong-Whan Lee

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-15 Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai

Transformers as Intrinsic Optimizers: Forward Inference through the Energy Principle

Attention-based Transformers have demonstrated strong adaptability across a wide range of tasks and have become the backbone of modern Large Language Models (LLMs). However, their underlying mechanisms remain open for further exploration.…

Machine Learning · Computer Science 2026-01-13 Ruifeng Ren, Sheng Ouyang, Huayi Tang, Yong Liu

Deep transfer operator learning for partial differential equations under conditional shift

Transfer learning (TL) enables the transfer of knowledge gained in learning to perform one task (source) to a related but different task (target), hence addressing the expense of data acquisition and labeling, potential computational power…

Machine Learning · Computer Science 2022-12-20 Somdatta Goswami, Katiana Kontolati, Michael D. Shields, George Em Karniadakis

DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation

Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from…

Computer Vision and Pattern Recognition · Computer Science 2023-07-28 Reza Azad, René Arimond, Ehsan Khodapanah Aghdam, Amirhossein Kazerouni, Dorit Merhof

Vision Transformer with Deformable Attention

Transformers have recently shown superior performances on various vision tasks. The large, sometimes even global, receptive field endows Transformer models with higher representation power over their CNN counterparts. Nevertheless, simply…

Computer Vision and Pattern Recognition · Computer Science 2022-05-25 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation

Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, their direct application to speech tasks is not trivial. The nature of these sequences carries problems such as…

Computation and Language · Computer Science 2022-05-17 Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-Jussà

ODformer: Spatial-Temporal Transformers for Long Sequence Origin-Destination Matrix Forecasting Against Cross Application Scenario

Origin-Destination (OD) matrices record directional flow data between pairs of OD regions. The intricate spatiotemporal dependency in the matrices makes the OD matrix forecasting (ODMF) problem not only intractable but also non-trivial.…

Artificial Intelligence · Computer Science 2022-08-18 Jin Huang, Bosong Huang, Weihao Yu, Jing Xiao, Ruzhong Xie, Ke Ruan

From Complex Dynamics to DynFormer: Rethinking Transformers for PDEs

Partial differential equations (PDEs) are fundamental for modeling complex physical systems, yet classical numerical solvers face prohibitive computational costs in high-dimensional and multi-scale regimes. While Transformer-based neural…

Machine Learning · Computer Science 2026-03-04 Pengyu Lai, Yixiao Chen, Dewu Yang, Rui Wang, Feng Wang, Hui Xu

SparseBERT: Rethinking the Importance Analysis in Self-attention

Transformer-based models are popularly used in natural language processing (NLP). Their core component, self-attention, has aroused widespread interest. To understand the self-attention mechanism, a direct method is to visualize the attention…

Machine Learning · Computer Science 2021-07-02 Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok

Transformers for dynamical systems learn transfer operators in-context

Large-scale foundation models for scientific machine learning adapt to physical settings unseen during training, such as zero-shot transfer between turbulent scales. This phenomenon, in-context learning, challenges conventional…

Machine Learning · Computer Science 2026-04-14 Anthony Bao, Jeffrey Lai, William Gilpin

EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention

To capture user preference, transformer models have been widely applied to model sequential user behavior data. The core of transformer architecture lies in the self-attention mechanism, which computes the pairwise attention scores in a…

Information Retrieval · Computer Science 2024-04-05 Zhen Tian, Wayne Xin Zhao, Changwang Zhang, Xin Zhao, Zhongrui Ma, Ji-Rong Wen

A Tensorized Transformer for Language Modeling

Recent developments in neural models have connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP)…

Computation and Language · Computer Science 2019-11-07 Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou

Improved Operator Learning by Orthogonal Attention

Neural operators, as an efficient surrogate model for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning. Among them, attention-based neural operators have become one of the…

Machine Learning · Computer Science 2024-12-30 Zipeng Xiao, Zhongkai Hao, Bokai Lin, Zhijie Deng, Hang Su