Related papers: Positional Encoding Field

Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers

Diffusion Transformers (DiTs) have greatly advanced text-to-image generation, but models still struggle to generate the correct spatial relations between objects as specified in the text prompt. In this study, we adopt a mechanistic…

Artificial Intelligence · Computer Science 2026-04-07 Binxu Wang, Jingxuan Fan, Xu Pan

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolutions. The primary obstacle is that the explicit positional encodings (PE), such as RoPE, need extrapolating to unseen positions which…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Shen Zhang, Siyuan Liang, Yaning Tan, Zhaowei Chen, Linze Li, Ge Wu, Yuhao Chen, Shuheng Li, Zhenyu Zhao, Caihua Chen, Jiajun Liang, Yao Tang

Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing

Diffusion Transformers (DiTs) have recently achieved remarkable success in text-guided image generation. In image editing, DiTs project text and image inputs to a joint latent space, from which they decode and synthesize new images.…

Computer Vision and Pattern Recognition · Computer Science 2024-11-14 Zitao Shuai, Chenwei Wu, Zhengxu Tang, Bowen Song, Liyue Shen

Tab-PET: Graph-Based Positional Encodings for Tabular Transformers

Supervised learning with tabular data presents unique challenges, including low data sizes, the absence of structural cues, and heterogeneous features spanning both categorical and continuous domains. Unlike vision and language tasks, where…

Machine Learning · Computer Science 2025-12-18 Yunze Leng, Rohan Ghosh, Mehul Motani

ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers

Leveraging pre-trained Diffusion Transformers (DiTs) for high-resolution (HR) image synthesis often leads to spatial layout collapse and degraded texture fidelity. Prior work mitigates these issues with complex pipelines that first perform…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Yiyang Ma, Feng Zhou, Xuedan Yin, Pu Cao, Yonghao Dang, Jianqin Yin

Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

Positional encoding (PE) underpins how permutation-invariant Transformers represent sequence order, yet how positional information is processed and stored remains poorly understood. Modern PE methods such as RoPE still struggle on tasks…

Computation and Language · Computer Science 2026-05-29 Pierre-Antoine Lequeu, Camille Barboule, Benjamin Piwowarski

A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models

In this study, we investigate the impact of positional encoding (PE) on source separation performance and the generalization ability to long sequences (length extrapolation) in Transformer-based time-frequency (TF) domain dual-path models.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Kohei Saijo, Tetsuji Ogawa

HyPE-GT: where Graph Transformers meet Hyperbolic Positional Encodings

Graph Transformers (GTs) facilitate the comprehension of graph-structured data by calculating the self-attention of node pairs without considering node position information. To address this limitation, we introduce an innovative and…

Machine Learning · Computer Science 2023-12-12 Kushal Bose, Swagatam Das

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Diffusion Transformers (DiTs) are a powerful yet underexplored class of generative models compared to U-Net-based diffusion architectures. We propose TIDE (Temporal-aware sparse autoencoders for Interpretable Diffusion transformErs), a…

Computer Vision and Pattern Recognition · Computer Science 2025-08-13 Victor Shea-Jay Huang, Le Zhuo, Yi Xin, Zhaokai Wang, Fu-Yun Wang, Yuchi Wang, Renrui Zhang, Peng Gao, Hongsheng Li

Positional Encodings for Light Curve Transformers: Playing with Positions and Attention

We conducted empirical experiments to assess the transferability of a light curve transformer to datasets with different cadences and magnitude distributions using various positional encodings (PEs). We proposed a new approach to…

Instrumentation and Methods for Astrophysics · Physics 2023-08-15 Daniel Moreno-Cartagena, Guillermo Cabrera-Vives, Pavlos Protopapas, Cristobal Donoso-Oliva, Manuel Pérez-Carrasco, Martina Cádiz-Leyton

Learning interpretable positional encodings in transformers depends on initialization

In transformers, the positional encoding (PE) provides essential information that distinguishes the position and order amongst tokens in a sequence. Most prior investigations of PE effects on generalization were tailored to 1D input…

Machine Learning · Computer Science 2025-06-24 Takuya Ito, Luca Cocchi, Tim Klinger, Parikshit Ram, Murray Campbell, Luke Hearne

DiT: Efficient Vision Transformers with Dynamic Token Routing

Recently, the tokens of images share the same static data flow in many dense networks. However, challenges arise from the variance among the objects in images, such as large variations in the spatial scale and difficulties of recognition…

Computer Vision and Pattern Recognition · Computer Science 2023-08-14 Yuchen Ma, Zhengcong Fei, Junshi Huang

Geometry without Position? When Positional Embeddings Help and Hurt Spatial Reasoning

This paper revisits the role of positional embeddings (PEs) within vision transformers (ViTs) from a geometric perspective. We show that PEs are not mere token indices but effectively function as geometric priors that shape the spatial…

Computer Vision and Pattern Recognition · Computer Science 2026-02-02 Jian Shi, Michael Birsak, Wenqing Cui, Zhenyu Li, Peter Wonka

Conditional Positional Encodings for Vision Transformers

We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike previous fixed or learnable positional encodings, which are pre-defined and independent of input tokens, CPE is dynamically generated and conditioned…

Computer Vision and Pattern Recognition · Computer Science 2023-02-14 Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen

The Locality and Symmetry of Positional Encodings

Positional Encodings (PEs) are used to inject word-order information into transformer-based language models. While they can significantly enhance the quality of sentence representations, their specific contribution to language models is not…

Computation and Language · Computer Science 2023-10-20 Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models

Tables are ubiquitous across various domains for concisely representing structured information. Empowering large language models (LLMs) to reason over tabular data represents an actively explored direction. However, since typical LLMs only…

Computation and Language · Computer Science 2024-10-21 Jia-Nan Li, Jian Guan, Wei Wu, Zhengtao Yu, Rui Yan

D²iT: Dynamic Diffusion Transformer for Accurate Image Generation

Diffusion models are widely recognized for their ability to generate high-fidelity images. Despite the excellent performance and scalability of the Diffusion Transformer (DiT) architecture, it applies fixed compression across different…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Weinan Jia, Mengqi Huang, Nan Chen, Lei Zhang, Zhendong Mao

Dynamic Position Encoding for Transformers

Recurrent models have been dominating the field of neural machine translation (NMT) for the past few years. Transformers (Vaswani et al., 2017) have radically changed it by proposing a novel architecture that relies on a feed-forward…

Computation and Language · Computer Science 2022-10-25 Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban

Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing

Diffusion Transformers (DiTs) have achieved remarkable success in diverse and high-quality text-to-image (T2I) generation. However, how text and image latents individually and jointly contribute to the semantics of generated images, remain…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Zitao Shuai, Chenwei Wu, Zhengxu Tang, Bowen Song, Liyue Shen

DC-DiT: Adaptive Compute and Elastic Inference for Visual Generation via Dynamic Chunking

Diffusion Transformers rely on static patchify tokenization, assigning the same token budget to smooth backgrounds, detailed object regions, noisy early timesteps, and late-stage refinements. We introduce the Dynamic Chunking Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-05-08 Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum