Related papers: Positional Encoding Field

200 papers

Diffusion Transformers (DiTs) have greatly advanced text-to-image generation, but models still struggle to generate the correct spatial relations between objects as specified in the text prompt. In this study, we adopt a mechanistic…

Artificial Intelligence · Computer Science 2026-04-07 Binxu Wang, Jingxuan Fan, Xu Pan

Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolutions. The primary obstacle is that the explicit positional encodings (PE), such as RoPE, need extrapolating to unseen positions which…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Shen Zhang, Siyuan Liang, Yaning Tan, Zhaowei Chen, Linze Li, Ge Wu, Yuhao Chen, Shuheng Li, Zhenyu Zhao, Caihua Chen, Jiajun Liang, Yao Tang

Diffusion Transformers (DiTs) have recently achieved remarkable success in text-guided image generation. In image editing, DiTs project text and image inputs to a joint latent space, from which they decode and synthesize new images.…

Computer Vision and Pattern Recognition · Computer Science 2024-11-14 Zitao Shuai, Chenwei Wu, Zhengxu Tang, Bowen Song, Liyue Shen

Supervised learning with tabular data presents unique challenges, including low data sizes, the absence of structural cues, and heterogeneous features spanning both categorical and continuous domains. Unlike vision and language tasks, where…

Machine Learning · Computer Science 2025-12-18 Yunze Leng, Rohan Ghosh, Mehul Motani

Leveraging pre-trained Diffusion Transformers (DiTs) for high-resolution (HR) image synthesis often leads to spatial layout collapse and degraded texture fidelity. Prior work mitigates these issues with complex pipelines that first perform…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Yiyang Ma, Feng Zhou, Xuedan Yin, Pu Cao, Yonghao Dang, Jianqin Yin

Positional encoding (PE) underpins how permutation-invariant Transformers represent sequence order, yet how positional information is processed and stored remains poorly understood. Modern PE methods such as RoPE still struggle on tasks…

Computation and Language · Computer Science 2026-05-29 Pierre-Antoine Lequeu, Camille Barboule, Benjamin Piwowarski

In this study, we investigate the impact of positional encoding (PE) on source separation performance and the generalization ability to long sequences (length extrapolation) in Transformer-based time-frequency (TF) domain dual-path models.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Kohei Saijo, Tetsuji Ogawa

Graph Transformers (GTs) facilitate the comprehension of graph-structured data by calculating the self-attention of node pairs without considering node position information. To address this limitation, we introduce an innovative and…

Machine Learning · Computer Science 2023-12-12 Kushal Bose, Swagatam Das

Diffusion Transformers (DiTs) are a powerful yet underexplored class of generative models compared to U-Net-based diffusion architectures. We propose TIDE (Temporal-aware sparse autoencoders for Interpretable Diffusion transformErs), a…

Computer Vision and Pattern Recognition · Computer Science 2025-08-13 Victor Shea-Jay Huang, Le Zhuo, Yi Xin, Zhaokai Wang, Fu-Yun Wang, Yuchi Wang, Renrui Zhang, Peng Gao, Hongsheng Li

We conducted empirical experiments to assess the transferability of a light curve transformer to datasets with different cadences and magnitude distributions using various positional encodings (PEs). We proposed a new approach to…

In transformers, the positional encoding (PE) provides essential information that distinguishes the position and order amongst tokens in a sequence. Most prior investigations of PE effects on generalization were tailored to 1D input…

Machine Learning · Computer Science 2025-06-24 Takuya Ito, Luca Cocchi, Tim Klinger, Parikshit Ram, Murray Campbell, Luke Hearne

Recently, the tokens of images share the same static data flow in many dense networks. However, challenges arise from the variance among the objects in images, such as large variations in the spatial scale and difficulties of recognition…

Computer Vision and Pattern Recognition · Computer Science 2023-08-14 Yuchen Ma, Zhengcong Fei, Junshi Huang

This paper revisits the role of positional embeddings (PEs) within vision transformers (ViTs) from a geometric perspective. We show that PEs are not mere token indices but effectively function as geometric priors that shape the spatial…

Computer Vision and Pattern Recognition · Computer Science 2026-02-02 Jian Shi, Michael Birsak, Wenqing Cui, Zhenyu Li, Peter Wonka

We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike previous fixed or learnable positional encodings, which are pre-defined and independent of input tokens, CPE is dynamically generated and conditioned…

Computer Vision and Pattern Recognition · Computer Science 2023-02-14 Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen

Positional Encodings (PEs) are used to inject word-order information into transformer-based language models. While they can significantly enhance the quality of sentence representations, their specific contribution to language models is not…

Computation and Language · Computer Science 2023-10-20 Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Tables are ubiquitous across various domains for concisely representing structured information. Empowering large language models (LLMs) to reason over tabular data represents an actively explored direction. However, since typical LLMs only…

Computation and Language · Computer Science 2024-10-21 Jia-Nan Li, Jian Guan, Wei Wu, Zhengtao Yu, Rui Yan

Diffusion models are widely recognized for their ability to generate high-fidelity images. Despite the excellent performance and scalability of the Diffusion Transformer (DiT) architecture, it applies fixed compression across different…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Weinan Jia, Mengqi Huang, Nan Chen, Lei Zhang, Zhendong Mao

Recurrent models have been dominating the field of neural machine translation (NMT) for the past few years. Transformers (Vaswani et al., 2017) have radically changed it by proposing a novel architecture that relies on a feed-forward…

Computation and Language · Computer Science 2022-10-25 Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban

Diffusion Transformers (DiTs) have achieved remarkable success in diverse and high-quality text-to-image (T2I) generation. However, how text and image latents individually and jointly contribute to the semantics of generated images remains…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Zitao Shuai, Chenwei Wu, Zhengxu Tang, Bowen Song, Liyue Shen

Diffusion Transformers rely on static patchify tokenization, assigning the same token budget to smooth backgrounds, detailed object regions, noisy early timesteps, and late-stage refinements. We introduce the Dynamic Chunking Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2026-05-08 Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum