Related papers: Learning Spatial-Frequency Transformer for Visual …

Separable Self and Mixed Attention Transformers for Efficient Object Tracking

The deployment of transformers for visual object tracking has shown state-of-the-art results on several benchmarks. However, the transformer-based models are under-utilized for Siamese lightweight tracking due to the computational…

Computer Vision and Pattern Recognition · Computer Science 2023-09-11 Goutam Yelluru Gopal, Maria A. Amer

SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking

Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking. However, existing…

Computer Vision and Pattern Recognition · Computer Science 2025-06-02 Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du, Jing Tian

Transformer Tracking

Correlation acts as a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple fusion manner to consider the similarity between the template and the search region.…

Computer Vision and Pattern Recognition · Computer Science 2021-03-30 Xin Chen, Bin Yan, Jiawen Zhu, Dong Wang, Xiaoyun Yang, Huchuan Lu

Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

Recently, transformers have demonstrated great potential for modeling long-term dependencies from skeleton sequences and thereby gained ever-increasing attention in skeleton action recognition. However, the existing transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2024-07-31 Wenhan Wu, Ce Zheng, Zihao Yang, Chen Chen, Srijan Das, Aidong Lu

A Spatial-Spectral-Frequency Interactive Network for Multimodal Remote Sensing Classification

Deep learning-based methods have achieved significant success in remote sensing Earth observation data analysis. Numerous feature fusion techniques address multimodal remote sensing image classification by integrating global and local…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Hao Liu, Yunhao Gao, Wei Li, Mingyang Zhang, Maoguo Gong, Lorenzo Bruzzone

Focal Self-attention for Local-Global Interactions in Vision Transformers

Recently, Vision Transformer and its variants have shown great promise on various computer vision tasks. The ability of capturing short- and long-range visual dependencies through self-attention is arguably the main source for the success.…

Computer Vision and Pattern Recognition · Computer Science 2021-07-02 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

Spatial-Frequency Gated Swin Transformer for Remote Sensing Single-Image Super-Resolution

Remote Sensing (RS) single-image super-resolution aims to reconstruct high-resolution imagery from low-resolution observations while preserving fine spatial structures. Recent Swin Transformer-based models, including Swin2SR, provide strong…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Md Aminur Hossain, Parekh Valkesh, Ayush V. Patel, Yogesh Jethani, Sanjay K. Singh, Biplab Banerjee

SparseTT: Visual Tracking with Sparse Transformers

Transformers have been successfully applied to the visual tracking task and significantly promote tracking performance. The self-attention mechanism designed to model long-range dependencies is the key to the success of Transformers.…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Zhihong Fu, Zehua Fu, Qingjie Liu, Wenrui Cai, Yunhong Wang

High-Performance Transformer Tracking

Correlation has a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple fusion method that considers the similarity between the template and the search region.…

Computer Vision and Pattern Recognition · Computer Science 2022-11-24 Xin Chen, Bin Yan, Jiawen Zhu, Huchuan Lu, Xiang Ruan, Dong Wang

Hybrid Focal and Full-Range Attention Based Graph Transformers

The paradigm of Transformers using the self-attention mechanism has manifested its advantage in learning graph-structured data. Yet, Graph Transformers are capable of modeling full range dependencies but are often deficient in extracting…

Machine Learning · Computer Science 2024-09-11 Minhong Zhu, Zhenhao Zhao, Weiran Cai

Deformable Siamese Attention Networks for Visual Object Tracking

Siamese-based trackers have achieved excellent performance on visual object tracking. However, the target template is not updated online, and the features of the target template and search image are computed independently in a Siamese…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Yuechen Yu, Yilei Xiong, Weilin Huang, Matthew R. Scott

Shunted Self-Attention via Multi-Scale Token Aggregation

Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks, thanks to their competence in modeling long-range dependencies of image patches or tokens via self-attention. These models,…

Computer Vision and Pattern Recognition · Computer Science 2022-04-14 Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang

Spiking Wavelet Transformer

Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the event-driven processing manner of the brain. Incorporating Transformers with SNNs has shown promise for accuracy. However,…

Neural and Evolutionary Computing · Computer Science 2024-09-05 Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu

SpatialFormer: Semantic and Target Aware Attentions for Few-Shot Learning

Recent Few-Shot Learning (FSL) methods put emphasis on generating a discriminative embedding features to precisely measure the similarity between support and query sets. Current CNN-based cross-attention approaches generate discriminative…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Jinxiang Lai, Siqian Yang, Wenlong Wu, Tao Wu, Guannan Jiang, Xi Wang, Jun Liu, Bin-Bin Gao, Wei Zhang, Yuan Xie, Chengjie Wang

Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

In medical images, various types of lesions often manifest significant differences in their shape and texture. Accurate medical image segmentation demands deep learning models with robust capabilities in multi-scale and boundary feature…

Image and Video Processing · Electrical Eng. & Systems 2024-08-20 Zhenhuan Zhou, Along He, Yanlin Wu, Rui Yao, Xueshuo Xie, Tao Li

Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution

Transformer-based models have achieved remarkable results in low-level vision tasks including image super-resolution (SR). However, early Transformer-based approaches that rely on self-attention within non-overlapping windows encounter…

Image and Video Processing · Electrical Eng. & Systems 2024-04-18 Cansu Korkmaz, A. Murat Tekalp

SFANet: Spatial-Frequency Attention Network for Deepfake Detection

Detecting manipulated media has now become a pressing issue with the recent rise of deepfakes. Most existing approaches fail to generalize across diverse datasets and generation techniques. We thus propose a novel ensemble framework,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Vrushank Ahire, Aniruddh Muley, Shivam Zample, Siddharth Verma, Pranav Menon, Surbhi Madan, Abhinav Dhall

SCTransNet: Spatial-channel Cross Transformer Network for Infrared Small Target Detection

Infrared small target detection (IRSTD) has recently benefitted greatly from U-shaped neural models. However, largely overlooking effective global information modeling, existing techniques struggle when the target has high similarities with…

Computer Vision and Pattern Recognition · Computer Science 2024-05-01 Shuai Yuan, Hanlin Qin, Xiang Yan, Naveed Akhtar, Ajmal Mian

Vision Transformer with Deformable Attention

Transformers have recently shown superior performances on various vision tasks. The large, sometimes even global, receptive field endows Transformer models with higher representation power over their CNN counterparts. Nevertheless, simply…

Computer Vision and Pattern Recognition · Computer Science 2022-05-25 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

Spatial-Frequency Attention for Image Denoising

The recently developed transformer networks have achieved impressive performance in image denoising by exploiting the self-attention (SA) in images. However, the existing methods mostly use a relatively small window to compute SA due to the…

Computer Vision and Pattern Recognition · Computer Science 2023-02-28 Shi Guo, Hongwei Yong, Xindong Zhang, Jianqi Ma, Lei Zhang