Related papers: BatchFormerV2: Exploring Sample Relationships for …

BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning

Despite the success of deep neural networks, there are still many challenges in deep representation learning due to the data scarcity issues such as data imbalance, unseen distribution, and domain shift. To address the above-mentioned…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Zhi Hou, Baosheng Yu, Dacheng Tao

Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach

Exploring sample relationships within each mini-batch has shown great potential for learning image representations. Existing works generally adopt the regular Transformer to model the visual content relationships, ignoring the cues of…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Xixi Wang, Bo Jiang, Xiao Wang, Bin Luo

Vision Transformer with Deformable Attention

Transformers have recently shown superior performances on various vision tasks. The large, sometimes even global, receptive field endows Transformer models with higher representation power over their CNN counterparts. Nevertheless, simply…

Computer Vision and Pattern Recognition · Computer Science 2022-05-25 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks

Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Unlike conventional neural networks or updated versions of Recurrent Neural Networks (RNNs) such…

Machine Learning · Computer Science 2023-06-14 Saidul Islam, Hanae Elmekki, Ahmed Elsebai, Jamal Bentahar, Najat Drawel, Gaith Rjoub, Witold Pedrycz

Dense Transformer Networks

The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by…

Computer Vision and Pattern Recognition · Computer Science 2017-06-09 Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji

Vision Transformers for Dense Prediction

We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. We assemble tokens from various stages of the vision transformer into…

Computer Vision and Pattern Recognition · Computer Science 2021-03-26 René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

Depthformer: Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion

Attention-based models such as transformers have shown outstanding performance on dense prediction tasks, such as semantic segmentation, owing to their capability of capturing long-range dependency in an image. However, the benefit of…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Ashutosh Agarwal, Chetan Arora

A Practical Survey on Faster and Lighter Transformers

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model…

Machine Learning · Computer Science 2023-03-28 Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise

Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition

This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features. By comparing the design principles of the recent…

Computer Vision and Pattern Recognition · Computer Science 2022-11-23 Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng

Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation

Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Yiming Cui, Linjie Yang, Haichao Yu

DuoFormer: Leveraging Hierarchical Representations by Local and Global Attention Vision Transformer

Despite the widespread adoption of transformers in medical applications, the exploration of multi-scale learning through transformers remains limited, while hierarchical representations are considered advantageous for computer-aided medical…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Xiaoya Tang, Bodong Zhang, Man Minh Ho, Beatrice S. Knudsen, Tolga Tasdizen

Dense residual Transformer for image denoising

Image denoising is an important low-level computer vision task, which aims to reconstruct a noise-free and high-quality image from a noisy image. With the development of deep learning, convolutional neural network (CNN) has been gradually…

Computer Vision and Pattern Recognition · Computer Science 2022-05-17 Chao Yao, Shuo Jin, Meiqin Liu, Xiaojuan Ban

TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification

Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense predictions, and the generalizability…

Computer Vision and Pattern Recognition · Computer Science 2021-12-08 Shengcai Liao, Ling Shao

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set,…

Machine Learning · Computer Science 2019-05-28 Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh

Deformer: Towards Displacement Field Learning for Unsupervised Medical Image Registration

Recently, deep-learning-based approaches have been widely studied for deformable image registration task. However, most efforts directly map the composite image representation to spatial transformation through the convolutional neural…

Image and Video Processing · Electrical Eng. & Systems 2022-07-08 Jiashun Chen, Donghuan Lu, Yu Zhang, Dong Wei, Munan Ning, Xinyu Shi, Zhe Xu, Yefeng Zheng

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

The recently developed vision transformer (ViT) has achieved promising results on image classification compared to convolutional neural networks. Inspired by this, in this paper, we study how to learn multi-scale feature representations in…

Computer Vision and Pattern Recognition · Computer Science 2021-08-24 Chun-Fu Chen, Quanfu Fan, Rameswar Panda

A Survey on Visual Transformer

Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao

Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers

Much of recent Deep Reinforcement Learning success is owed to the neural architecture's potential to learn and use effective internal representations of the world. While many current algorithms access a simulator to train with a large…

Artificial Intelligence · Computer Science 2022-02-03 Amir Ardalan Kalantari, Mohammad Amini, Sarath Chandar, Doina Precup

PnP-DETR: Towards Efficient Visual Analysis with Transformers

Recently, DETR pioneered the solution of vision tasks with transformers, it directly translates the image feature map into the object detection result. Though effective, translating the full feature map can be costly due to redundant…

Computer Vision and Pattern Recognition · Computer Science 2022-03-03 Tao Wang, Li Yuan, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

SDformer: Efficient End-to-End Transformer for Depth Completion

Depth completion aims to predict dense depth maps with sparse depth measurements from a depth sensor. Currently, Convolutional Neural Network (CNN) based models are the most popular methods applied to depth completion tasks. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-09-13 Jian Qian, Miao Sun, Ashley Lee, Jie Li, Shenglong Zhuo, Patrick Yin Chiang