Related papers: Gaze Estimation using Transformer

A survey of the Vision Transformers and their CNN-Transformer based Variants

Vision transformers have become popular as a possible substitute for convolutional neural networks (CNNs) in a variety of computer vision applications. These transformers, with their ability to focus on global relationships in images, offer…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-30 · Asifullah Khan, Zunaira Rauf, Anabia Sohail, Abdul Rehman, Hifsa Asif, Aqsa Asif, Umair Farooq

Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work

Vision Transformers (ViTs) are becoming a more popular and dominant technique for various vision tasks compared to Convolutional Neural Networks (CNNs). As a demanding technique in computer vision, ViTs have successfully solved various…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-18 · Khawar Islam

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

Self-attention in Transformers comes with a high computational cost because of their quadratic computational complexity, but their effectiveness in addressing problems in language and vision has sparked extensive research aimed at enhancing…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-25 · Tobias Christian Nauen, Sebastian Palacio, Federico Raue, Andreas Dengel

A Survey on Visual Transformer

Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-11 · Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao

On Convolutional Vision Transformers for Yield Prediction

While a variety of methods offer good yield prediction on histogrammed remote sensing data, vision Transformers are only sparsely represented in the literature. The Convolution vision Transformer (CvT) is being tested to evaluate vision…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-09 · Alvin Inderka, Florian Huber, Volker Steinhage

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos

Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important. To effectively automate the process of video analysis based on eye-tracking data, it is important to accurately replicate…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-15 · Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang, Enkelejda Kasneci

Hands-on Evaluation of Visual Transformers for Object Recognition and Detection

Convolutional Neural Networks (CNNs) for computer vision sometimes struggle with understanding images in a global context, as they mainly focus on local patterns. On the other hand, Vision Transformers (ViTs), inspired by models originally…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-11 · Dimitrios N. Vlachogiannis, Dimitrios A. Koutsomitropoulos

Efficiency in Real-time Webcam Gaze Tracking

Efficiency and ease of use are essential for practical applications of camera based eye/gaze-tracking. Gaze tracking involves estimating where a person is looking on a screen based on face images from a computer-facing camera. In this paper…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-04 · Amogh Gudi, Xin Li, Jan van Gemert

Boosting vision transformers for image retrieval

Vision transformers have achieved remarkable progress in vision tasks such as image classification and detection. However, in instance-level image retrieval, transformers have not yet shown good performance compared to convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-24 · Chull Hwan Song, Jooyoung Yoon, Shunghyun Choi, Yannis Avrithis

Gaze-Informed Vision Transformers: Predicting Driving Decisions Under Uncertainty

Vision Transformers (ViT) have advanced computer vision, yet their efficacy in complex tasks like driving remains less explored. This study enhances ViT by integrating human eye gaze, captured via eye-tracking, to increase prediction…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-13 · Sharath Koorathota, Nikolas Papadopoulos, Jia Li Ma, Shruti Kumar, Xiaoxiao Sun, Arunesh Mittal, Patrick Adelman, Paul Sajda

Glance-and-Gaze Vision Transformer

Recently, a series of vision Transformers has emerged that shows superior performance with a more compact model size than conventional convolutional neural networks, thanks to the strong ability of Transformers to model long-range…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-07 · Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan Yuille, Wei Shen

Transformers For Recognition In Overhead Imagery: A Reality Check

There is evidence that transformers offer state-of-the-art recognition performance on tasks involving overhead imagery (e.g., satellite imagery). However, it is difficult to make unbiased empirical comparisons between competing deep…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-02 · Francesco Luzi, Aneesh Gupta, Leslie Collins, Kyle Bradbury, Jordan Malof

PE-former: Pose Estimation Transformer

Vision transformer architectures have been demonstrated to work very effectively for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction. In…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-10 · Paschalis Panteleris, Antonis Argyros

Transformed CNNs: recasting pre-trained convolutional layers with self-attention

Vision Transformers (ViT) have recently emerged as a powerful alternative to convolutional networks (CNNs). Although hybrid models attempt to bridge the gap between these two architectures, the self-attention layers they rely on induce a…

Machine Learning · Computer Science · 2021-06-11 · Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari Morcos

Transformers Meet Visual Learning Understanding: A Comprehensive Review

The dynamic attention mechanism and global modeling ability give the Transformer strong feature learning ability. In recent years, the Transformer has become comparable to CNN methods in computer vision. This review mainly investigates the…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-25 · Yuting Yang, Licheng Jiao, Xu Liu, Fang Liu, Shuyuan Yang, Zhixi Feng, Xu Tang

Evaluating Graphical Perception Capabilities of Vision Transformers

Vision Transformers, ViTs, have emerged as a powerful alternative to convolutional neural networks, CNNs, in a variety of image-based tasks. While CNNs have previously been evaluated for their ability to perform graphical perception tasks,…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-23 · Poonam Poonam, Pere-Pau Vázquez, Timo Ropinski

3D Vision with Transformers: A Survey

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-09 · Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

Lightweight Vision Transformer with Cross Feature Attention

Recent advances in vision transformers (ViTs) have achieved great performance in visual recognition tasks. Convolutional neural networks (CNNs) exploit spatial inductive bias to learn visual representations, but these networks are spatially…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-06 · Youpeng Zhao, Huadong Tang, Yingying Jiang, Yong A, Qiang Wu

HybridGazeNet: Geometric model guided Convolutional Neural Networks for gaze estimation

As a critical cue for understanding human intention, human gaze provides a key signal for Human-Computer Interaction (HCI) applications. Appearance-based gaze estimation, which directly regresses the gaze vector from eye images, has made…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-24 · Shaobo Guo, Xiao Jiang, Zhizhong Su, Rui Wu, Xin Wang

Vision Transformers for Cosmological Fields: Application to Weak Lensing Mass Maps

Weak gravitational lensing is a powerful probe of the universe's growth history. While traditional two-point statistics capture only the Gaussian features of the convergence field, deep learning methods such as convolutional neural networks…

Cosmology and Nongalactic Astrophysics · Physics · 2025-12-09 · Jash Kakadia, Shubh Agrawal, Kunhao Zhong, Bhuvnesh Jain