Related papers: Interpretability-Aware Vision Transformer

Interpretable Vision Transformers in Image Classification via SVDA

Vision Transformers (ViTs) have achieved state-of-the-art performance in image classification, yet their attention mechanisms often remain opaque and exhibit dense, non-structured behaviors. In this work, we adapt our previously proposed…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-12 · Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos

Towards Evaluating Explanations of Vision Transformers for Medical Imaging

As deep learning models increasingly find applications in critical domains such as medical imaging, the need for transparent and trustworthy decision-making becomes paramount. Many explainability methods provide insights into how these…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-09 · Piotr Komorowski, Hubert Baniecki, Przemysław Biecek

ViTmiX: Vision Transformer Explainability Augmented by Mixed Visualization Methods

Recent advancements in Vision Transformers (ViT) have demonstrated exceptional results in various visual recognition tasks, owing to their ability to capture long-range dependencies in images through self-attention mechanisms. However, the…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-20 · Eduard Hogea, Darian M. Onchis, Ana Coporan, Adina Magda Florea, Codruta Istin

How Does Attention Work in Vision Transformers? A Visual Analytics Attempt

Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attentions are then applied to the…

Machine Learning · Computer Science · 2023-03-27 · Yiran Li, Junpeng Wang, Xin Dai, Liang Wang, Chin-Chia Michael Yeh, Yan Zheng, Wei Zhang, Kwan-Liu Ma

Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects

Though vision transformers (ViTs) have achieved state-of-the-art performance in a variety of settings, they exhibit surprising failures when performing tasks involving visual relations. This begs the question: how do ViTs attempt to perform…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-26 · Michael A. Lepori, Alexa R. Tartaglini, Wai Keen Vong, Thomas Serre, Brenden M. Lake, Ellie Pavlick

Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention

Vision Transformer (ViT) is one of the most widely used models in the computer vision field with its great performance on various tasks. In order to fully utilize the ViT-based architecture in various applications, proper visualization…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-08 · Saebom Leem, Hyunseok Seo

ASCENT-ViT: Attention-based Scale-aware Concept Learning Framework for Enhanced Alignment in Vision Transformers

As Vision Transformers (ViTs) are increasingly adopted in sensitive vision applications, there is a growing demand for improved interpretability. This has led to efforts to forward-align these models with carefully annotated abstract,…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-05 · Sanchit Sinha, Guangzhi Xiong, Aidong Zhang

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-18 · Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo

eX-ViT: A Novel eXplainable Vision Transformer for Weakly Supervised Semantic Segmentation

Recently vision transformer models have become prominent models for a range of vision tasks. These models, however, are usually opaque with weak feature interpretability. Moreover, there is no method currently built for an intrinsically…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-13 · Lu Yu, Wei Xiang, Juan Fang, Yi-Ping Phoebe Chen, Lianhua Chi

Interpreting vision transformers via residual replacement model

How do vision transformers (ViTs) represent and process the world? This paper addresses this long-standing question through the first systematic analysis of 6.6K features across all layers, extracted via sparse autoencoders, and by…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-23 · Jinyeong Kim, Junhyeok Kim, Yumin Shim, Joohyeok Kim, Sunyoung Jung, Seong Jae Hwang

Vision Transformer Adapter for Dense Predictions

This work investigates a simple yet powerful dense prediction task adapter for Vision Transformer (ViT). Unlike recently advanced variants that incorporate vision-specific inductive biases into their architectures, the plain ViT suffers…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-14 · Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations. Is this actually true? We investigate this question and find that the features and representations…

Machine Learning · Computer Science · 2024-11-15 · Alexander C. Li, Yuandong Tian, Beidi Chen, Deepak Pathak, Xinlei Chen

Effective Vision Transformer Training: A Data-Centric Perspective

Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs), but the training of ViTs is much harder than CNNs. In this paper, we define several metrics, including Dynamic Data Proportion…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-30 · Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang

Disentangling Visual Transformers: Patch-level Interpretability for Image Classification

Visual transformers have achieved remarkable performance in image classification tasks, but this performance gain has come at the cost of interpretability. One of the main obstacles to the interpretation of transformers is the…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-25 · Guillaume Jeanneret, Loïc Simon, Frédéric Jurie

Visualizing and Understanding Patch Interactions in Vision Transformer

Vision Transformer (ViT) has become a leading tool in various computer vision tasks, owing to its unique self-attention mechanism that learns visual representations explicitly through cross-patch information interactions. Despite having…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-14 · Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei

Vision Transformers with Natural Language Semantics

Tokens or patches within Vision Transformers (ViT) lack essential semantic information, unlike their counterparts in natural language processing (NLP). Typically, ViT tokens are associated with rectangular image patches that lack specific…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-29 · Young Kyung Kim, J. Matías Di Martino, Guillermo Sapiro

Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work

Vision Transformers (ViTs) are becoming a more popular and dominant technique for various vision tasks compared to Convolutional Neural Networks (CNNs). As a demanding technique in computer vision, ViTs have successfully solved various…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-18 · Khawar Islam

Vision Transformers: From Semantic Segmentation to Dense Prediction

The emergence of vision transformers (ViTs) in image classification has shifted the methodologies for visual representation learning. In particular, ViTs learn visual representation at full receptive field per layer across all the image…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-05 · Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr

Fine-tuning Vision Transformers for the Prediction of State Variables in Ising Models

Transformers are state-of-the-art deep learning models that are composed of stacked attention and point-wise, fully connected layers designed for handling sequential data. Transformers are not only ubiquitous throughout Natural Language…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-01 · Onur Kara, Arijit Sehanobish, Hector H Corzo

Multi-Attribute Vision Transformers are Efficient and Robust Learners

Since their inception, Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) across a wide spectrum of tasks. ViTs exhibit notable characteristics, including global attention, resilience…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-22 · Hanan Gani, Nada Saadi, Noor Hussein, Karthik Nandakumar