Related papers: Disentangling Visual Transformers: Patch-level Int…

200 papers

Vision Transformers (ViTs) have become prominent models for solving various vision tasks. However, the interpretability of ViTs has not kept pace with their promising performance. While there has been a surge of interest in developing…

Computer Vision and Pattern Recognition · Computer Science 2025-05-02 Yao Qiang, Chengyin Li, Prashant Khanduri, Dongxiao Zhu

Transformer has been applied in the field of computer vision due to its excellent performance in natural language processing, surpassing traditional convolutional neural networks and achieving new state-of-the-art. ViT divides an image into…

Computer Vision and Pattern Recognition · Computer Science 2024-04-23 Yuang Liu, Zhiheng Qiu, Xiaokai Qin

Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance is black-box models whose hidden representations…

Computer Vision and Pattern Recognition · Computer Science 2020-04-29 Patrick Esser, Robin Rombach, Björn Ommer

Vision Transformers (ViTs) have achieved state-of-the-art performance in image classification, yet their attention mechanisms often remain opaque and exhibit dense, non-structured behaviors. In this work, we adapt our previously proposed…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos

Vision Transformer (ViT) has become a leading tool in various computer vision tasks, owing to its unique self-attention mechanism that learns visual representations explicitly through cross-patch information interactions. Despite having…

Computer Vision and Pattern Recognition · Computer Science 2022-03-14 Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei

Vision Transformer (ViT) has brought new breakthroughs to the field of image classification by introducing the self-attention mechanism, and Graph Convolutional Networks (GCN) have been proposed and successfully applied in data representation…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Haibin Jiao

The recently proposed Visual image Transformers (ViT) with pure attention have achieved promising performance on image recognition tasks, such as image classification. However, the routine of the current ViT model is to maintain a…

Computer Vision and Pattern Recognition · Computer Science 2021-08-19 Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works…

Computer Vision and Pattern Recognition · Computer Science 2021-12-24 Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

Mechanistic interpretability improves the safety, reliability, and robustness of large AI models. This study examined individual attention heads in vision transformers (ViTs) fine tuned on distorted 2D spectrogram images containing non…

Machine Learning · Computer Science 2025-03-25 Nooshin Bahador

Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well. In this paper, we explore the idea of nesting basic local transformers on non-overlapping…

Computer Vision and Pattern Recognition · Computer Science 2022-01-03 Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister

Explainability is a highly demanded requirement for applications in high-risk areas such as medicine. Vision Transformers have mainly been limited to attention extraction to provide insight into the model's reasoning. Our approach combines…

Computer Vision and Pattern Recognition · Computer Science 2025-02-14 Luisa Gallée, Catharina Silvia Lisson, Meinrad Beer, Michael Götz

We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a…

The emergence of vision transformers (ViTs) in image classification has shifted the methodologies for visual representation learning. In particular, ViTs learn visual representation at full receptive field per layer across all the image…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr

Unpaired image-to-image translation aims to translate an image from a source domain to a target domain without paired training data. By utilizing CNNs to extract local semantics, various techniques have been developed to improve the…

Computer Vision and Pattern Recognition · Computer Science 2022-03-31 Wanfeng Zheng, Qiang Li, Guoxin Zhang, Pengfei Wan, Zhongyuan Wang

In the field of medical CT image processing, convolutional neural networks (CNNs) have been the dominant technique. Encoder-decoder CNNs utilise locality for efficiency, but they cannot simulate distant pixel interactions properly. Recent…

Image and Video Processing · Electrical Eng. & Systems 2022-11-03 Hongyang He, Feng Ziliang, Yuanhang Zheng, Shudong Huang, HaoBing Gao

The recently developed vision transformer (ViT) has achieved promising results on image classification compared to convolutional neural networks. Inspired by this, in this paper, we study how to learn multi-scale feature representations in…

Computer Vision and Pattern Recognition · Computer Science 2021-08-24 Chun-Fu Chen, Quanfu Fan, Rameswar Panda

This paper presents a novel knowledge distillation neural architecture leveraging efficient transformer networks for effective image classification. Natural images display intricate arrangements encompassing numerous extraneous elements.…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Dewan Tauhid Rahman, Yeahia Sarker, Antar Mazumder, Md. Shamim Anower

Although researchers' attention is more focused on the performance of Transformer models, the interpretation of Transformer can never be ignored. Gradient is widely utilized in Transformer interpretation. From the perspective of attention…

Artificial Intelligence · Computer Science 2026-05-13 Yongjin Cui, Xiaohui Fan, Huajun Chen

How do vision transformers (ViTs) represent and process the world? This paper addresses this long-standing question through the first systematic analysis of 6.6K features across all layers, extracted via sparse autoencoders, and by…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Jinyeong Kim, Junhyeok Kim, Yumin Shim, Joohyeok Kim, Sunyoung Jung, Seong Jae Hwang

One of the crucial challenges taken in document analysis is mathematical expression recognition. Unlike text recognition which only focuses on one-dimensional structure images, mathematical expression recognition is a much more complicated…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Anh Duy Le, Van Linh Pham, Vinh Loi Ly, Nam Quan Nguyen, Huu Thang Nguyen, Tuan Anh Tran