Related papers: Patch-level Representation Learning for Self-super…

200 papers

Vision Transformers (ViTs) enabled the use of the transformer architecture on vision tasks, showing impressive performance when trained on large datasets. However, on relatively small datasets, ViTs are less accurate given their lack of…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Guglielmo Camporese, Elena Izzo, Lamberto Ballan

Self-supervision has shown outstanding results for natural language processing, and more recently, for image recognition. Simultaneously, vision transformers and their variants have emerged as a promising and scalable alternative to…

Computer Vision and Pattern Recognition · Computer Science 2022-02-01 Prarthana Bhattacharyya, Chenge Li, Xiaonan Zhao, István Fehérvári, Jason Sun

Advances in deep learning are redefining how visual data is processed and understood by machines. Vision Transformers (ViTs) have recently demonstrated prominent performance in computer vision tasks. However, their performance…

Vision Transformers (ViTs) dominate self-supervised learning (SSL). While they have proven highly effective for large-scale pretraining, they are computationally inefficient and scale poorly with image size. Consequently, foundational…

Computer Vision and Pattern Recognition · Computer Science 2026-04-23 Nedyalko Prisadnikov, Danda Pani Paudel, Yuqian Fu, Luc Van Gool

Self-Supervised Learning (SSL) for Vision Transformers (ViTs) has recently demonstrated considerable potential as a pre-training strategy for a variety of computer vision tasks, including image classification and segmentation, both in…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Yannis Kaltampanidis, Alexandros Doumanoglou, Dimitrios Zarpalas

Self-supervised learning (SSL) has attracted much interest in remote sensing and earth observation due to its ability to learn task-agnostic representations without human annotation. While most of the existing SSL works in remote sensing…

Computer Vision and Pattern Recognition · Computer Science 2022-06-15 Yi Wang, Conrad M Albrecht, Xiao Xiang Zhu

In this paper, we present a comparative analysis of various self-supervised Vision Transformers (ViTs), focusing on their local representative power. Inspired by large language models, we examine the abilities of ViTs to perform various…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Ani Vanyan, Alvard Barseghyan, Hakob Tamazyan, Vahan Huroyan, Hrant Khachatrian, Martin Danelljan

Self-supervised learning (SSL) has emerged as a powerful technique for learning visual representations. While recent SSL approaches achieve strong results in global image understanding, they are limited in capturing the structured…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Oussama Hadjerci, Antoine Letienne, Mohamed Abbas Hedjazi, Adel Hafiane

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

Visual domain adaptation (DA) seeks to transfer trained models to unseen, unlabeled domains across distribution shift, but approaches typically focus on adapting convolutional neural network architectures initialized with supervised…

Computer Vision and Pattern Recognition · Computer Science 2022-06-17 Viraj Prabhu, Sriram Yenamandra, Aaditya Singh, Judy Hoffman

We address the task of weakly-supervised few-shot image classification and segmentation, by leveraging a Vision Transformer (ViT) pretrained with self-supervision. Our proposed method takes token representations from the self-supervised ViT…

Computer Vision and Pattern Recognition · Computer Science 2023-07-10 Dahyun Kang, Piotr Koniusz, Minsu Cho, Naila Murray

Self-supervised visual representation learning traditionally focuses on image-level instance discrimination. Our study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into these methodologies. This…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Ali Javidani, Mohammad Amin Sadeghi, Babak Nadjar Araabi

The few-shot learning ability of vision transformers (ViTs) is rarely investigated though heavily desired. In this work, we empirically find that with the same few-shot learning frameworks, e.g., Meta-Baseline, replacing the widely used CNN…

Computer Vision and Pattern Recognition · Computer Science 2022-06-10 Bowen Dong, Pan Zhou, Shuicheng Yan, Wangmeng Zuo

The emergence of vision transformers (ViTs) in image classification has shifted the methodologies for visual representation learning. In particular, ViTs learn visual representation at full receptive field per layer across all the image…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr

Vision transformers (ViTs) encoding an image as a sequence of patches bring new paradigms for semantic segmentation. We present an efficient framework of representation separation at the local-patch level and global-region level for semantic…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Yuanduo Hong, Huihui Pan, Weichao Sun, Xinghu Yu, Huijun Gao

Self-supervised learning (SSL) has produced a diverse landscape of vision transformers (ViTs) whose pretrained representations support a wide range of downstream tasks. Towards a better understanding of these models, a body of work has…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Xiaoyan Yu, Lisa Mais, Jannik Franzen, Peter Hirsch, Nick Lechtenbörger, Andreas Mardt, Dagmar Kainmüller

Tokens or patches within Vision Transformers (ViT) lack essential semantic information, unlike their counterparts in natural language processing (NLP). Typically, ViT tokens are associated with rectangular image patches that lack specific…

Computer Vision and Pattern Recognition · Computer Science 2024-02-29 Young Kyung Kim, J. Matías Di Martino, Guillermo Sapiro

Self-supervised learning (SSL) with vision transformers (ViTs) has proven effective for representation learning as demonstrated by the impressive performance on various downstream tasks. Despite these successes, existing ViT-based SSL…

Computer Vision and Pattern Recognition · Computer Science 2024-06-21 Chaitanya Devaguptapu, Sumukh Aithal, Shrinivas Ramasubramanian, Moyuru Yamada, Manohar Kaul

This paper investigates the effectiveness of self-supervised pre-trained vision transformers (ViTs) compared to supervised pre-trained ViTs and conventional neural networks (ConvNets) for detecting facial deepfake images and videos. It…

Computer Vision and Pattern Recognition · Computer Science 2024-08-12 Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

Recent advances in image-level self-supervised learning (SSL) have made significant progress, yet learning dense representations for patches remains challenging. Mainstream methods encounter an over-dispersion phenomenon that patches from…

Computer Vision and Pattern Recognition · Computer Science 2025-09-12 Peisong Wen, Qianqian Xu, Siran Dai, Runmin Cong, Qingming Huang