Related papers: Finding Distributed Object-Centric Properties in S…

200 papers

Many models of visual attention have been proposed so far. Traditional bottom-up models, like saliency models, fail to replicate human gaze patterns, and deep gaze prediction models lack biological plausibility due to their reliance on…

Neurons and Cognition · Quantitative Biology · 2025-05-28 · Takuto Yamamoto, Hirosato Akahoshi, Shigeru Kitazawa

Object-centric understanding is fundamental to human vision and required for complex reasoning. Traditional methods define slot-based bottlenecks to learn object properties explicitly, while recent self-supervised vision models like DINO…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Stefan Sylvius Wagner, Stefan Harmeling

In this paper, we propose a simple yet effective approach for self-supervised video object segmentation (VOS). Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-09 · Shuangrui Ding, Rui Qian, Haohang Xu, Dahua Lin, Hongkai Xiong

Vision foundation models trained with self-supervised objectives achieve strong performance across diverse tasks and exhibit emergent object segmentation properties. However, their alignment with human object perception remains poorly…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-17 · Hossein Adeli, Seoyoung Ahn, Andrew Luo, Mengmi Zhang, Nikolaus Kriegeskorte, Gregory Zelinsky

We present a novel method that extends the self-attention mechanism of a vision transformer (ViT) for more accurate object detection across diverse datasets. ViTs show strong capability for image understanding tasks such as object…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-30 · Tan Nguyen, Coy D. Heldermon, Corey Toler-Franklin

Unsupervised object discovery is becoming an essential line of research for tackling recognition problems that require decomposing an image into entities, such as semantic segmentation and object detection. Recently, object-centric methods…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-07 · Rishav Pramanik, José-Fabian Villa-Vásquez, Marco Pedersoli

Transformers trained with self-supervised learning using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-25 · Yangtao Wang, Xi Shen, Shell Hu, Yuan Yuan, James Crowley, Dominique Vaufreydaz

The features of self-supervised vision transformers (ViTs) contain strong semantic and positional information relevant to downstream tasks like object localization and segmentation. Recent works combine these features with traditional…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-07 · Ronan Docherty, Antonis Vamvakeros, Samuel J. Cooper

In this paper, we present a comparative analysis of various self-supervised Vision Transformers (ViTs), focusing on their local representative power. Inspired by large language models, we examine the abilities of ViTs to perform various…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-22 · Ani Vanyan, Alvard Barseghyan, Hakob Tamazyan, Vahan Huroyan, Hrant Khachatrian, Martin Danelljan

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-25 · Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

Vision transformers (ViTs) have been successfully applied in image classification tasks recently. In this paper, we show that, unlike convolutional neural networks (CNNs) that can be improved by stacking more convolutional layers, the…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-20 · Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng

Recent self-supervised learning (SSL) methods have shown impressive results in learning visual representations from unlabeled images. This paper aims to improve their performance further by utilizing the architectural advantages of the…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-20 · Sukmin Yun, Hankook Lee, Jaehyung Kim, Jinwoo Shin

We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically demonstrate that such features, when extracted from a self-supervised ViT model (DINO-ViT),…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-18 · Shir Amir, Yossi Gandelsman, Shai Bagon, Tali Dekel

We propose an adaptation to the training of Vision Transformers (ViTs) that allows for an explicit modeling of objects during the attention computation. This is achieved by adding a new branch to selected attention layers that computes an…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-14 · Vivek Trivedy, Amani Almalki, Longin Jan Latecki

Unsupervised object discovery, the task of identifying and localizing objects in images without human-annotated labels, remains a significant challenge and a growing focus in computer vision. In this work, we introduce a novel model, DADO…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-09 · Federico Gonzalez, Estefania Talavera, Petia Radeva

Small Object Detection (SOD) poses significant challenges due to limited information and the model's low class prediction score. While Transformer-based detectors have shown promising performance, their potential for SOD remains largely…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-29 · Guiping Cao, Wenjian Huang, Xiangyuan Lan, Jianguo Zhang, Dongmei Jiang, Yaowei Wang

Object binding, the brain's ability to bind the many features that collectively represent an object into a coherent whole, is central to human cognition. It groups low-level perceptual features into high-level object representations, stores…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-22 · Yihao Li, Saeed Salehi, Lyle Ungar, Konrad P. Kording

Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs. Yet, they generally require much more data for model pre-training. Most recent works are thus dedicated to designing more…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng

Open-Vocabulary Segmentation (OVS) aims at segmenting images from free-form textual concepts without predefined training classes. While existing vision-language models such as CLIP can generate segmentation masks by leveraging coarse…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-17 · Luca Barsellotti, Lorenzo Bianchi, Nicola Messina, Fabio Carrara, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Rita Cucchiara

The Vision Transformer (ViT) self-attention mechanism is characterized by feature collapse in deeper layers, resulting in the vanishing of low-level visual features. However, such features can be helpful to accurately represent and identify…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-06 · Anxhelo Diko, Danilo Avola, Marco Cascio, Luigi Cinque