
Related papers: Vector-Quantized Vision Foundation Models for Obje…

200 papers

Object-centric learning (OCL) aims to learn structured scene representations that support compositional generalization and robustness to out-of-distribution (OOD) data. However, OCL models are often not evaluated regarding these goals.…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-10 · Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization,…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-14 · Alexander Rubinstein, Ameya Prabhu, Matthias Bethge, Seong Joon Oh

Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-09 · Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen

Representing images or videos as object-level feature vectors, rather than pixel-level feature maps, facilitates advanced visual tasks. Object-Centric Learning (OCL) primarily achieves this by reconstructing the input under the guidance of…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-11 · Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

Object-centric (OC) representations, which model visual scenes as compositions of discrete objects, have the potential to be used in various downstream tasks to achieve systematic compositional generalization and facilitate reasoning.…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-04 · Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl Henrik Johansson, Stefan Bauer, Andrea Dittadi

This paper presents a detailed study of improving visual representations for vision language (VL) tasks and develops an improved object detection model to provide object-centric representations of images. Compared to the most widely used…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-11 · Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao

Multimodal large language models (MLLMs) integrate image features from visual encoders with LLMs, demonstrating advanced comprehension capabilities. However, mainstream MLLMs are solely supervised by the next-token prediction of textual…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-24 · Yunnan Wang, Fan Lu, Kecheng Zheng, Ziyuan Huang, Ziqiang Li, Wenjun Zeng, Xin Jin

Object-Centric Learning (OCL) aims to discover objects in images or videos by reconstructing the input. Representative methods achieve this by reconstructing the input as its Variational Autoencoder (VAE) discrete representations, which…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-11 · Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

Object-centric representation learning aims to decompose visual scenes into fixed-size vectors called "slots" or "object files", where each slot captures a distinct object. Current state-of-the-art object-centric models have shown…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-28 · Aniket Didolkar, Andrii Zadaianchuk, Rabiul Awal, Maximilian Seitzer, Efstratios Gavves, Aishwarya Agrawal

Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-25 · Ziyi Wu, Jingyu Hu, Wuyue Lu, Igor Gilitschenski, Animesh Garg

Similar to humans perceiving visual scenes as objects, Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features. Transformer-based OCL handles complex textures well due to the decoding guidance of…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-23 · Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

Image fusion is a crucial technique in the field of computer vision, and its goal is to generate high-quality fused images and improve the performance of downstream tasks. However, existing fusion methods struggle to balance these two…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Hui Li, Congcong Bian, Zeyang Zhang, Xiaoning Song, Xi Li, Xiao-Jun Wu

Recent Large Vision Language Models (LVLMs) demonstrate promising capabilities in unifying visual understanding and generative modeling, enabling both accurate content understanding and flexible editing. However, current approaches treat…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-23 · Fan Yang, Yousong Zhu, Xin Li, Yufei Zhan, Hongyin Zhao, Shurong Zheng, Yaowei Wang, Ming Tang, Jinqiao Wang

Vision-Language Models (VLMs) have gained widespread adoption in Open-Vocabulary (OV) object detection and segmentation tasks. Although they have shown promise on OV-related tasks, their effectiveness in conventional vision tasks has thus far…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Yongchao Feng, Yajie Liu, Shuai Yang, Wenrui Cai, Jinqing Zhang, Qiqi Zhan, Ziyue Huang, Hongxi Yan, Qiao Wan, Chenguang Liu, Junzhe Wang, Jiahui Lv, Ziqi Liu, Tengyuan Shi, Qingjie Liu, Yunhong Wang

Unsupervised video Object-Centric Learning (OCL) is promising as it enables object-level scene representation and understanding as we humans do. Mainstream video OCL methods adopt a recurrent architecture: An aggregator aggregates current…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-20 · Rongzhen Zhao, Jian Li, Juho Kannala, Joni Pajarinen

Vertical Federated Learning (VFL) enables collaborative analysis across parties holding complementary feature views of the same samples, yet existing approaches are largely restricted to distributed variants of $k$-means, requiring…

Machine Learning · Computer Science · 2026-02-10 · Bruno Belucci, Karim Lounici, Vladimir R. Kostic, Katia Meziani

Recent generalist vision-language models (VLMs) have demonstrated impressive reasoning capabilities across diverse multimodal tasks. However, these models still struggle with fine-grained object-level understanding and grounding. In terms…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-04 · Timothy Ossowski, Junjie Hu

Federated Learning (FL) is a distributed learning paradigm that can learn a global or personalized model from decentralized datasets on edge devices. However, in the computer vision domain, model performance in FL is far behind centralized…

One of the central challenges in visual place recognition (VPR) is learning a robust global representation that remains discriminative under large viewpoint changes, illumination variations, and severe domain shifts. While visual foundation…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-21 · Hanyu Zhu, Zhihao Zhan, Yuhang Ming, Liang Li, Dibo Hou, Javier Civera, Wanzeng Kong

The increasing demand for wireless communication underscores the need to optimize radio frequency spectrum utilization. An effective strategy for leveraging underutilized licensed frequency bands is cooperative spectrum sensing (CSS), which…

Machine Learning · Computer Science · 2023-12-19 · Heqiang Wang, Jie Xu