Related papers: Vector-Quantized Vision Foundation Models for Obje…

Evaluating Object-Centric Models beyond Object Discovery

Object-centric learning (OCL) aims to learn structured scene representations that support compositional generalization and robustness to out-of-distribution (OOD) data. However, OCL models are often not evaluated regarding these goals.…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Krishnakant Singh , Simone Schaub-Meyer , Stefan Roth

Are We Done with Object-Centric Learning?

Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Alexander Rubinstein , Ameya Prabhu , Matthias Bethge , Seong Joon Oh

MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning

Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods…

Computer Vision and Pattern Recognition · Computer Science 2025-10-09 Hongjia Liu , Rongzhen Zhao , Haohan Chen , Joni Pajarinen

Multi-Scale Fusion for Object Representation

Representing images or videos as object-level feature vectors, rather than pixel-level feature maps, facilitates advanced visual tasks. Object-Centric Learning (OCL) primarily achieves this by reconstructing the input under the guidance of…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Rongzhen Zhao , Vivienne Wang , Juho Kannala , Joni Pajarinen

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

Object-centric (OC) representations, which model visual scenes as compositions of discrete objects, have the potential to be used in various downstream tasks to achieve systematic compositional generalization and facilitate reasoning.…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Amir Mohammad Karimi Mamaghan , Samuele Papa , Karl Henrik Johansson , Stefan Bauer , Andrea Dittadi

VinVL: Revisiting Visual Representations in Vision-Language Models

This paper presents a detailed study of improving visual representations for vision language (VL) tasks and develops an improved object detection model to provide object-centric representations of images. Compared to the most widely used…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Pengchuan Zhang , Xiujun Li , Xiaowei Hu , Jianwei Yang , Lei Zhang , Lijuan Wang , Yejin Choi , Jianfeng Gao

Vision-Centric Activation and Coordination for Multimodal Large Language Models

Multimodal large language models (MLLMs) integrate image features from visual encoders with LLMs, demonstrating advanced comprehension capabilities. However, mainstream MLLMs are solely supervised by the next-token prediction of textual…

Computer Vision and Pattern Recognition · Computer Science 2025-10-24 Yunnan Wang , Fan Lu , Kecheng Zheng , Ziyuan Huang , Ziqiang Li , Wenjun Zeng , Xin Jin

Grouped Discrete Representation for Object-Centric Learning

Object-Centric Learning (OCL) aims to discover objects in images or videos by reconstructing the input. Representative methods achieve this by reconstructing the input as its Variational Autoencoder (VAE) discrete representations, which…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Rongzhen Zhao , Vivienne Wang , Juho Kannala , Joni Pajarinen

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning

Object-centric representation learning aims to decompose visual scenes into fixed-size vectors called "slots" or "object files", where each slot captures a distinct object. Current state-of-the-art object-centric models have shown…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Aniket Didolkar , Andrii Zadaianchuk , Rabiul Awal , Maximilian Seitzer , Efstratios Gavves , Aishwarya Agrawal

SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent…

Computer Vision and Pattern Recognition · Computer Science 2023-09-25 Ziyi Wu , Jingyu Hu , Wuyue Lu , Igor Gilitschenski , Animesh Garg

Grouped Discrete Representation Guides Object-Centric Learning

Similar to humans perceiving visual scenes as objects, Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features. Transformer-based OCL handles complex textures well due to the decoding guidance of…

Computer Vision and Pattern Recognition · Computer Science 2024-12-23 Rongzhen Zhao , Vivienne Wang , Juho Kannala , Joni Pajarinen

OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning

Image fusion is a crucial technique in the field of computer vision, and its goal is to generate high-quality fused images and improve the performance of downstream tasks. However, existing fusion methods struggle to balance these two…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Hui Li , Congcong Bian , Zeyang Zhang , Xiaoning Song , Xi Li , Xiao-Jun Wu

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation

Recent Large Vision Language Models (LVLMs) demonstrate promising capabilities in unifying visual understanding and generative modeling, enabling both accurate content understanding and flexible editing. However, current approaches treat…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Fan Yang , Yousong Zhu , Xin Li , Yufei Zhan , Hongyin Zhao , Shurong Zheng , Yaowei Wang , Ming Tang , Jinqiao Wang

Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation

Vision-Language Models (VLMs) have gained widespread adoption in Open-Vocabulary (OV) object detection and segmentation tasks. Although they have shown promise on OV-related tasks, their effectiveness in conventional vision tasks has thus far…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Yongchao Feng , Yajie Liu , Shuai Yang , Wenrui Cai , Jinqing Zhang , Qiqi Zhan , Ziyue Huang , Hongxi Yan , Qiao Wan , Chenguang Liu , Junzhe Wang , Jiahui Lv , Ziqi Liu , Tengyuan Shi , Qingjie Liu , Yunhong Wang

Predicting Video Slot Attention Queries from Random Slot-Feature Pairs

Unsupervised video Object-Centric Learning (OCL) is promising as it enables object-level scene representation and understanding as we humans do. Mainstream video OCL methods adopt a recurrent architecture: An aggregator aggregates current…

Computer Vision and Pattern Recognition · Computer Science 2026-04-20 Rongzhen Zhao , Jian Li , Juho Kannala , Joni Pajarinen

VertCoHiRF: Decentralized Vertical Clustering Beyond k-means

Vertical Federated Learning (VFL) enables collaborative analysis across parties holding complementary feature views of the same samples, yet existing approaches are largely restricted to distributed variants of $k$-means, requiring…

Machine Learning · Computer Science 2026-02-10 Bruno Belucci , Karim Lounici , Vladimir R. Kostic , Katia Meziani

OLIVE: Object Level In-Context Visual Embeddings

Recent generalist vision-language models (VLMs) have demonstrated impressive reasoning capabilities across diverse multimodal tasks. However, these models still struggle with fine-grained object-level understanding and grounding. In terms…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Timothy Ossowski , Junjie Hu

FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks

Federated Learning (FL) is a distributed learning paradigm that can learn a global or personalized model from decentralized datasets on edge devices. However, in the computer vision domain, model performance in FL is far behind centralized…

Computer Vision and Pattern Recognition · Computer Science 2021-11-23 Chaoyang He , Alay Dilipbhai Shah , Zhenheng Tang , Di Fan , Adarshan Naiynar Sivashunmugam , Keerti Bhogaraju , Mita Shimpi , Li Shen , Xiaowen Chu , Mahdi Soltanolkotabi , Salman Avestimehr

DC-VLAQ: Query-Residual Aggregation for Robust Visual Place Recognition

One of the central challenges in visual place recognition (VPR) is learning a robust global representation that remains discriminative under large viewpoint changes, illumination variations, and severe domain shifts. While visual foundation…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Hanyu Zhu , Zhihao Zhan , Yuhang Ming , Liang Li , Dibo Hou , Javier Civera , Wanzeng Kong

Online Vertical Federated Learning for Cooperative Spectrum Sensing

The increasing demand for wireless communication underscores the need to optimize radio frequency spectrum utilization. An effective strategy for leveraging underutilized licensed frequency bands is cooperative spectrum sensing (CSS), which…

Machine Learning · Computer Science 2023-12-19 Heqiang Wang , Jie Xu