Related papers: Multi-View Foundation Models

Deep Models for Multi-View 3D Object Recognition: A Review

Human decision-making often relies on visual information from multiple perspectives or views. In contrast, machine learning-based object recognition utilizes information from a single image of the object. However, the information conveyed…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Mona Alzahrani, Muhammad Usman, Salma Kammoun, Saeed Anwar, Tarek Helmy

Foundational Models for 3D Point Clouds: A Survey and Outlook

The 3D point cloud representation plays a crucial role in preserving the geometric fidelity of the physical world, enabling more accurate complex 3D environments. While humans naturally comprehend the intricate relationships between objects…

Computer Vision and Pattern Recognition · Computer Science 2025-01-31 Vishal Thengane, Xiatian Zhu, Salim Bouzerdoum, Son Lam Phung, Yunpeng Li

3D-LFM: Lifting Foundation Model

The lifting of 3D structure and camera from 2D landmarks is at the cornerstone of the entire discipline of computer vision. Traditional methods have been confined to specific rigid objects, such as those in Perspective-n-Point (PnP)…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Mosam Dabhi, Laszlo A. Jeni, Simon Lucey

Evaluating Foundation Models' 3D Understanding Through Multi-View Correspondence Analysis

Benchmarking 3D spatial understanding of foundation models is essential for real-world applications such as robotics and autonomous driving. Existing evaluations often rely on downstream fine-tuning with linear heads or task-specific…

Computer Vision and Pattern Recognition · Computer Science 2026-01-19 Valentina Lilova, Toyesh Chakravorty, Julian I. Bibo, Emma Boccaletti, Brandon Li, Lívia Baxová, Cees G. M. Snoek, Mohammadreza Salehi

Understanding Transformer-based Vision Models through Inversion

Understanding the mechanisms underlying deep neural networks remains a fundamental challenge in machine learning and computer vision. One promising, yet only preliminarily explored, approach is feature inversion, which attempts to…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Jan Rathjens, Shirin Reyhanian, David Kappel, Laurenz Wiskott

DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation

Vision foundation models (VFMs) trained on large-scale image datasets provide high-quality features that have significantly advanced 2D visual recognition. However, their potential in 3D scene segmentation remains largely untapped, despite…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Karim Knaebel, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, Bastian Leibe

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

Vision Foundation Models (VFMs) have become the cornerstone of modern computer vision, offering robust representations across a wide array of tasks. While recent advances allow these models to handle varying input sizes during training,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Bocheng Zou, Mu Cai, Mark Stanley, Dingfu Lu, Yong Jae Lee

Metric-Guided Feature Fusion of Visual Foundation Models for Segmentation Tasks

Although large-scale visual foundation models (VFMs) achieve remarkable performance in semantic understanding, they still underperform in instance-aware dense prediction tasks. They exhibit different biases in representation: for instance,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Yachan Guo, JoseLuis Gomez Zurita, Danna Xue, Yi Xiao, AntonioManuel Lopez Pena

Single-Frame based Deep View Synchronization for Unsynchronized Multi-Camera Surveillance

Multi-camera surveillance has been an active research topic for understanding and modeling scenes. Compared to a single camera, multi-cameras provide larger field-of-view and more object cues, and the related applications are multi-view…

Computer Vision and Pattern Recognition · Computer Science 2022-05-03 Qi Zhang, Antoni B. Chan

Probing the 3D Awareness of Visual Foundation Models

Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, their intermediate representations are useful for…

Computer Vision and Pattern Recognition · Computer Science 2024-04-15 Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani

A Genealogy of Foundation Models in Remote Sensing

Foundation models have garnered increasing attention for representation learning in remote sensing. Many such foundation models adopt approaches that have demonstrated success in computer vision with minimal domain-specific modification.…

Computer Vision and Pattern Recognition · Computer Science 2026-01-28 Kevin Lane, Morteza Karimzadeh

Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models

In this paper, we analyze the viewpoint stability of foundational models, specifically their sensitivity to changes in viewpoint, and define instability as significant feature variations resulting from minor changes in viewing angle,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-31 Mateusz Michalkiewicz, Sheena Bai, Mahsa Baktashmotlagh, Varun Jampani, Guha Balakrishnan

Diffusion Models in 3D Vision: A Survey

In recent years, 3D vision has become a crucial field within computer vision, powering a wide range of applications such as autonomous driving, robotics, augmented reality, and medical imaging. This field relies on accurate perception,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-02 Zhen Wang, Dongyuan Li, Yaozu Wu, Tianyu He, Jiang Bian, Renhe Jiang

MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View

Generating consistent multiple views for 3D reconstruction tasks is still a challenge for existing image-to-3D diffusion models. Generally, incorporating 3D representations into diffusion models decreases the model's speed as well as…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Emmanuelle Bourigault, Pauline Bourigault

3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection

3D visual perception tasks based on multi-camera images are essential for autonomous driving systems. Latest work in this field performs 3D object detection by leveraging multi-view images as an input and iteratively enhancing object…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Jongwoo Park, Apoorv Singh, Varun Bankiti

Enhancing Representation in Medical Vision-Language Foundation Models via Multi-Scale Information Extraction Techniques

The development of medical vision-language foundation models has attracted significant attention in the field of medicine and healthcare due to their promising prospect in various clinical applications. While previous studies have commonly…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Weijian Huang, Cheng Li, Hong-Yu Zhou, Jiarun Liu, Hao Yang, Yong Liang, Guangming Shi, Hairong Zheng, Shanshan Wang

SegMASt3R: Geometry Grounded Segment Matching

Segment matching is an important intermediate task in computer vision that establishes correspondences between semantically or geometrically coherent regions across images. Unlike keypoint matching, which focuses on localized features,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-27 Rohit Jayanti, Swayam Agrawal, Vansh Garg, Siddharth Tourani, Muhammad Haris Khan, Sourav Garg, Madhava Krishna

Bridging the Gap Between Multimodal Foundation Models and World Models

Humans understand the world through the integration of multiple sensory modalities, enabling them to perceive, reason about, and imagine dynamic physical processes. Inspired by this capability, multimodal foundation models (MFMs) have…

Artificial Intelligence · Computer Science 2025-10-07 Xuehai He

Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation

Vision Foundation Models (VFMs) have become a de facto choice for many downstream vision tasks, like image classification, image segmentation, and object localization. However, they can also provide significant utility for downstream 3D…

Computer Vision and Pattern Recognition · Computer Science 2025-04-22 Johannes Spoecklberger, Wei Lin, Pedro Hermosilla, Sivan Doveh, Horst Possegger, M. Jehanzeb Mirza

Foundation Models in Medical Imaging: A Review and Outlook

Foundation models (FMs) are changing the way medical images are analyzed by learning from large collections of unlabeled data. Instead of relying on manually annotated examples, FMs are pre-trained to learn general-purpose visual features…

Image and Video Processing · Electrical Eng. & Systems 2025-11-19 Vivien van Veldhuizen, Vanessa Botha, Chunyao Lu, Melis Erdal Cesur, Kevin Groot Lipman, Edwin D. de Jong, Hugo Horlings, Clárisa I. Sanchez, Cees G. M. Snoek, Lodewyk Wessels, Ritse Mann, Eric Marcus, Jonas Teuwen