Related papers: Multiple View Performers for Shape Completion

Multi-view Pyramid Transformer: Look Coarser to See Broader

We propose Multi-view Pyramid Transformer (MVP), a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds of images in a single forward pass. Drawing on the idea of "looking broader to…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-09 · Gyeongjin Kang, Seungkwon Yang, Seungtae Nam, Younggeun Lee, Jungwoo Kim, Eunbyung Park

Multiple View Geometry Transformers for 3D Human Pose Estimation

In this work, we aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation. Recent works have focused on end-to-end learning-based transformer designs, which struggle to resolve geometric information…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-21 · Ziwei Liao, Jialiang Zhu, Chunyu Wang, Han Hu, Steven L. Waslander

Render4Completion: Synthesizing Multi-View Depth Maps for 3D Shape Completion

We propose a novel approach for 3D shape completion by synthesizing multi-view depth maps. While previous work for shape completion relies on volumetric representations, meshes, or point clouds, we propose to use multi-view depth maps from…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-24 · Tao Hu, Zhizhong Han, Abhinav Shrivastava, Matthias Zwicker

ViewFormer: View Set Attention for Multi-view 3D Shape Understanding

This paper presents ViewFormer, a simple yet effective model for multi-view 3D shape recognition and retrieval. We systematically investigate the existing methods for aggregating multi-view information and propose a novel "view set"…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-02 · Hongyu Sun, Yongcai Wang, Peng Wang, Xudong Cai, Deying Li

ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

3D occupancy, an advanced perception technology for driving scenarios, represents the entire scene without distinguishing between foreground and background by quantifying the physical space into a grid map. The widely adopted…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-15 · Jinke Li, Xiao He, Chonghua Zhou, Xiaoqiang Cheng, Yang Wen, Dan Zhang

Mixture of Volumetric Primitives for Efficient Neural Rendering

Real-time rendering and animation of humans is a core function in games, movies, and telepresence applications. Existing methods have a number of drawbacks we aim to address with our work. Triangle meshes have difficulty modeling thin…

Graphics · Computer Science · 2021-05-07 · Stephen Lombardi, Tomas Simon, Gabriel Schwartz, Michael Zollhoefer, Yaser Sheikh, Jason Saragih

Direct Multi-view Multi-person 3D Pose Estimation

We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images. Instead of estimating 3D joint locations from costly volumetric representation or reconstructing the per-person 3D pose from multiple…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-30 · Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng

VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

This paper presents Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-07 · Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia

3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics…

Robotics · Computer Science · 2025-03-25 · Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal

Multi-view 3D Reconstruction with Transformer

Deep CNN-based methods have so far achieved state-of-the-art results in multi-view 3D object reconstruction. Despite the considerable progress, the two core modules of these methods, multi-view feature extraction and fusion, are…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-25 · Dan Wang, Xinrui Cui, Xun Chen, Zhengxia Zou, Tianyang Shi, Septimiu Salcudean, Z. Jane Wang, Rabab Ward

Semantic Estimation of 3D Body Shape and Pose using Minimal Cameras

We aim to simultaneously estimate the 3D articulated pose and high fidelity volumetric occupancy of human performance, from multiple viewpoint video (MVV) with as few as two views. We use a multi-channel symmetric 3D convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-08 · Andrew Gilbert, Matthew Trumble, Adrian Hilton, John Collomosse

3D Shape Completion with Multi-view Consistent Inference

3D shape completion is important to enable machines to perceive the complete geometry of objects from partial observations. To address this problem, view-based methods have been presented. These methods represent shapes as multiple depth…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-02 · Tao Hu, Zhizhong Han, Matthias Zwicker

VSFormer: Mining Correlations in Flexible View Set for Multi-view 3D Shape Understanding

View-based methods have demonstrated promising performance in 3D shape understanding. However, they tend to make strong assumptions about the relations between views or learn the multi-view correlations indirectly, which limits the…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-17 · Hongyu Sun, Yongcai Wang, Peng Wang, Haoran Deng, Xudong Cai, Deying Li

Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs

Multi-View Representation Learning (MVRL) aims to derive a unified representation from multi-view data by leveraging shared and complementary information across views. However, when views are irregularly missing, the incomplete data can…

Machine Learning · Computer Science · 2025-03-03 · Xin Gao, Jian Pu

MVP: Enhancing Video Large Language Models via Self-supervised Masked Video Prediction

Reinforcement learning based post-training paradigms for Video Large Language Models (VideoLLMs) have achieved significant success by optimizing for visual-semantic tasks such as captioning or VideoQA. However, while these approaches…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-08 · Xiaokun Sun, Zezhong Wu, Zewen Ding, Linli Xu

360° Volumetric Portrait Avatar

We propose 360° Volumetric Portrait (3VP) Avatar, a novel method for reconstructing 360° photo-realistic portrait avatars of human subjects solely based on monocular video inputs. State-of-the-art monocular avatar reconstruction…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-12 · Jalees Nehvi, Berna Kabadayi, Julien Valentin, Justus Thies

MVT: Multi-view Vision Transformer for 3D Object Recognition

Inspired by the great success achieved by CNN in image recognition, view-based methods applied CNNs to model the projected views for 3D object understanding and achieved excellent performance. Nevertheless, multi-view CNN models cannot…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-26 · Shuo Chen, Tan Yu, Ping Li

Deep Learning Multi-View Representation for Face Recognition

Various factors, such as identities, views (poses), and illuminations, are coupled in face images. Disentangling the identity and view representations is a major challenge in face recognition. Existing face recognition systems either use…

Computer Vision and Pattern Recognition · Computer Science · 2014-06-27 · Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Multiview Transformers for Video Recognition

Video understanding requires reasoning at multiple spatiotemporal resolutions, from short fine-grained motions to events taking place over longer durations. Although transformer architectures have recently advanced the state-of-the-art,…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-01 · Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid

ShapeFormer: Transformer-based Shape Completion via Sparse Representation

We present ShapeFormer, a transformer-based network that produces a distribution of object completions, conditioned on incomplete, and possibly noisy, point clouds. The resultant distribution can then be sampled to generate likely…

Computer Vision and Pattern Recognition · Computer Science · 2022-05-24 · Xingguang Yan, Liqiang Lin, Niloy J. Mitra, Dani Lischinski, Daniel Cohen-Or, Hui Huang