Related papers: G3T Up! Gravity Aligned Coordinate Frames Simplify…

GGPT: Geometry Grounded Point Transformer

Recent feed-forward networks have achieved remarkable progress in sparse-view 3D reconstruction by predicting dense point maps directly from RGB images. However, they often suffer from geometric inconsistencies and limited fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Yutong Chen, Yiming Wang, Xucong Zhang, Sergey Prokudin, Siyu Tang

VG3T: Visual Geometry Grounded Gaussian Transformer

Generating a coherent 3D scene representation from multi-view images is a fundamental yet challenging task. Existing methods often struggle with multi-view fusion, leading to fragmented 3D representations and sub-optimal performance. To…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Junho Kim, Seongwon Lee

Gravity-aligned Rotation Averaging with Circular Regression

Reconstructing a 3D scene from unordered images is pivotal in computer vision and robotics, with applications spanning crowd-sourced mapping and beyond. While global Structure-from-Motion (SfM) techniques are scalable and fast, they often…

Computer Vision and Pattern Recognition · Computer Science 2024-10-17 Linfei Pan, Marc Pollefeys, Dániel Baráth

G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

Worldwide geolocalization aims to locate the precise location at the coordinate level of photos taken anywhere on the Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Pengyue Jia, Yiding Liu, Xiaopeng Li, Yuhao Wang, Yantong Du, Xiao Han, Xuetao Wei, Shuaiqiang Wang, Dawei Yin, Xiangyu Zhao

VGGT: Visual Geometry Grounded Transformer

We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views. This approach is a…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny

G3R: Gradient Guided Generalizable Reconstruction

Large scale 3D scene reconstruction is important for applications such as virtual reality and simulation. Existing neural rendering approaches (e.g., NeRF, 3DGS) have achieved realistic reconstructions on large scenes, but optimize per…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Yun Chen, Jingkang Wang, Ze Yang, Sivabalan Manivasagam, Raquel Urtasun

PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery

Panoramic imagery offers a full 360° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Yijing Guo, Mengjun Chao, Luo Wang, Tianyang Zhao, Haizhao Dai, Yingliang Zhang, Jingyi Yu, Yujiao Shi

HD-VGGT: High-Resolution Visual Geometry Transformer

High-resolution imagery is essential for accurate 3D reconstruction, as many geometric details only emerge at fine spatial scales. Recent feed-forward approaches, such as the Visual Geometry Grounded Transformer (VGGT), have demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2026-04-13 Tianrun Chen, Yuanqi Hu, Yidong Han, Hanjie Xu, Deyi Ji, Qi Zhu, Chunan Yu, Xin Zhang, Cheng Chen, Chaotao Ding, Ying Zang, Xuanfu Li, Jin Ma, Lanyun Zhu

Gaussian Alignment for Relative Camera Pose Estimation via Single-View Reconstruction

Estimating metric relative camera pose from a pair of images is of great importance for 3D reconstruction and localisation. However, conventional two-view pose estimation methods are not metric, with camera translation known only up to a…

Computer Vision and Pattern Recognition · Computer Science 2025-09-18 Yumin Li, Dylan Campbell

SegMASt3R: Geometry Grounded Segment Matching

Segment matching is an important intermediate task in computer vision that establishes correspondences between semantically or geometrically coherent regions across images. Unlike keypoint matching, which focuses on localized features,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-27 Rohit Jayanti, Swayam Agrawal, Vansh Garg, Siddharth Tourani, Muhammad Haris Khan, Sourav Garg, Madhava Krishna

Geometric Point Attention Transformer for 3D Shape Reassembly

Shape assembly, which aims to reassemble separate parts into a complete object, has gained significant interest in recent years. Existing methods primarily rely on networks to predict the poses of individual parts, but often fail to…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Jiahan Li, Chaoran Cheng, Jianzhu Ma, Ge Liu

G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction that enhances the CUT3R model by integrating prior information. Unlike existing feed-forward methods that rely solely on input images, our method…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Ramil Khafizov, Artem Komarichev, Ruslan Rakhimov, Peter Wonka, Evgeny Burnaev

TrajVG: 3D Trajectory-Coupled Visual Geometry Learning

Feed-forward multi-frame 3D reconstruction models often degrade on videos with object motion. Global-reference becomes ambiguous under multiple motions, while the local pointmap relies heavily on estimated relative poses and can drift,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Xingyu Miao, Weiguang Zhao, Tao Lu, Linning Xu, Mulin Yu, Yang Long, Jiangmiao Pang, Junting Dong

Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction

3D spatial perception is fundamental to generalizable robotic manipulation, yet obtaining reliable, high-quality 3D geometry remains challenging. Depth sensors suffer from noise and material sensitivity, while existing reconstruction models…

Robotics · Computer Science 2026-05-05 Sizhe Yang, Linning Xu, Hao Li, Juncheng Mu, Jia Zeng, Dahua Lin, Jiangmiao Pang

Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer

Image retrieval-based cross-view localization methods often lead to very coarse camera pose estimation, due to the limited sampling density of the database satellite images. In this paper, we propose a method to increase the accuracy of a…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Yujiao Shi, Fei Wu, Akhil Perincherry, Ankit Vora, Hongdong Li

GPA-VGGT: Adapting VGGT to Large Scale Localization by Self-Supervised Learning with Geometry and Physics Aware Loss

Transformer-based general visual geometry frameworks have shown promising performance in camera pose estimation and 3D scene understanding. Recent advancements in Visual Geometry Grounded Transformer (VGGT) models have shown great promise…

Computer Vision and Pattern Recognition · Computer Science 2026-04-03 Yangfan Xu, Lilian Zhang, Xiaofeng He, Pengdong Wu, Wenqi Wu, Jun Mao

EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes

Robust 3D geometry estimation from videos is critical for applications such as autonomous navigation, SLAM, and 3D scene reconstruction. Recent methods like DUSt3R demonstrate that regressing dense pointmaps from image pairs enables…

Computer Vision and Pattern Recognition · Computer Science 2026-02-05 Xiaoshan Wu, Yifei Yu, Xiaoyang Lyu, Yihua Huang, Bo Wang, Baoheng Zhang, Zhongrui Wang, Xiaojuan Qi

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

This paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, complexity in processing high-resolution images, and lengthy optimization processes, thus facilitating stronger…

Computer Vision and Pattern Recognition · Computer Science 2024-05-13 Hao Li, Yuanyuan Gao, Chenming Wu, Dingwen Zhang, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

VersaQ-3D: A Reconfigurable Accelerator Enabling Feed-Forward and Generalizable 3D Reconstruction via Versatile Quantization

The Visual Geometry Grounded Transformer (VGGT) enables strong feed-forward 3D reconstruction without per-scene optimization. However, its billion-parameter scale creates high memory and compute demands, hindering on-device deployment.…

Hardware Architecture · Computer Science 2026-01-29 Yipu Zhang, Jintao Cheng, Xingyu Liu, Zeyu Li, Carol Jingyi Li, Jin Wu, Lin Jiang, Yuan Xie, Jiang Xu, Wei Zhang

VGGT-World: Transforming VGGT into an Autoregressive Geometry World Model

World models that forecast scene evolution by generating future video frames devote the bulk of their capacity to photometric details, yet the resulting predictions often remain geometrically inconsistent. We present VGGT-World, a geometry…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Xiangyu Sun, Shijie Wang, Fengyi Zhang, Lin Liu, Caiyan Jia, Ziying Song, Zi Huang, Yadan Luo