Related papers: G3T Up! Gravity Aligned Coordinate Frames Simplify…

200 papers

Recent feed-forward networks have achieved remarkable progress in sparse-view 3D reconstruction by predicting dense point maps directly from RGB images. However, they often suffer from geometric inconsistencies and limited fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Yutong Chen, Yiming Wang, Xucong Zhang, Sergey Prokudin, Siyu Tang

Generating a coherent 3D scene representation from multi-view images is a fundamental yet challenging task. Existing methods often struggle with multi-view fusion, leading to fragmented 3D representations and sub-optimal performance. To…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Junho Kim, Seongwon Lee

Reconstructing a 3D scene from unordered images is pivotal in computer vision and robotics, with applications spanning crowd-sourced mapping and beyond. While global Structure-from-Motion (SfM) techniques are scalable and fast, they often…

Computer Vision and Pattern Recognition · Computer Science 2024-10-17 Linfei Pan, Marc Pollefeys, Dániel Baráth

Worldwide geolocalization aims to locate the precise location at the coordinate level of photos taken anywhere on the Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Pengyue Jia, Yiding Liu, Xiaopeng Li, Yuhao Wang, Yantong Du, Xiao Han, Xuetao Wei, Shuaiqiang Wang, Dawei Yin, Xiangyu Zhao

We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views. This approach is a…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny

Large scale 3D scene reconstruction is important for applications such as virtual reality and simulation. Existing neural rendering approaches (e.g., NeRF, 3DGS) have achieved realistic reconstructions on large scenes, but optimize per…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Yun Chen, Jingkang Wang, Ze Yang, Sivabalan Manivasagam, Raquel Urtasun

Panoramic imagery offers a full 360° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Yijing Guo, Mengjun Chao, Luo Wang, Tianyang Zhao, Haizhao Dai, Yingliang Zhang, Jingyi Yu, Yujiao Shi

High-resolution imagery is essential for accurate 3D reconstruction, as many geometric details only emerge at fine spatial scales. Recent feed-forward approaches, such as the Visual Geometry Grounded Transformer (VGGT), have demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2026-04-13 Tianrun Chen, Yuanqi Hu, Yidong Han, Hanjie Xu, Deyi Ji, Qi Zhu, Chunan Yu, Xin Zhang, Cheng Chen, Chaotao Ding, Ying Zang, Xuanfu Li, Jin Ma, Lanyun Zhu

Estimating metric relative camera pose from a pair of images is of great importance for 3D reconstruction and localisation. However, conventional two-view pose estimation methods are not metric, with camera translation known only up to a…

Computer Vision and Pattern Recognition · Computer Science 2025-09-18 Yumin Li, Dylan Campbell

Segment matching is an important intermediate task in computer vision that establishes correspondences between semantically or geometrically coherent regions across images. Unlike keypoint matching, which focuses on localized features,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-27 Rohit Jayanti, Swayam Agrawal, Vansh Garg, Siddharth Tourani, Muhammad Haris Khan, Sourav Garg, Madhava Krishna

Shape assembly, which aims to reassemble separate parts into a complete object, has gained significant interest in recent years. Existing methods primarily rely on networks to predict the poses of individual parts, but often fail to…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Jiahan Li, Chaoran Cheng, Jianzhu Ma, Ge Liu

We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction that enhances the CUT3R model by integrating prior information. Unlike existing feed-forward methods that rely solely on input images, our method…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Ramil Khafizov, Artem Komarichev, Ruslan Rakhimov, Peter Wonka, Evgeny Burnaev

Feed-forward multi-frame 3D reconstruction models often degrade on videos with object motion. Global-reference becomes ambiguous under multiple motions, while the local pointmap relies heavily on estimated relative poses and can drift,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Xingyu Miao, Weiguang Zhao, Tao Lu, Linning Xu, Mulin Yu, Yang Long, Jiangmiao Pang, Junting Dong

3D spatial perception is fundamental to generalizable robotic manipulation, yet obtaining reliable, high-quality 3D geometry remains challenging. Depth sensors suffer from noise and material sensitivity, while existing reconstruction models…

Robotics · Computer Science 2026-05-05 Sizhe Yang, Linning Xu, Hao Li, Juncheng Mu, Jia Zeng, Dahua Lin, Jiangmiao Pang

Image retrieval-based cross-view localization methods often lead to very coarse camera pose estimation, due to the limited sampling density of the database satellite images. In this paper, we propose a method to increase the accuracy of a…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Yujiao Shi, Fei Wu, Akhil Perincherry, Ankit Vora, Hongdong Li

Transformer-based general visual geometry frameworks have shown promising performance in camera pose estimation and 3D scene understanding. Recent advancements in Visual Geometry Grounded Transformer (VGGT) models have shown great promise…

Computer Vision and Pattern Recognition · Computer Science 2026-04-03 Yangfan Xu, Lilian Zhang, Xiaofeng He, Pengdong Wu, Wenqi Wu, Jun Mao

Robust 3D geometry estimation from videos is critical for applications such as autonomous navigation, SLAM, and 3D scene reconstruction. Recent methods like DUSt3R demonstrate that regressing dense pointmaps from image pairs enables…

Computer Vision and Pattern Recognition · Computer Science 2026-02-05 Xiaoshan Wu, Yifei Yu, Xiaoyang Lyu, Yihua Huang, Bo Wang, Baoheng Zhang, Zhongrui Wang, Xiaojuan Qi

This paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, complexity in processing high-resolution images, and lengthy optimization processes, thus facilitating stronger…

Computer Vision and Pattern Recognition · Computer Science 2024-05-13 Hao Li, Yuanyuan Gao, Chenming Wu, Dingwen Zhang, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

The Visual Geometry Grounded Transformer (VGGT) enables strong feed-forward 3D reconstruction without per-scene optimization. However, its billion-parameter scale creates high memory and compute demands, hindering on-device deployment.…

Hardware Architecture · Computer Science 2026-01-29 Yipu Zhang, Jintao Cheng, Xingyu Liu, Zeyu Li, Carol Jingyi Li, Jin Wu, Lin Jiang, Yuan Xie, Jiang Xu, Wei Zhang

World models that forecast scene evolution by generating future video frames devote the bulk of their capacity to photometric details, yet the resulting predictions often remain geometrically inconsistent. We present VGGT-World, a geometry…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Xiangyu Sun, Shijie Wang, Fengyi Zhang, Lin Liu, Caiyan Jia, Ziying Song, Zi Huang, Yadan Luo