Related papers: Multi-Frame Self-Supervised Depth with Transformer…
Although both self-supervised single-frame and multi-frame depth estimation methods only require unlabeled monocular videos for training, the information they leverage varies because single-frame methods mainly rely on appearance-based…
Multi-view stereo depth estimation based on cost volume usually works better than self-supervised monocular depth estimation except for moving objects and low-textured surfaces. So in this paper, we propose a multi-frame depth estimation…
Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training. However, for many applications, sequence information in the form of video frames is also…
Self-supervised monocular depth estimation has gathered notable interest since it can liberate training from dependency on depth annotations. In monocular video training case, recent methods only conduct view synthesis between existing…
Recent advances in end-to-end unsupervised learning has significantly improved the performance of monocular depth prediction and alleviated the requirement of ground truth depth. Although a plethora of work has been done in enforcing…
Self-supervised methods have showed promising results on depth estimation task. However, previous methods estimate the target depth map and camera ego-motion simultaneously, underusing multi-frame correlation information and ignoring the…
Self-supervised monocular depth estimation has emerged as a promising approach since it does not rely on labeled training data. Most methods combine convolution and Transformer to model long-distance dependencies to estimate depth…
Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we…
Recent techniques in self-supervised monocular depth estimation are approaching the performance of supervised methods, but operate in low resolution only. We show that high resolution is key towards high-fidelity self-supervised monocular…
Multi-frame methods improve monocular depth estimation over single-frame approaches by aggregating spatial-temporal information via feature matching. However, the spatial-temporal feature leads to accuracy degradation in dynamic scenes. To…
The advent of autonomous driving and advanced driver assistance systems necessitates continuous developments in computer vision for 3D scene understanding. Self-supervised monocular depth estimation, a method for pixel-wise distance…
With the frequent use of self-supervised monocular depth estimation in robotics and autonomous driving, the model's efficiency is becoming increasingly important. Most current approaches apply much larger and more complex networks to…
This paper aims to address the problem of supervised monocular depth estimation. We start with a meticulous pilot study to demonstrate that the long-range correlation is essential for accurate depth estimation. Therefore, we propose to…
Monocular depth estimation has become one of the most studied applications in computer vision, where the most accurate approaches are based on fully supervised learning models. However, the acquisition of accurate and large ground truth…
Self-supervised learning for depth estimation uses geometry in image sequences for supervision and shows promising results. Like many computer vision tasks, depth network performance is determined by the capability to learn accurate spatial…
Self-supervised monocular depth estimation has been widely studied recently. Most of the work has focused on improving performance on benchmark datasets, such as KITTI, but has offered a few experiments on generalization performance. In…
Depth estimation plays an important role in the robotic perception system. Self-supervised monocular paradigm has gained significant attention since it can free training from the reliance on depth annotations. Despite recent advancements,…
Attention-based models such as transformers have shown outstanding performance on dense prediction tasks, such as semantic segmentation, owing to their capability of capturing long-range dependency in an image. However, the benefit of…
In self-supervised monocular depth estimation, the depth discontinuity and motion objects' artifacts are still challenging problems. Existing self-supervised methods usually utilize a single view to train the depth estimation network.…
Recent advances in self-supervised learning havedemonstrated that it is possible to learn accurate monoculardepth reconstruction from raw video data, without using any 3Dground truth for supervision. However, in robotics…