Related papers: Visual-Geometry GP-based Navigable Space for Auton…
We propose a new method for autonomous navigation in uneven terrains by utilizing a sparse Gaussian Process (SGP) based local perception model. The SGP local perception model is trained on local ranging observations (point clouds) to learn the…
Feed-forward surround-view autonomous driving scene reconstruction offers fast, generalizable inference, but it faces the core challenge of preserving generalization while improving novel-view quality. Due to the surround-view with…
Efficient navigation through uneven terrain remains a challenging endeavor for autonomous robots. We propose a new geometric-based uneven terrain mapless navigation framework combining a Sparse Gaussian Process (SGP) local map with a…
We present MG-Nav (Memory-Guided Navigation), a dual-scale framework for zero-shot visual navigation that unifies global memory-guided planning with local geometry-enhanced control. At its core is the Sparse Spatial Memory Graph (SMG), a…
Terrain analysis is critical for the practical application of ground mobile robots in real-world tasks, especially in outdoor unstructured environments. In this paper, we propose a novel spatial-temporal traversability assessment…
Understanding the geometric and semantic structure of environments is essential for embodied navigation and reasoning. Existing semantic mapping methods trade off between explicit geometry and multi-scale semantics, and lack a native…
Vision-Language Navigation (VLN) requires an agent to navigate 3D environments following natural language instructions. During navigation, existing agents commonly encounter perceptual uncertainty, such as insufficient evidence for reliable…
3D Gaussian Splatting (3DGS), a 3D representation method with photorealistic real-time rendering capabilities, is regarded as an effective tool for narrowing the sim-to-real gap. However, it lacks fine-grained semantics and physical…
Detecting navigable space is a fundamental capability for mobile robots navigating in unknown or unmapped environments. In this work, we treat visual navigable space segmentation as a scene decomposition problem and propose Polyline…
Vision-language navigation (VLN) requires an agent to traverse complex 3D environments based on natural language instructions, necessitating a thorough scene understanding. While existing works equip agents with various scene…
Large, multi-dimensional spatio-temporal datasets are omnipresent in modern science and engineering. An effective framework for handling such data is the Gaussian process deep generative model (GP-DGM), which employs GP priors over the latent…
Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and learning-based methods have shown promise, most existing…
Self-supervised pre-training based on next-token prediction has enabled large language models to capture the underlying structure of text, and has led to unprecedented performance on a large array of tasks when applied at scale. Similarly,…
In the autonomous ocean monitoring task, the sampling robot moves in the environment and accumulates data continuously. The widely adopted spatial modeling method - standard Gaussian process (GP) regression - becomes inadequate in…
3D occupancy prediction is critical for comprehensive scene understanding in vision-centric autonomous driving. Recent advances have explored utilizing 3D semantic Gaussians to model occupancy while reducing computational overhead, but they…
To achieve autonomy in unknown and unstructured environments, we propose a method for semantic-based planning under perceptual uncertainty. This capability is crucial for safe and efficient robot navigation in environments with…
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary challenge of IIN arises from the need to recognize the target…
The ability to perform effective planning is crucial for building an instruction-following agent. When navigating through a new environment, an agent is challenged with (1) connecting the natural language instructions with its progressively…
This paper proposes an online learning method of Gaussian process state-space model (GP-SSM). GP-SSM is a probabilistic representation learning scheme that represents unknown state transition and/or measurement models as Gaussian processes…
Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to…