Learning Generalizable Feature Fields for Mobile Manipulation

Robotics, 2024-11-27 (v2), Computer Vision and Pattern Recognition, Machine Learning

Abstract

An open problem in mobile manipulation is how to represent objects and scenes in a unified manner so that robots can use it for both navigation and manipulation. Manipulation requires capturing intricate geometry while understanding fine-grained semantics, whereas navigation involves capturing the complexity inherent in an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that serves as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We quantitatively evaluate GeFF's ability to perform open-vocabulary object- and part-level manipulation and show that GeFF outperforms point-based baselines in runtime and in storage-accuracy trade-offs. We also provide qualitative examples of semantics-aware navigation and articulated object manipulation.

Cite

@article{arxiv.2403.07563,
  title  = {Learning Generalizable Feature Fields for Mobile Manipulation},
  author = {Ri-Zhao Qiu and Yafei Hu and Yuchen Song and Ge Yang and Yang Fu and Jianglong Ye and Jiteng Mu and Ruihan Yang and Nikolay Atanasov and Sebastian Scherer and Xiaolong Wang},
  journal= {arXiv preprint arXiv:2403.07563},
  year   = {2024}
}

Comments

Preprint. Project website is at: https://geff-b1.github.io/
