VLM-driven Behavior Tree for Context-aware Task Planning

Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction (v2, 2025-01-13)

Abstract

The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages Vision-Language Models (VLMs) to interactively generate and edit BTs that address visual conditions, enabling context-aware robot operations in visually complex environments. A key feature of our approach lies in the conditional control through self-prompted visual conditions. Specifically, the VLM generates BTs with visual condition nodes, where conditions are expressed as free-form text. Another VLM process integrates the text into its prompt and evaluates the conditions against real-world images during robot execution. We validated our framework in a real-world cafe scenario, demonstrating both its feasibility and limitations.
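
The mechanism described in the abstract, a BT containing visual condition nodes whose free-form text is checked against an image at execution time, can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the `stub_evaluator` function, and the cup-refill scenario are all hypothetical, and the VLM call is replaced by a substring match against a caption so that the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical evaluator signature: (condition_text, image_bytes) -> bool.
# In a real system this would wrap a VLM call; here it is an assumption.
ConditionEvaluator = Callable[[str, bytes], bool]


def stub_evaluator(condition_text: str, image: bytes) -> bool:
    """Stand-in for a VLM: treats the 'image' as a UTF-8 caption and
    reports whether the condition text appears in it."""
    return condition_text.lower() in image.decode("utf-8").lower()


@dataclass
class VisualCondition:
    """Leaf node holding a free-form text condition."""
    text: str
    evaluator: ConditionEvaluator

    def tick(self, image: bytes) -> bool:
        return self.evaluator(self.text, image)


@dataclass
class Action:
    """Leaf node that records its name when executed and always succeeds."""
    name: str
    log: List[str]

    def tick(self, image: bytes) -> bool:
        self.log.append(self.name)
        return True


@dataclass
class Sequence:
    """Succeeds only if every child succeeds; stops at the first failure."""
    children: list

    def tick(self, image: bytes) -> bool:
        return all(child.tick(image) for child in self.children)


@dataclass
class Selector:
    """Succeeds as soon as one child succeeds; tries children in order."""
    children: list

    def tick(self, image: bytes) -> bool:
        return any(child.tick(image) for child in self.children)


def run(image: bytes) -> List[str]:
    """Build a small illustrative tree and tick it once against one image."""
    log: List[str] = []
    tree = Selector([
        Sequence([
            VisualCondition("cup is empty", stub_evaluator),
            Action("refill_cup", log),
        ]),
        Action("wait", log),  # fallback when the condition does not hold
    ])
    tree.tick(image)
    return log


if __name__ == "__main__":
    print(run(b"The cup is empty on the table"))
    print(run(b"The cup is full"))
```

Passing the evaluator in as a callable keeps the tree structure independent of how the condition is judged, so the stub could in principle be swapped for a model-backed function with the same signature.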

Cite

@article{arxiv.2501.03968,
  title  = {VLM-driven Behavior Tree for Context-aware Task Planning},
  author = {Naoki Wake and Atsushi Kanehira and Jun Takamatsu and Kazuhiro Sasabuchi and Katsushi Ikeuchi},
  journal= {arXiv preprint arXiv:2501.03968},
  year   = {2025}
}

Comments

10 pages, 11 figures, 5 tables. Last updated on January 9th, 2025