Task-Oriented Grasp Prediction with Visual-Language Inputs

Chao Tang; Dehao Huang; Lingxiao Meng; Weiyu Liu; Hong Zhang

Robotics 2023-03-01 v1

Abstract

To perform household tasks, assistive robots receive commands in the form of user language instructions for tool manipulation. The initial stage involves selecting the intended tool (i.e., object grounding) and grasping it in a task-oriented manner (i.e., task grounding). Nevertheless, prior research on visual-language grasping (VLG) has focused on object grounding while disregarding the fine-grained impact of tasks on object grasping. Task-incompatible grasping of a tool will inevitably limit the success of subsequent manipulation steps. Motivated by this problem, this paper proposes GraspCLIP, which addresses the challenge of task grounding in addition to object grounding to enable task-oriented grasp prediction with visual-language inputs. Evaluation on a custom dataset demonstrates that GraspCLIP outperforms established baselines that perform object grounding only. The effectiveness of the proposed method is further validated on an assistive robotic arm platform for grasping previously unseen kitchen tools given the task specification. Our presentation video is available at: https://www.youtube.com/watch?v=e1wfYQPeAXU.

Keywords

robotic grasping, robotic manipulation, visual grounding

Cite

@article{arxiv.2302.14355,
  title  = {Task-Oriented Grasp Prediction with Visual-Language Inputs},
  author = {Chao Tang and Dehao Huang and Lingxiao Meng and Weiyu Liu and Hong Zhang},
  journal= {arXiv preprint arXiv:2302.14355},
  year   = {2023}
}

Comments

8 pages, 8 figures, submitted to IROS 2023