Related papers: Training-free Task-oriented Grasp Generation

SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation

Task-oriented grasping, which involves grasping specific parts of objects based on their functions, is crucial for developing advanced robotic systems capable of performing complex tasks in dynamic environments. In this paper, we propose a…

Robotics · Computer Science 2024-10-15 Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Hongan Wang, Xiaoming Deng

GRIM: Task-Oriented Grasping with Conditioning on Generative Examples

Task-Oriented Grasping (TOG) requires robots to select grasps that are functionally appropriate for a specified task, a challenge that demands an understanding of task semantics, object affordances, and functional constraints. We present…

Robotics · Computer Science 2025-11-18 Shailesh, Alok Raj, Nayan Kumar, Priya Shukla, Andrew Melnik, Michael Beetz, Gora Chand Nandi

GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping

Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic…

Robotics · Computer Science 2023-09-21 Chao Tang, Dehao Huang, Wenqi Ge, Weiyu Liu, Hong Zhang

VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models

Robotic grasping is a fundamental capability for enabling autonomous manipulation, with usually infinite solutions. State-of-the-art approaches for grasping rely on learning from large-scale datasets comprising expert annotations of…

Robotics · Computer Science 2026-03-17 Manav Kulshrestha, S. Talha Bukhari, Damon Conover, Aniket Bera

Viewpoint-Agnostic Grasp Pipeline using VLM and Partial Observations

Robust grasping in cluttered, unstructured environments remains challenging for mobile legged manipulators due to occlusions that lead to partial observations, unreliable depth estimates, and the need for collision-free, execution-feasible…

Robotics · Computer Science 2026-05-06 Dilermando Almeida, Juliano Negri, Guilherme Lazzarini, Thiago H. Segreto, Ranulfo Bezerra, Ricardo V. Godoy, Marcelo Becker

Training-free Generation of Temporally Consistent Rewards from VLMs

Recent advances in vision-language models (VLMs) have significantly improved performance in embodied tasks such as goal decomposition and visual comprehension. However, providing accurate rewards for robotic manipulation without fine-tuning…

Robotics · Computer Science 2025-07-08 Yinuo Zhao, Jiale Yuan, Zhiyuan Xu, Xiaoshuai Hao, Xinyi Zhang, Kun Wu, Zhengping Che, Chi Harold Liu, Jian Tang

RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills

Endowing robots with tool design abilities is critical for enabling them to solve complex manipulation tasks that would otherwise be intractable. While recent generative frameworks can automatically synthesize task settings, such as 3D…

Robotics · Computer Science 2025-06-18 Chunru Lin, Haotian Yuan, Yian Wang, Xiaowen Qiu, Tsun-Hsuan Wang, Minghao Guo, Bohan Wang, Yashraj Narang, Dieter Fox, Chuang Gan

GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation

We present GraspMolmo, a generalizable open-vocabulary task-oriented grasping (TOG) model. GraspMolmo predicts semantically appropriate, stable grasps conditioned on a natural language instruction and a single RGB-D frame. For instance,…

Robotics · Computer Science 2025-09-16 Abhay Deshpande, Yuquan Deng, Arijit Ray, Jordi Salvador, Winson Han, Jiafei Duan, Kuo-Hao Zeng, Yuke Zhu, Ranjay Krishna, Rose Hendrix

Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation

Multi-hand semantic grasp generation aims to generate feasible and semantically appropriate grasp poses for different robotic hands based on natural language instructions. Although the task is highly valuable, due to the lack of multihand…

Robotics · Computer Science 2025-06-10 Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Yoshie Osamu, Ping Tan, Hongan Wang, Xiaoming Deng

LuciBot: Automated Robot Policy Learning from Generated Videos

Automatically generating training supervision for embodied tasks is crucial, as manual designing is tedious and not scalable. While prior works use large language models (LLMs) or vision-language models (VLMs) to generate rewards, these…

Computer Vision and Pattern Recognition · Computer Science 2025-03-14 Xiaowen Qiu, Yian Wang, Jiting Cai, Zhehuan Chen, Chunru Lin, Tsun-Hsuan Wang, Chuang Gan

Free-form language-based robotic reasoning and grasping

Performing robotic grasping from a cluttered bin based on human instructions is a challenging task, as it requires understanding both the nuances of free-form language and the spatial relationships between objects. Vision-Language Models…

Robotics · Computer Science 2025-07-29 Runyu Jiao, Alice Fasoli, Francesco Giuliari, Matteo Bortolon, Sergio Povoli, Guofeng Mei, Yiming Wang, Fabio Poiesi

Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

Reinforcement Learning (RL) has shown great potential in refining robotic manipulation policies, yet its efficacy remains strongly bottlenecked by the difficulty of designing generalizable reward functions. In this paper, we propose a…

Robotics · Computer Science 2026-03-24 Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao, Yue Wang

Domain Randomization and Generative Models for Robotic Grasping

Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object…

Robotics · Computer Science 2018-04-04 Joshua Tobin, Lukas Biewald, Rocky Duan, Marcin Andrychowicz, Ankur Handa, Vikash Kumar, Bob McGrew, Jonas Schneider, Peter Welinder, Wojciech Zaremba, Pieter Abbeel

Evolution without Large Models: Training Language Model with Task Principles

A common training approach for language models involves using a large-scale language model to expand a human-provided dataset, which is subsequently used for model training. This method significantly reduces training costs by eliminating the…

Computation and Language · Computer Science 2025-07-09 Minghang Zhu, Shen Gao, Zhengliang Shi, Jiabao Fang, Pengjie Ren, Zhaochun Ren, Zhumin Chen, Shuo Shang

Reasoning Grasping via Multimodal Large Language Model

Despite significant progress in robotic systems for operation within human-centric environments, existing models still heavily rely on explicit human commands to identify and manipulate specific objects. This limits their effectiveness in…

Robotics · Computer Science 2024-10-16 Shiyu Jin, Jinxuan Xu, Yutian Lei, Liangjun Zhang

Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance

Defining reward functions for skill learning has been a long-standing challenge in robotics. Recently, vision-language models (VLMs) have shown promise in defining reward signals for teaching robots manipulation skills. However, existing…

Robotics · Computer Science 2025-02-13 Kaifeng Zhang, Zhao-Heng Yin, Weirui Ye, Yang Gao

Task-Oriented Grasp Prediction with Visual-Language Inputs

To perform household tasks, assistive robots receive commands in the form of user language instructions for tool manipulation. The initial stage involves selecting the intended tool (i.e., object grounding) and grasping it in a…

Robotics · Computer Science 2023-03-01 Chao Tang, Dehao Huang, Lingxiao Meng, Weiyu Liu, Hong Zhang

TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification

Vision and Language Models (VLMs), such as CLIP, have enabled visual recognition of a potentially unlimited set of categories described by text prompts. However, for the best visual recognition performance, these models still require tuning…

Computer Vision and Pattern Recognition · Computer Science 2023-09-14 M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Horst Possegger, Rogerio Feris, Horst Bischof

Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision

Tool manipulation is vital for facilitating robots to complete challenging task goals. It requires reasoning about the desired effect of the task and thus properly grasping and manipulating the tool to achieve the task. Task-agnostic…

Robotics · Computer Science 2018-06-26 Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, Silvio Savarese

VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments

We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily…

Robotics · Computer Science 2025-05-19 Daeun Song, Jing Liang, Xuesu Xiao, Dinesh Manocha