Related papers: SIMPACT: Simulation-Enabled Action Planning using …

200 papers

Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter,…

Robotics · Computer Science · 2026-03-10 · Yiyang Ling, Karan Owalekar, Oluwatobiloba Adesanya, Erdem Bıyık, Daniel Seita

Robotic manipulation requires sophisticated commonsense reasoning, a capability naturally possessed by large-scale Vision-Language Models (VLMs). While VLMs show promise as zero-shot planners, their lack of grounded physical understanding…

Robotics · Computer Science · 2026-03-18 · Emily Yue-Ting Jia, Weiduo Yuan, Tianheng Shi, Vitor Guizilini, Jiageng Mao, Yue Wang

Large Language Models (LLMs) demonstrate strong reasoning and task planning capabilities but remain fundamentally limited in physical interaction modeling. Existing approaches integrate perception via Vision-Language Models (VLMs) or…

Robotics · Computer Science · 2025-10-17 · Wanjing Huang, Weixiang Yan, Zhen Zhang, Ambuj Singh

Solving complex long-horizon robotic manipulation problems requires sophisticated high-level planning capabilities, the ability to reason about the physical world, and reactively choose appropriate motor skills. Vision-language models…

Robotics · Computer Science · 2025-02-25 · Yunhai Feng, Jiaming Han, Zhuoran Yang, Xiangyu Yue, Sergey Levine, Jianlan Luo

Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs…

Vision-language-action (VLA) reasoning tasks require agents to interpret multimodal instructions, perform long-horizon planning, and act adaptively in dynamic environments. Existing approaches typically train VLA models in an end-to-end…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-19 · Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, Fu-En Yang

Visual Language Models (VLMs) have emerged as pivotal tools for robotic systems, enabling cross-task generalization, dynamic environmental interaction, and long-horizon planning through multimodal perception and semantic reasoning. However,…

Robotics · Computer Science · 2025-04-04 · Zhiyuan Zhang, Yuxin He, Yong Sun, Junyu Shi, Lijiang Liu, Qiang Nie

Bridging the gap between natural language commands and autonomous execution in unstructured environments remains an open challenge for robotics. This requires robots to perceive and reason over the current task scene through multiple…

Robotics · Computer Science · 2025-12-23 · Jin Wang, Kim Tien Ly, Jacques Cloete, Nikos Tsagarakis, Ioannis Havoutis

Recent advances in vision-language models (VLMs) have led to improved performance on tasks such as visual question answering and image captioning. Consequently, these models are now well-positioned to reason about the physical world,…

Robotics · Computer Science · 2024-03-05 · Jensen Gao, Bidipta Sarkar, Fei Xia, Ted Xiao, Jiajun Wu, Brian Ichter, Anirudha Majumdar, Dorsa Sadigh

Vision language models (VLMs) exhibit vast knowledge of the physical world, including intuition of physical and spatial properties, affordances, and motion. With fine-tuning, VLMs can also natively produce robot trajectories. We demonstrate…

Robotics · Computer Science · 2025-05-16 · William Xie, Max Conway, Yutong Zhang, Nikolaus Correll

The advancement of embodied intelligence is accelerating the integration of robots into daily life as human assistants. This evolution requires robots to not only interpret high-level instructions and plan tasks but also perceive and adapt…

Robotics · Computer Science · 2025-08-19 · Zhichen Lou, Kechun Xu, Zhongxiang Zhou, Rong Xiong

Advancements in large language models (LLMs) have demonstrated their potential in facilitating high-level reasoning, logical reasoning and robotics planning. Recently, LLMs have also been able to generate reward functions for low-level…

Robotics · Computer Science · 2024-02-21 · Marta Skreta, Zihan Zhou, Jia Lin Yuan, Kourosh Darvish, Alán Aspuru-Guzik, Animesh Garg

Although Model Predictive Control (MPC) can effectively predict the future states of a system and is therefore widely used in robotic manipulation tasks, it lacks environmental perception, which leads to failure in some…

Robotics · Computer Science · 2024-07-16 · Wentao Zhao, Jiaming Chen, Ziyu Meng, Donghui Mao, Ran Song, Wei Zhang

Foundation models like Vision-Language Models (VLMs) excel at common sense vision and language tasks such as visual question answering. However, they cannot yet directly solve complex, long-horizon robot manipulation problems requiring…

Solving complex, long-horizon robotic manipulation tasks requires a deep understanding of physical interactions, reasoning about their long-term consequences, and precise high-level planning. Vision-Language Models (VLMs) offer a general…

Robotics · Computer Science · 2026-02-24 · Yanting Yang, Shenyuan Gao, Qingwen Bu, Li Chen, Dimitris N. Metaxas

Vision-Language Models (VLMs) are increasingly pivotal for generalist robot manipulation, enabling tasks such as physical reasoning, policy generation, and failure detection. However, their proficiency in these high-level applications often…

Robotics · Computer Science · 2025-07-01 · Atharva Gundawar, Som Sagar, Ransalu Senanayake

While vision-language models (VLMs) have demonstrated remarkable performance across various tasks combining textual and visual information, they continue to struggle with fine-grained visual perception tasks that require detailed…

Computation and Language · Computer Science · 2025-11-12 · Zhehao Zhang, Ryan Rossi, Tong Yu, Franck Dernoncourt, Ruiyi Zhang, Jiuxiang Gu, Sungchul Kim, Xiang Chen, Zichao Wang, Nedim Lipka

We present a framework for perspective-aware reasoning in vision-language models (VLMs) through mental imagery simulation. Perspective-taking, the ability to perceive an environment or situation from an alternative viewpoint, is a key…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-25 · Phillip Y. Lee, Jihyeon Je, Chanho Park, Mikaela Angelina Uy, Leonidas Guibas, Minhyuk Sung

A Vision-Language Model (VLM) is an important component for enabling robust robot manipulation. Yet using it to translate human instructions into an action-resolvable intermediate representation often requires a tradeoff between…

Vision-Language models (VLMs) achieve strong performance on multimodal tasks but often fail at systematic visual reasoning tasks, leading to inconsistent or illogical outputs. Neuro-symbolic methods promise to address this by inducing…

Artificial Intelligence · Computer Science · 2025-11-25 · Antonia Wüst, Wolfgang Stammer, Hikaru Shindo, Lukas Helff, Devendra Singh Dhami, Kristian Kersting