Related papers: Action with Visual Primitives

200 papers

Vision-Language-Action (VLA) models mark a transformative advancement in artificial intelligence, aiming to unify perception, natural language understanding, and embodied action within a single computational framework. This foundational…

Computer Vision and Pattern Recognition · Computer Science 2026-02-02 Ranjan Sapkota, Yang Cao, Konstantinos I. Roumeliotis, Manoj Karkee

Vision-Language-Action (VLA) models offer a promising paradigm for generalist robotic policies, yet their adaptation is hindered by data inefficiency and poor generalization. We argue that these bottlenecks stem from the prevailing Direct…

Robotics · Computer Science 2026-05-28 Yutai Li, Shaohui Peng, Jiaming Guo, Di Huang, Zihao Zhang, Yuxuan Guo, Yunkai Gao, Siming Lan, Ling Li, Xing Hu, Yunji Chen

We propose Avi, a novel 3D Vision-Language-Action (VLA) architecture that reframes robotic action generation as a problem of 3D perception and spatial reasoning, rather than low-level policy learning. While existing VLA models primarily…

Robotics · Computer Science 2025-10-28 Harris Song, Long Le

Vision-Language-Action (VLA) models offer a compelling framework for tackling complex robotic manipulation tasks, but they are often expensive to train. In this paper, we propose a novel VLA approach that leverages the competitive…

Robotics · Computer Science 2025-12-23 Max Argus, Jelena Bratulic, Houman Masnavi, Maxim Velikanov, Nick Heppert, Abhinav Valada, Thomas Brox

Vision-language-action (VLA) models finetuned from vision-language models (VLMs) hold the promise of leveraging rich pretrained representations to build generalist robots across diverse tasks and environments. However, direct fine-tuning on…

Robotics · Computer Science 2025-09-18 Shresth Grover, Akshay Gopalkrishnan, Bo Ai, Henrik I. Christensen, Hao Su, Xuanlin Li

The development of general robotic systems capable of manipulating in unstructured environments is a significant challenge. While Vision-Language Models (VLMs) excel in high-level commonsense reasoning, they lack the fine-grained 3D spatial…

Robotics · Computer Science 2025-01-08 Mingjie Pan, Jiyao Zhang, Tianshu Wu, Yinghao Zhao, Wenlong Gao, Hao Dong

Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic manipulation, leveraging large-scale pre-training to achieve strong performance. The field has rapidly evolved with additional spatial…

Robotics · Computer Science 2026-02-23 Yuankai Luo, Woping Chen, Tong Liang, Baiqiao Wang, Zhenguo Li

Autonomous navigation in highly constrained environments remains challenging for mobile robots. Classical navigation approaches offer safety assurances but require environment-specific parameter tuning; end-to-end learning bypasses…

Robotics · Computer Science 2026-03-11 Yuanjie Lu, Beichen Wang, Zhengqi Wu, Yang Li, Xiaomin Lin, Chengzhi Mao, Xuesu Xiao

The advancement of large Vision-Language-Action (VLA) models has significantly improved robotic manipulation in terms of language-guided task execution and generalization to unseen scenarios. While existing VLAs adapted from pretrained…

Vision-Language-Action (VLA) models typically bridge the gap between perceptual and action spaces by pre-training a large-scale Vision-Language Model (VLM) on robotic data. While this approach greatly enhances performance, it also incurs…

Vision-Language-Action (VLA) models have shown remarkable progress in embodied tasks recently, but most methods process visual observations independently at each timestep. This history-agnostic design treats robot manipulation as a Markov…

Machine Learning · Computer Science 2026-04-13 Lei Xiao, Jifeng Li, Juntao Gao, Feiyang Ye, Yan Jin, Jingjing Qian, Jing Zhang, Yong Wu, Xiaoyuan Yu

Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, where hand-crafted interfaces and rule-based components often break down in complex or long-tailed scenarios. Their cascaded design further propagates…

Developing robust and general-purpose manipulation policies represents a fundamental objective in robotics research. While Vision-Language-Action (VLA) models have demonstrated promising capabilities for end-to-end robot control, existing…

Vision-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent works have begun to explore the incorporation of…

Amid growing efforts to leverage advances in large language models (LLMs) and vision-language models (VLMs) for robotics, Vision-Language-Action (VLA) models have recently gained significant attention. By unifying vision, language, and…

Robotics · Computer Science 2025-10-09 Kento Kawaharazuka, Jihoon Oh, Jun Yamada, Ingmar Posner, Yuke Zhu

Vision-language-action models (VLAs) have garnered significant attention for their potential in advancing robotic manipulation. However, previous approaches predominantly rely on the general comprehension capabilities of vision-language…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Yuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, Zhaoxiang Zhang

Vision-Language Action (VLA) models significantly advance robotic manipulation by leveraging the strong perception capabilities of pretrained vision-language models (VLMs). By integrating action modules into these pretrained models, VLA…

Computer Vision and Pattern Recognition · Computer Science 2025-10-20 Shaoqi Dong, Chaoyou Fu, Haihan Gao, Yi-Fan Zhang, Chi Yan, Chu Wu, Xiaoyu Liu, Yunhang Shen, Jing Huo, Deqiang Jiang, Haoyu Cao, Yang Gao, Xing Sun, Ran He, Caifeng Shan

The emergence of Vision Language Action (VLA) models marks a paradigm shift from traditional policy-based control to generalized robotics, reframing Vision Language Models (VLMs) from passive sequence generators into active agents for…

Robotics · Computer Science 2025-11-11 Dapeng Zhang, Jing Sun, Chenghui Hu, Xiaoyan Wu, Zhenlong Yuan, Rui Zhou, Fei Shen, Qingguo Zhou

One promise that Vision-Language-Action (VLA) models hold over traditional imitation learning for robotics is to leverage the broad generalization capabilities of large Vision-Language Models (VLMs) to produce versatile, "generalist" robot…

Robotics · Computer Science 2025-06-12 Irving Fang, Juexiao Zhang, Shengbang Tong, Chen Feng

Although Vision-Language Models (VLMs) have demonstrated impressive planning and reasoning capabilities, translating these abilities into the physical world introduces significant challenges. Conventional Vision-Language-Action (VLA) models,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Mingyu Liu, Zheng Huang, Xiaoyi Lin, Muzhi Zhu, Canyu Zhao, Zongze Du, Yating Wang, Haoyi Zhu, Hao Chen, Chunhua Shen