Related papers: Continually Evolving Skill Knowledge in Vision Lan…

200 papers

Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model…

Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through environmental interaction, which is akin to how humans master skills through…

Robotics · Computer Science 2025-12-17 Zechen Bai, Chen Gao, Mike Zheng Shou

Lifelong learning is critical for embodied agents in open-world environments, where reinforcement learning fine-tuning has emerged as an important paradigm to enable Vision-Language-Action (VLA) models to master dexterous manipulation…

Artificial Intelligence · Computer Science 2026-02-04 Qixin Zeng, Shuo Zhang, Hongyin Zhang, Renjie Wang, Han Zhao, Libang Zhao, Runze Li, Donglin Wang, Chao Huang

Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities. The increasing size and computational demands of fine-tuning large pre-trained transformer neural…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Yuliang Cai, Mohammad Rostami

The size and the computational load of fine-tuning large-scale pre-trained neural networks are becoming two major obstacles in adopting machine learning in many applications. Continual learning (CL) can serve as a remedy through enabling…

Machine Learning · Computer Science 2023-03-28 Yuliang Cai, Jesse Thomason, Mohammad Rostami

When deployed in open-ended robotic environments, Vision-Language-Action (VLA) models need to continually acquire new skills, yet suffer from severe catastrophic forgetting. We observe that this degradation is related to the deterioration…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Libang Zhao, Qixin Zeng, Hongyin Zhang, Donglin Wang

To teach robots complex manipulation tasks, a common approach is to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for…

Robotics · Computer Science 2026-05-18 Ralf Römer, Yi Zhang, Yuming Li, Angela P. Schoellig

Vision-language-action (VLA) models provide a promising foundation for general-purpose robotics. However, their successful deployment in real-world scenarios requires the ability to continually acquire new skills while retaining previously…

Robotics · Computer Science 2026-05-27 Jiarun Zhu, Yijun Hong, Xiaoquan Sun, Zetian Xu, Mingqi Yuan, Zhiyong Wang, Wenjun Zeng, Jiayu Chen

Recent high-capacity vision-language-action (VLA) models have demonstrated impressive performance on a range of robotic manipulation tasks by imitating human demonstrations. However, exploiting offline data with limited visited states will…

Robotics · Computer Science 2025-05-27 Guanxing Lu, Wenkai Guo, Chubin Zhang, Yuheng Zhou, Haonan Jiang, Zifeng Gao, Yansong Tang, Ziwei Wang

Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning…

Robotics · Computer Science 2026-05-29 Wenhao Li, Xiu Su, Dan Niu, Yichao Cao, Hongyan Xu, Zhe Qu, Lei Fan, Shan You, Chang Xu

Vision-Language-Action (VLA) models have demonstrated potential in autonomous driving. However, two critical challenges hinder their development: (1) Existing VLA architectures are typically based on imitation learning in open-loop setup…

Artificial Intelligence · Computer Science 2025-08-18 Anqing Jiang, Yu Gao, Yiru Wang, Zhigang Sun, Shuo Wang, Yuwen Heng, Hao Sun, Shichen Tang, Lijuan Zhu, Jinhao Chai, Jijun Wang, Zichong Gu, Hao Jiang, Li Sun

Vision-Language-Action (VLA) models have shown promising capabilities for embodied intelligence, but most existing approaches rely on text-based chain-of-thought reasoning where visual inputs are treated as static context. This limits the…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Chaoyang Wang, Wenrui Bao, Sicheng Gao, Bingxin Xu, Yu Tian, Yogesh S. Rawat, Yunhao Ge, Yuzhang Shang

Vision-Language-Action models have recently emerged as a powerful paradigm for general-purpose robot learning, enabling agents to map visual observations and natural-language instructions into executable robotic actions. Though popular,…

Scaling vision-language-action (VLA) model pre-training requires large volumes of diverse, high-quality manipulation trajectories. Most current data is obtained via human teleoperation, which is expensive and difficult to scale.…

Robotics · Computer Science 2025-11-26 Rushuai Yang, Zhiyuan Feng, Tianxiang Zhang, Kaixin Wang, Chuheng Zhang, Li Zhao, Xiu Su, Yi Chen, Jiang Bian

Vision-Language-Action (VLA) models are a promising paradigm for generalist robotic manipulation by grounding high-level semantic instructions into executable physical actions. However, prevailing approaches typically adopt a monolithic…

Robotics · Computer Science 2026-04-29 Yifei Wei, Linqing Zhong, Yi Liu, Yuxiang Lu, Xindong He, Maoqing Yao, Guanghui Ren

Recent advances in vision-language-action (VLA) models have motivated the extension of their capabilities to embodied settings, where reinforcement learning (RL) offers a principled way to optimize task success through interaction. However,…

Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental…

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual…

Machine Learning · Computer Science 2026-03-13 Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

The emergence of vision-language-action (VLA) models has given rise to foundation models for robot manipulation. Although these models have achieved significant improvements, their generalization in multi-task manipulation remains limited.…

Recent vision-language-action (VLA) models have significantly advanced robotic manipulation by unifying perception, reasoning, and control. To achieve such integration, recent studies adopt a predictive paradigm that models future visual…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Yijie Zhu, Jie He, Rui Shao, Kaishen Yuan, Tao Tan, Xiaochen Yuan, Zitong Yu