Related papers: Continually Evolving Skill Knowledge in Vision Lan…

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model…

Machine Learning · Computer Science 2025-05-30 Danny Driess, Jost Tobias Springenberg, Brian Ichter, Lili Yu, Adrian Li-Bell, Karl Pertsch, Allen Z. Ren, Homer Walke, Quan Vuong, Lucy Xiaoyang Shi, Sergey Levine

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through environmental interaction, which is akin to how humans master skills through…

Robotics · Computer Science 2025-12-17 Zechen Bai, Chen Gao, Mike Zheng Shou

CRL-VLA: Continual Vision-Language-Action Learning

Lifelong learning is critical for embodied agents in open-world environments, where reinforcement learning fine-tuning has emerged as an important paradigm to enable Vision-Language-Action (VLA) models to master dexterous manipulation…

Artificial Intelligence · Computer Science 2026-02-04 Qixin Zeng, Shuo Zhang, Hongyin Zhang, Renjie Wang, Han Zhao, Libang Zhao, Runze Li, Donglin Wang, Chao Huang

Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks

Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities. The increasing size and computational demands of fine-tuning large pre-trained transformer neural…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Yuliang Cai, Mohammad Rostami

Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

The size and the computational load of fine-tuning large-scale pre-trained neural networks are becoming two major obstacles in adopting machine learning in many applications. Continual learning (CL) can serve as a remedy through enabling…

Machine Learning · Computer Science 2023-03-28 Yuliang Cai, Jesse Thomason, Mohammad Rostami

Information-Theoretic Constraints for Continual Vision-Language-Action Alignment

When deployed in open-ended robotic environments, Vision-Language-Action (VLA) models need to continually acquire new skills, yet suffer from severe catastrophic forgetting. We observe that this degradation is related to the deterioration…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Libang Zhao, Qixin Zeng, Hongyin Zhang, Donglin Wang

CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

To teach robots complex manipulation tasks, a common approach is to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for…

Robotics · Computer Science 2026-05-18 Ralf Römer, Yi Zhang, Yuming Li, Angela P. Schoellig

Can VLA Models Learn from Real-World Data Continually without Forgetting?

Vision-language-action (VLA) models provide a promising foundation for general-purpose robotics. However, their successful deployment in real-world scenarios requires the ability to continually acquire new skills while retaining previously…

Robotics · Computer Science 2026-05-27 Jiarun Zhu, Yijun Hong, Xiaoquan Sun, Zetian Xu, Mingqi Yuan, Zhiyong Wang, Wenjun Zeng, Jiayu Chen

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning

Recent high-capacity vision-language-action (VLA) models have demonstrated impressive performance on a range of robotic manipulation tasks by imitating human demonstrations. However, exploiting offline data with limited visited states will…

Robotics · Computer Science 2025-05-27 Guanxing Lu, Wenkai Guo, Chubin Zhang, Yuheng Zhou, Haonan Jiang, Zifeng Gao, Yansong Tang, Ziwei Wang

Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery

Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning…

Robotics · Computer Science 2026-05-29 Wenhao Li, Xiu Su, Dan Niu, Yichao Cao, Hongyan Xu, Zhe Qu, Lei Fan, Shan You, Chang Xu

IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model

Vision-Language-Action (VLA) models have demonstrated potential in autonomous driving. However, two critical challenges hinder their development: (1) Existing VLA architectures are typically based on imitation learning in an open-loop setup…

Artificial Intelligence · Computer Science 2025-08-18 Anqing Jiang, Yu Gao, Yiru Wang, Zhigang Sun, Shuo Wang, Yuwen Heng, Hao Sun, Shichen Tang, Lijuan Zhu, Jinhao Chai, Jijun Wang, Zichong Gu, Hao Jiang, Li Sun

VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning

Vision-Language-Action (VLA) models have shown promising capabilities for embodied intelligence, but most existing approaches rely on text-based chain-of-thought reasoning where visual inputs are treated as static context. This limits the…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Chaoyang Wang, Wenrui Bao, Sicheng Gao, Bingxin Xu, Yu Tian, Yogesh S. Rawat, Yunhao Ge, Yuzhang Shang

On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning

Vision-Language-Action models have recently emerged as a powerful paradigm for general-purpose robot learning, enabling agents to map visual observations and natural-language instructions into executable robotic actions. Though popular,…

Robotics · Computer Science 2026-04-08 Changyu Liu, Yiyang Liu, Taowen Wang, Qiao Zhuang, James Chenhao Liang, Wenhao Yang, Renjing Xu, Qifan Wang, Dongfang Liu, Cheng Han

Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories

Scaling vision-language-action (VLA) model pre-training requires large volumes of diverse, high-quality manipulation trajectories. Most current data is obtained via human teleoperation, which is expensive and difficult to scale.…

Robotics · Computer Science 2025-11-26 Rushuai Yang, Zhiyuan Feng, Tianxiang Zhang, Kaixin Wang, Chuheng Zhang, Li Zhao, Xiu Su, Yi Chen, Jiang Bian

Libra-VLA: Achieving Learning Equilibrium via Asynchronous Coarse-to-Fine Dual-System

Vision-Language-Action (VLA) models are a promising paradigm for generalist robotic manipulation by grounding high-level semantic instructions into executable physical actions. However, prevailing approaches typically adopt a monolithic…

Robotics · Computer Science 2026-04-29 Yifei Wei, Linqing Zhong, Yi Liu, Yuxiang Lu, Xindong He, Maoqing Yao, Guanghui Ren

RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models

Recent advances in vision-language-action (VLA) models have motivated the extension of their capabilities to embodied settings, where reinforcement learning (RL) offers a principled way to optimize task success through interaction. However,…

Robotics · Computer Science 2026-02-10 Hongzhi Zang, Mingjie Wei, Si Xu, Yongji Wu, Zhen Guo, Yuanqing Wang, Hao Lin, Peihong Wang, Liangzhi Shi, Yuqing Xie, Zhexuan Xu, Zhihao Liu, Kang Chen, Wenhao Tang, Quanlu Zhang, Weinan Zhang, Chao Yu, Yu Wang

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental…

Robotics · Computer Science 2025-09-12 Haozhan Li, Yuxin Zuo, Jiale Yu, Yuhao Zhang, Zhaohui Yang, Kaiyan Zhang, Xuekai Zhu, Yuchen Zhang, Tianxing Chen, Ganqu Cui, Dehui Wang, Dingxiang Luo, Yuchen Fan, Youbang Sun, Jia Zeng, Jiangmiao Pang, Shanghang Zhang, Yu Wang, Yao Mu, Bowen Zhou, Ning Ding

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual…

Machine Learning · Computer Science 2026-03-13 Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

VLA Model-Expert Collaboration for Bi-directional Manipulation Learning

The emergence of vision-language-action (VLA) models has given rise to foundation models for robot manipulation. Although these models have achieved significant improvements, their generalization in multi-task manipulation remains limited.…

Robotics · Computer Science 2025-03-07 Tian-Yu Xiang, Ao-Qun Jin, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Sheng-Bin Duang, Si-Cheng Wang, Zheng Lei, Zeng-Guang Hou

ΔVLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation

Recent vision-language-action (VLA) models have significantly advanced robotic manipulation by unifying perception, reasoning, and control. To achieve such integration, recent studies adopt a predictive paradigm that models future visual…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Yijie Zhu, Jie He, Rui Shao, Kaishen Yuan, Tao Tan, Xiaochen Yuan, Zitong Yu