Related papers: CoFL: Continuous Flow Fields for Language-Conditio…

CoLF: Learning Consistent Leader-Follower Policies for Vision-Language-Guided Multi-Robot Cooperative Transport

In this study, we address vision-language-guided multi-robot cooperative transport, where each robot grounds natural-language instructions from onboard camera observations. A key challenge in this decentralized setting is perceptual…

Robotics · Computer Science 2026-02-10 Joachim Yann Despature, Kazuki Shibata, Takamitsu Matsubara

CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

Generative models have emerged as a promising paradigm for offline multi-agent reinforcement learning (MARL), but existing approaches require many iterative sampling steps. Recent few-step acceleration methods either distill a joint teacher…

Artificial Intelligence · Computer Science 2026-05-14 Guowei Zou, Haitao Wang, Beiwen Zhang, Boning Zhang, Hejun Wu

CurveFlow: Curvature-Guided Flow Matching for Image Generation

Existing rectified flow models are based on linear trajectories between data and noise distributions. This linearity enforces zero curvature, which can inadvertently force the image generation process through low-probability regions of the…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Yan Luo, Drake Du, Hao Huang, Yi Fang, Mengyu Wang

CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning

Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine…

Software Engineering · Computer Science 2025-02-11 Cuong Chi Le, Hoang Nhat Phan, Huy Nhat Phan, Tien N. Nguyen, Nghi D. Q. Bui

Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs

Vision-and-Language Navigation (VLN) tasks require an agent to follow textual instructions to navigate through 3D environments. Traditional approaches use supervised learning methods, relying heavily on domain-specific datasets to train VLN…

Robotics · Computer Science 2025-02-12 Yanyuan Qiao, Wenqi Lyu, Hui Wang, Zixu Wang, Zerui Li, Yuan Zhang, Mingkui Tan, Qi Wu

Rethinking the Starting Point: Collaborative Pre-Training for Federated Downstream Tasks

A few recent studies have demonstrated that leveraging centrally pre-trained models can offer advantageous initializations for federated learning (FL). However, existing pre-training methods do not generalize well when faced with an…

Machine Learning · Computer Science 2024-12-12 Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton

Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation Using Vision Language Models

Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and learning-based methods have shown promise, most existing…

Robotics · Computer Science 2025-05-07 Bangguo Yu, Qihao Yuan, Kailai Li, Hamidreza Kasaei, Ming Cao

MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding

Visual navigation in unknown environments based solely on natural language descriptions is a key capability for intelligent robots. In this work, we propose a navigation framework built upon off-the-shelf Visual Language Models (VLMs),…

Robotics · Computer Science 2025-08-08 Weifan Zhang, Tingguang Li, Yuzhen Liu

CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification

Recent Vision-Language-Action (VLA) models built on pre-trained Vision-Language Models (VLMs) require extensive post-training, resulting in high computational overhead that limits scalability and deployment. We propose CogVLA, a…

Computer Vision and Pattern Recognition · Computer Science 2026-05-28 Wei Li, Renshan Zhang, Rui Shao, Jie He, Liqiang Nie

Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning

We introduce Correspondence-Oriented Imitation Learning (COIL), a conditional policy learning framework for visuomotor control with a flexible task representation in 3D. At the core of our approach, each task is defined by the intended…

Robotics · Computer Science 2025-12-08 Yunhao Cao, Zubin Bhaumik, Jessie Jia, Xingyi He, Kuan Fang

REFOL: Resource-Efficient Federated Online Learning for Traffic Flow Forecasting

Multiple federated learning (FL) methods are proposed for traffic flow forecasting (TFF) to avoid heavy-transmission and privacy-leaking concerns resulting from the disclosure of raw data in centralized methods. However, these FL methods…

Machine Learning · Computer Science 2024-11-22 Qingxiang Liu, Sheng Sun, Yuxuan Liang, Xiaolong Xu, Min Liu, Muhammad Bilal, Yuwei Wang, Xujing Li, Yu Zheng

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation

UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods remain limited, as…

Computer Vision and Pattern Recognition · Computer Science 2026-04-20 Dian Shao, Zhengzheng Xu, Peiyang Wang, Like Liu, Yule Wang, Jieqi Shi, Jing Huo

Learning Direct Control Policies with Flow Matching for Autonomous Driving

We present a flow-matching planner for autonomous driving that directly outputs actionable control trajectories defined by acceleration and curvature profiles. The model is conditioned on a bird's-eye-view (BEV) raster of the surrounding…

Robotics · Computer Science 2026-05-15 Marcello Ceresini, Federico Pirazzoli, Andrea Bertogalli, Lorenzo Cipelli, Filippo D'Addeo, Anthony Dell'Eva, Alessandro Paolo Capasso, Alberto Broggi

Beyond BEV: Optimizing Point-Level Tokens for Collaborative Perception

Collaborative perception allows agents to enhance their perceptual capabilities by exchanging intermediate features. Existing methods typically organize these intermediate features as 2D bird's-eye-view (BEV) representations, which discard…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Rui Pan, Yujia Yang, Congzhang Shao, Yuewen Liu, Jinglin Li

CoT-AMFlow: Adaptive Modulation Network with Co-Teaching Strategy for Unsupervised Optical Flow Estimation

The interpretation of ego motion and scene change is a fundamental task for mobile robots. Optical flow information can be employed to estimate motion in the surroundings. Recently, unsupervised optical flow estimation has become a research…

Computer Vision and Pattern Recognition · Computer Science 2020-11-05 Hengli Wang, Rui Fan, Ming Liu

GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation

In this paper, we propose a training-free framework for vision-and-language navigation (VLN). Existing zero-shot VLN methods are mainly designed for discrete environments or involve unsupervised training in continuous simulator…

Robotics · Computer Science 2025-09-15 Hang Yin, Haoyu Wei, Xiuwei Xu, Wenxuan Guo, Jie Zhou, Jiwen Lu

AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild

Vision-language navigation (VLN) requires intelligent agents to navigate environments by interpreting linguistic instructions alongside visual observations, serving as a cornerstone task in Embodied AI. Current VLN research for unmanned…

Robotics · Computer Science 2026-02-11 Xiaolou Sun, Wufei Si, Wenhui Ni, Yuntian Li, Dongming Wu, Fei Xie, Runwei Guan, He-Yang Xu, Henghui Ding, Yuan Wu, Yutao Yue, Yongming Huang, Hui Xiong

CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning

This paper explores the problem of continual learning (CL) of vision-language models (VLMs) in open domains, where the models need to perform continual updating and inference on a streaming of datasets from diverse seen and unseen domains…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, Peng Wang

DreamFlow: Local Navigation Beyond Observation via Conditional Flow Matching in the Latent Space

Local navigation in cluttered environments often suffers from dense obstacles and frequent local minima. Conventional local planners rely on heuristics and are prone to failure, while deep reinforcement learning (DRL)-based approaches provide…

Robotics · Computer Science 2026-03-18 Jiwon Park, Dongkyu Lee, I Made Aswin Nahrendra, Jaeyoung Lim, Hyun Myung

CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving

Autonomous driving, particularly navigating complex and unanticipated scenarios, demands sophisticated reasoning and planning capabilities. While Multi-modal Large Language Models (MLLMs) offer a promising avenue for this, their use has…

Computer Vision and Pattern Recognition · Computer Science 2025-10-15 Hidehisa Arai, Keita Miwa, Kento Sasaki, Yu Yamaguchi, Kohei Watanabe, Shunsuke Aoki, Issei Yamamoto