Learning Predictive Visuomotor Coordination

Wenqi Jia; Bolin Lai; Miao Liu; Danfei Xu; James M. Rehg

Computer Vision and Pattern Recognition 2025-04-01 v1 Robotics

Abstract

Understanding and predicting human visuomotor coordination is crucial for applications in robotics, human-computer interaction, and assistive technologies. This work introduces a forecasting-based task for visuomotor modeling, where the goal is to predict head pose, gaze, and upper-body motion from egocentric visual and kinematic observations. We propose a Visuomotor Coordination Representation (VCR) that learns structured temporal dependencies across these multimodal signals. We extend a diffusion-based motion modeling framework that integrates egocentric vision and kinematic sequences, enabling temporally coherent and accurate visuomotor predictions. Our approach is evaluated on the large-scale EgoExo4D dataset, demonstrating strong generalization across diverse real-world activities. Our results highlight the importance of multimodal integration in understanding visuomotor coordination, contributing to research in visuomotor learning and human behavior modeling.

Keywords

human motion prediction; egocentric video understanding; trajectory prediction

Cite

@article{arxiv.2503.23300,
  title  = {Learning Predictive Visuomotor Coordination},
  author = {Wenqi Jia and Bolin Lai and Miao Liu and Danfei Xu and James M. Rehg},
  journal= {arXiv preprint arXiv:2503.23300},
  year   = {2025}
}
