Learning Predictive Visuomotor Coordination

Computer Vision and Pattern Recognition 2025-04-01 v1 Robotics

Abstract

Understanding and predicting human visuomotor coordination is crucial for applications in robotics, human-computer interaction, and assistive technologies. This work introduces a forecasting-based task for visuomotor modeling, where the goal is to predict head pose, gaze, and upper-body motion from egocentric visual and kinematic observations. We propose a Visuomotor Coordination Representation (VCR) that learns structured temporal dependencies across these multimodal signals. We extend a diffusion-based motion modeling framework that integrates egocentric vision and kinematic sequences, enabling temporally coherent and accurate visuomotor predictions. Our approach is evaluated on the large-scale EgoExo4D dataset, demonstrating strong generalization across diverse real-world activities. Our results highlight the importance of multimodal integration in understanding visuomotor coordination, contributing to research in visuomotor learning and human behavior modeling.

Cite

@article{arxiv.2503.23300,
  title  = {Learning Predictive Visuomotor Coordination},
  author = {Wenqi Jia and Bolin Lai and Miao Liu and Danfei Xu and James M. Rehg},
  journal= {arXiv preprint arXiv:2503.23300},
  year   = {2025}
}