Hand-Object Interaction Reasoning

Computer Vision and Pattern Recognition 2022-01-14 v1

Abstract

This paper proposes an interaction reasoning network for modelling spatio-temporal relationships between hands and objects in video. The proposed interaction unit utilises a Transformer module to reason about each acting hand and its spatio-temporal relation to the other hand, as well as to the objects being interacted with. We show that modelling two-handed interactions is critical for action recognition in egocentric video, and demonstrate that, by using positionally-encoded trajectories, the network can better recognise observed interactions. We evaluate our proposal on the EPIC-KITCHENS and Something-Else datasets, with an ablation study.
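
To make the idea of positionally-encoded trajectories concrete, the following is a minimal sketch, not the authors' implementation. It assumes each hand or object track is a list of per-frame bounding boxes and that the frame index is encoded with the standard sinusoidal scheme used in Transformers; the abstract does not specify the encoding, so the function names, the box format and the encoding dimension are all hypothetical.

```python
import math


def sinusoidal_encoding(position, dim):
    """Encode a scalar position as `dim` interleaved sin/cos values (dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding = []
    for i in range(dim // 2):
        # Frequencies decrease geometrically, as in the standard Transformer encoding.
        freq = 1.0 / (10000 ** (2 * i / dim))
        encoding.append(math.sin(position * freq))
        encoding.append(math.cos(position * freq))
    return encoding


def encode_trajectory(boxes, dim=8):
    """Append a temporal encoding of the frame index to each box (x1, y1, x2, y2).

    Assumption: one token per frame; a real model would likely project
    these tokens with learned layers before a Transformer consumes them.
    """
    tokens = []
    for t, box in enumerate(boxes):
        tokens.append(list(box) + sinusoidal_encoding(t, dim))
    return tokens


# Hypothetical normalised hand boxes over three frames.
hand_track = [
    (0.10, 0.20, 0.30, 0.40),
    (0.12, 0.22, 0.32, 0.42),
    (0.15, 0.25, 0.35, 0.45),
]
tokens = encode_trajectory(hand_track, dim=8)
```

Under these assumptions, each token carries both where the hand is and when it was observed, so a Transformer attending over the tokens of both hands and the objects could relate them across space and time.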

Cite

@article{arxiv.2201.04906,
  title  = {Hand-Object Interaction Reasoning},
  author = {Jian Ma and Dima Damen},
  journal= {arXiv preprint arXiv:2201.04906},
  year   = {2022}
}