Related papers: Modelling Spatio-Temporal Interactions For Composi…

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations. In this paper, we study the compositionality of action by looking into the…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Revisiting spatio-temporal layouts for compositional action recognition

Recognizing human actions is fundamentally a spatio-temporal reasoning problem, and should be, at least to some extent, invariant to the appearance of the human and the objects involved. Motivated by this hypothesis, in this work, we take…

Computer Vision and Pattern Recognition · Computer Science 2021-11-04 Gorjan Radevski, Marie-Francine Moens, Tinne Tuytelaars

A Grammatical Compositional Model for Video Action Detection

Analysis of human actions in videos demands understanding complex human dynamics, as well as the interaction between actors and context. However, these interaction relationships usually exhibit large intra-class variations from diverse…

Computer Vision and Pattern Recognition · Computer Science 2023-10-05 Zhijun Zhang, Xu Zou, Jiahuan Zhou, Sheng Zhong, Ying Wu

Attentive Action and Context Factorization

We propose a method for human action recognition, one that can localize the spatiotemporal regions that 'define' the actions. This is a challenging task due to the subtlety of human actions in video and the co-occurrence of contextual…

Computer Vision and Pattern Recognition · Computer Science 2019-04-12 Yang Wang, Vinh Tran, Gedas Bertasius, Lorenzo Torresani, Minh Hoai

Compositional Learning in Transformer-Based Human-Object Interaction Detection

Human-object interaction (HOI) detection is an important part of understanding human activities and visual scenes. The long-tailed distribution of labeled instances is a primary challenge in HOI detection, promoting research in few-shot and…

Computer Vision and Pattern Recognition · Computer Science 2023-08-14 Zikun Zhuang, Ruihao Qian, Chi Xie, Shuang Liang

Continuous Human Action Recognition for Human-Machine Interaction: A Review

With advances in data-driven machine learning research, a wide variety of prediction models have been proposed to capture spatio-temporal features for the analysis of video streams. Recognising actions and detecting action transitions…

Computer Vision and Pattern Recognition · Computer Science 2024-03-06 Harshala Gammulle, David Ahmedt-Aristizabal, Simon Denman, Lachlan Tychsen-Smith, Lars Petersson, Clinton Fookes

Video action detection by learning graph-based spatio-temporal interactions

Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the…

Computer Vision and Pattern Recognition · Computer Science 2021-03-02 Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara

Spatio-temporal Action Recognition: A Survey

The task of action recognition or action detection involves analyzing videos and determining what action or motion is being performed. The primary subjects of these videos are predominantly humans performing some action. However, this…

Computer Vision and Pattern Recognition · Computer Science 2019-01-29 Amlaan Bhoi

Compositional Structure Learning for Action Understanding

The focus of the action understanding literature has predominately been classification, however, there are many applications demanding richer action understanding such as mobile robotics and video search, with solutions to classification,…

Computer Vision and Pattern Recognition · Computer Science 2014-10-23 Ran Xu, Gang Chen, Caiming Xiong, Wei Chen, Jason J. Corso

Motion Guided Attention Fusion to Recognize Interactions from Videos

We present a dual-pathway approach for recognizing fine-grained interactions from videos. We build on the success of prior dual-stream approaches, but make a distinction between the static and dynamic representations of objects and their…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Tae Soo Kim, Jonathan Jones, Gregory D. Hager

Interpretable Action Recognition on Hard to Classify Actions

We investigate a human-like interpretable model of video understanding. Humans recognise complex activities in video by recognising critical spatio-temporal relations among explicitly recognised objects and parts, for example, an object…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Anastasia Anichenko, Frank Guerin, Andrew Gilbert

Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition

The interactions between human and objects are important for recognizing object-centric actions. Existing methods usually adopt a two-stage pipeline, where object proposals are first detected using a pretrained detector, and then are fed to…

Computer Vision and Pattern Recognition · Computer Science 2024-04-19 Xunsong Li, Pengzhan Sun, Yangcen Liu, Lixin Duan, Wen Li

How can objects help action recognition?

Current state-of-the-art video models process a video clip as a long sequence of spatio-temporal tokens. However, they do not explicitly model objects, their interactions across the video, and instead process all the tokens in the video. In…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid

Automatic Interaction and Activity Recognition from Videos of Human Manual Demonstrations with Application to Anomaly Detection

This paper presents a new method to describe spatio-temporal relations between objects and hands, to recognize both interactions and activities within video demonstrations of manual tasks. The approach exploits Scene Graphs to extract key…

Computer Vision and Pattern Recognition · Computer Science 2023-07-10 Elena Merlo, Marta Lagomarsino, Edoardo Lamon, Arash Ajoudani

Human Interaction Recognition Framework based on Interacting Body Part Attention

Human activity recognition in videos has been widely studied and has recently gained significant advances with deep learning approaches; however, it remains a challenging task. In this paper, we propose a novel framework that simultaneously…

Computer Vision and Pattern Recognition · Computer Science 2021-01-25 Dong-Gyu Lee, Seong-Whan Lee

Understanding hand-object manipulation by modeling the contextual relationship between actions, grasp types and object attributes

This paper proposes a novel method for understanding daily hand-object manipulation by developing computer vision-based techniques. Specifically, we focus on recognizing hand grasp types, object attributes and manipulation actions within an…

Computer Vision and Pattern Recognition · Computer Science 2018-07-24 Minjie Cai, Kris Kitani, Yoichi Sato

Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

Action recognition is an important problem in multimedia understanding. This paper addresses this problem by building an expressive compositional action model. We model one action instance in the video with an ensemble of spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science 2015-02-03 Xiaodan Liang, Liang Lin, Liangliang Cao

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics

With the availability of egocentric 3D hand-object interaction datasets, there is increasing interest in developing unified models for hand-object pose estimation and action recognition. However, existing methods still struggle to recognise…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Tze Ho Elden Tse, Runyang Feng, Linfang Zheng, Jiho Park, Yixing Gao, Jihie Kim, Ales Leonardis, Hyung Jin Chang

SAFCAR: Structured Attention Fusion for Compositional Action Recognition

We present a general framework for compositional action recognition -- i.e. action recognition where the labels are composed out of simpler components such as subjects, atomic-actions and objects. The main challenge in compositional action…

Computer Vision and Pattern Recognition · Computer Science 2020-12-21 Tae Soo Kim, Gregory D. Hager

Zero-Shot Action Recognition from Diverse Object-Scene Compositions

This paper investigates the problem of zero-shot action recognition, in the setting where no training videos with seen actions are available. For this challenging scenario, the current leading approach is to transfer knowledge from the…

Computer Vision and Pattern Recognition · Computer Science 2021-10-27 Carlo Bretti, Pascal Mettes