Related papers: Compositional Structure Learning for Action Unders…

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations. In this paper, we study the compositionality of action by looking into the…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Modelling Spatio-Temporal Interactions For Compositional Action Recognition

Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed. Humans can abstract away the action from the appearance of the objects which is referred to as compositionality…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Ramanathan Rajendiran, Debaditya Roy, Basura Fernando

Learning Additively Compositional Latent Actions for Embodied AI

Latent action learning infers pseudo-action labels from visual transitions, providing an approach to leverage internet-scale video for embodied AI. However, most methods learn latent actions without structural priors that encode the…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Hangxing Wei, Xiaoyu Chen, Chuheng Zhang, Tim Pearce, Jianyu Chen, Alex Lamb, Li Zhao, Jiang Bian

Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

Action recognition is an important problem in multimedia understanding. This paper addresses this problem by building an expressive compositional action model. We model one action instance in the video with an ensemble of spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science 2015-02-03 Xiaodan Liang, Liang Lin, Liangliang Cao

A Grammatical Compositional Model for Video Action Detection

Analysis of human actions in videos demands understanding complex human dynamics, as well as the interaction between actors and context. However, these interaction relationships usually exhibit large intra-class variations from diverse…

Computer Vision and Pattern Recognition · Computer Science 2023-10-05 Zhijun Zhang, Xu Zou, Jiahuan Zhou, Sheng Zhong, Ying Wu

Revisiting spatio-temporal layouts for compositional action recognition

Recognizing human actions is fundamentally a spatio-temporal reasoning problem, and should be, at least to some extent, invariant to the appearance of the human and the objects involved. Motivated by this hypothesis, in this work, we take…

Computer Vision and Pattern Recognition · Computer Science 2021-11-04 Gorjan Radevski, Marie-Francine Moens, Tinne Tuytelaars

Dynamic Matrix Decomposition for Action Recognition

Designing a technique for the automatic analysis of different actions in videos in order to detect the presence of activities of interest is of high significance nowadays. In this paper, we explore a robust and dynamic appearance technique…

Computer Vision and Pattern Recognition · Computer Science 2019-02-21 Abdul Basit

SAFCAR: Structured Attention Fusion for Compositional Action Recognition

We present a general framework for compositional action recognition -- i.e. action recognition where the labels are composed out of simpler components such as subjects, atomic-actions and objects. The main challenge in compositional action…

Computer Vision and Pattern Recognition · Computer Science 2020-12-21 Tae Soo Kim, Gregory D. Hager

Compositional Law Parsing with Latent Random Functions

Human cognition has compositionality. We understand a scene by decomposing the scene into different concepts (e.g., shape and position of an object) and learning the respective laws of these concepts, which may be either natural (e.g., laws…

Computer Vision and Pattern Recognition · Computer Science 2023-02-28 Fan Shi, Bin Li, Xiangyang Xue

LAC: Latent Action Composition for Skeleton-based Action Segmentation

Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos. Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them by a…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Di Yang, Yaohui Wang, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

Structured Attention Composition for Temporal Action Localization

Temporal action localization aims at localizing action instances from untrimmed videos. Existing works have designed various effective modules to precisely localize action instances based on appearance and motion features. However, by…

Computer Vision and Pattern Recognition · Computer Science 2022-05-30 Le Yang, Junwei Han, Tao Zhao, Nian Liu, Dingwen Zhang

Alignment-based compositional semantics for instruction following

This paper describes an alignment-based model for interpreting natural language instructions in context. We approach instruction following as a search over plans, scoring sequences of actions conditioned on structured observations of text…

Computation and Language · Computer Science 2017-04-14 Jacob Andreas, Dan Klein

Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition

Multimodal human action understanding is a significant problem in computer vision, with the central challenge being the effective utilization of the complementarity among diverse modalities while maintaining model efficiency. However, most…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Hongsong Wang, Heng Fei, Bingxuan Dai, Jie Gui

Learning Compositional Representation for 4D Captures with Neural ODE

Learning based representation has become the key to the success of many computer vision systems. While many 3D representations have been proposed, it is still an unaddressed problem how to represent a dynamically changing 3D object. In this…

Computer Vision and Pattern Recognition · Computer Science 2021-04-21 Boyan Jiang, Yinda Zhang, Xingkui Wei, Xiangyang Xue, Yanwei Fu

Learning Human Motion Models for Long-term Predictions

We propose a new architecture for the learning of predictive spatio-temporal motion models from data alone. Our approach, dubbed the Dropout Autoencoder LSTM, is capable of synthesizing natural looking motion sequences over long time…

Computer Vision and Pattern Recognition · Computer Science 2017-12-05 Partha Ghosh, Jie Song, Emre Aksan, Otmar Hilliges

Measuring Compositionality in Representation Learning

Many machine learning algorithms represent input data with vector embeddings or discrete codes. When inputs exhibit compositional structure (e.g. objects built from parts or procedures from subroutines), it is natural to ask whether this…

Machine Learning · Computer Science 2019-04-09 Jacob Andreas

Learning Compositional Representations for Few-Shot Recognition

One of the key limitations of modern deep learning approaches lies in the amount of data required to train them. Humans, by contrast, can learn to recognize novel categories from just a few examples. Instrumental to this rapid learning…

Computer Vision and Pattern Recognition · Computer Science 2019-08-20 Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert

Hierarchical Compositional Representations for Few-shot Action Recognition

Recently action recognition has received more and more attention for its comprehensive and practical applications in intelligent surveillance and human-computer interaction. However, few-shot action recognition has not been well explored…

Computer Vision and Pattern Recognition · Computer Science 2024-01-22 Changzhen Li, Jie Zhang, Shuzhe Wu, Xin Jin, Shiguang Shan

FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning

Motion trajectories offer reliable references for physics-based motion learning but suffer from sparsity, particularly in regions that lack sufficient data coverage. To address this challenge, we introduce a self-supervised, structured…

Machine Learning · Computer Science 2024-02-22 Chenhao Li, Elijah Stanger-Jones, Steve Heim, Sangbae Kim

Pedestrian Trajectory Prediction with Structured Memory Hierarchies

This paper presents a novel framework for human trajectory prediction based on multimodal data (video and radar). Motivated by recent neuroscience discoveries, we propose incorporating a structured memory component in the human trajectory…

Computer Vision and Pattern Recognition · Computer Science 2018-07-24 Tharindu Fernando, Simon Denman, Sridha Sridharan, Clinton Fookes