Related papers: Temporal Reasoning Graph for Activity Recognition

200 papers

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the…

Computer Vision and Pattern Recognition · Computer Science 2018-07-26 Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

Temporal relational modeling in video is essential for human action understanding, such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many…

Computer Vision and Pattern Recognition · Computer Science 2020-12-15 Dong Wang, Di Hu, Xingjian Li, Dejing Dou

Modeling relation between actors is important for recognizing group activity in a multi-person scene. This paper aims at learning discriminative relation between actors efficiently using deep models. To this end, we propose to build a…

Computer Vision and Pattern Recognition · Computer Science 2019-04-24 Jianchao Wu, Limin Wang, Li Wang, Jie Guo, Gangshan Wu

Knowledge is inherently time-sensitive and continuously evolves over time. Although current Retrieval-Augmented Generation (RAG) systems enrich LLMs with external knowledge, they largely ignore this temporal nature. This raises two…

Information Retrieval · Computer Science 2025-10-16 Jiale Han, Austin Cheung, Yubai Wei, Zheng Yu, Xusheng Wang, Bing Zhu, Yi Yang

In this paper, we propose an approach that spatially localizes the activities in a video frame where each person can perform multiple activities at the same time. Our approach takes the temporal scene context as well as the relations of the…

Computer Vision and Pattern Recognition · Computer Science 2021-01-22 Sovan Biswas, Yaser Souri, Juergen Gall

In this paper, we newly introduce the concept of temporal attention filters, and describe how they can be used for human activity recognition from videos. Many high-level activities are often composed of multiple temporal parts (e.g.,…

Computer Vision and Pattern Recognition · Computer Science 2016-12-28 AJ Piergiovanni, Chenyou Fan, Michael S. Ryoo

This thesis explores different approaches using Convolutional and Recurrent Neural Networks to classify and temporally localize activities in videos; furthermore, an implementation to achieve this is proposed. As the first step, features…

Computer Vision and Pattern Recognition · Computer Science 2017-03-06 Alberto Montes, Amaia Salvador, Santiago Pascual, Xavier Giro-i-Nieto

Temporal reasoning is an important aspect of video analysis. 3D CNN shows good performance by exploring spatial-temporal features jointly in an unconstrained way, but it also increases the computational cost a lot. Previous works try to…

Computer Vision and Pattern Recognition · Computer Science 2019-10-01 Chenxu Luo, Alan Yuille

Many human activities take minutes to unfold. To represent them, related works opt for statistical pooling, which neglects the temporal structure. Others opt for convolutional methods, as CNN and Non-Local. While successful in learning…

Computer Vision and Pattern Recognition · Computer Science 2019-10-15 Noureldien Hussein, Efstratios Gavves, Arnold W. M. Smeulders

Consider the scenario where a human cleans a table and a robot observing the scene is instructed with the task "Remove the cloth using which I wiped the table". Instruction following with temporal reasoning requires the robot to identify…

Recent large vision-language models have achieved strong performance on short- and medium-length video understanding, yet they remain inadequate for ultra-long or even infinite video reasoning, where models must preserve coherent memory…

Artificial Intelligence · Computer Science 2026-05-08 Peizheng Yan, Yu Zhao, Liang Xie, Juntong Qi, Mingming Wang, Erwei Yin

Graph Convolutional Networks (GCNs), which model skeleton data as graphs, have obtained remarkable performance for skeleton-based action recognition. Particularly, the temporal dynamic of skeleton sequence conveys significant information in…

Computer Vision and Pattern Recognition · Computer Science 2020-12-17 Jianan Li, Xuemei Xie, Zhifu Zhao, Yuhan Cao, Qingzhe Pan, Guangming Shi

In this paper, we propose a method for activity recognition from videos based on sparse local features and hypergraph matching. We benefit from special properties of the temporal domain in the data to derive a sequential and fast graph…

Computer Vision and Pattern Recognition · Computer Science 2015-05-05 Eric Lombardi, Christian Wolf, Oya Celiktutan, Bülent Sankur

Interpretation and understanding of video presents a challenging computer vision task in numerous fields - e.g. autonomous driving and sports analytics. Existing approaches to interpreting the actions taking place within a video clip are…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Salman Khan, Izzeddin Teeti, Andrew Bradley, Mohamed Elhoseiny, Fabio Cuzzolin

Given an object of interest, visual navigation aims to reach the object's location based on a sequence of partial observations. To this end, an agent needs to 1) learn a piece of certain knowledge about the relations of object categories in…

Computer Vision and Pattern Recognition · Computer Science 2023-12-07 Xiaobo Hu, Youfang Lin, HeHe Fan, Shuo Wang, Zhihao Wu, Kai Lv

Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks…

Machine Learning · Computer Science 2025-07-23 Alireza Dizaji, Benedict Aaron Tjandra, Mehrab Hamidi, Shenyang Huang, Guillaume Rabusseau

Discovering the underlying structures present in large real world graphs is a fundamental scientific problem. Recent work at the intersection of formal language theory and graph theory has found that a Hyperedge Replacement Grammar (HRG)…

Social and Information Networks · Computer Science 2017-06-30 Corey Pennycuff, Salvador Aguinaga, Tim Weninger

Large Video Language Models (LVLMs) have rapidly emerged as the focus of multimedia AI research. Nonetheless, when confronted with lengthy videos, these models struggle: their temporal windows are narrow, and they fail to notice…

Computer Vision and Pattern Recognition · Computer Science 2025-12-30 Zongsheng Cao, Yangfan He, Anran Liu, Feng Chen, Zepeng Wang, Jun Xie

Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence. This task has achieved significant momentum in the computer vision community as it enables activity grounding beyond…

Computer Vision and Pattern Recognition · Computer Science 2023-05-16 Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang

Large language models (LLMs) have demonstrated strong performance in natural language generation but remain limited in knowledge-intensive tasks due to outdated or incomplete internal knowledge. Retrieval-Augmented Generation (RAG)…

Artificial Intelligence · Computer Science 2025-08-05 Dong Li, Yichen Niu, Ying Ai, Xiang Zou, Biqing Qi, Jianxing Liu