Related papers: Learning Temporal Embeddings for Complex Video Ana…

Context-Aware Temporal Embedding of Objects in Video Data

In video analysis, understanding the temporal context is crucial for recognizing object interactions, event patterns, and contextual changes over time. The proposed model leverages adjacency and semantic similarities between objects from…

Computer Vision and Pattern Recognition · Computer Science 2024-08-26 Ahnaf Farhan, M. Shahriar Hossain

Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences

Understanding the structure of complex activities in untrimmed videos is a challenging task in the area of action recognition. One problem here is that this task usually requires a large amount of hand-annotated minute- or even hour-long…

Computer Vision and Pattern Recognition · Computer Science 2020-10-01 Rosaura G. VidalMata, Walter J. Scheirer, Anna Kukleva, David Cox, Hilde Kuehne

Exploring Temporal Information for Improved Video Understanding

In this dissertation, I present my work towards exploring temporal information for better video understanding. Specifically, I have worked on two problems: action recognition and semantic segmentation. For action recognition, I have…

Computer Vision and Pattern Recognition · Computer Science 2019-05-28 Yi Zhu

Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

This paper presents TCE: Temporally Coherent Embeddings for self-supervised video representation learning. The proposed method exploits inherent structure of unlabeled video data to explicitly enforce temporal coherency in the embedding…

Computer Vision and Pattern Recognition · Computer Science 2020-11-18 Joshua Knights, Ben Harwood, Daniel Ward, Anthony Vanderkop, Olivia Mackenzie-Ross, Peyman Moghadam

Video Understanding: Through A Temporal Lens

This thesis explores the central question of how to leverage temporal relations among video elements to advance video understanding. Addressing the limitations of existing methods, the work presents a five-fold contribution: (1) an…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Thong Thanh Nguyen

Exploiting Temporality for Semi-Supervised Video Segmentation

In recent years, there has been remarkable progress in supervised image segmentation. Video segmentation is less explored, despite the temporal dimension being highly informative. Semantic labels, e.g. that cannot be accurately detected in…

Computer Vision and Pattern Recognition · Computer Science 2019-08-30 Radu Sibechi, Olaf Booij, Nora Baka, Peter Bloem

Learning by Aligning Videos in Time

We present a self-supervised approach for learning video representations using temporal video alignment as a pretext task, while exploiting both frame-level and video-level information. We leverage a novel combination of temporal alignment…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Sanjay Haresh, Sateesh Kumar, Huseyin Coskun, Shahram Najam Syed, Andrey Konin, Muhammad Zeeshan Zia, Quoc-Huy Tran

Incorporating Scalability in Unsupervised Spatio-Temporal Feature Learning

Deep neural networks are efficient learning machines which leverage upon a large amount of manually labeled data for learning discriminative features. However, acquiring substantial amount of supervised data, especially for videos can be a…

Computer Vision and Pattern Recognition · Computer Science 2018-08-16 Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury

Leveraging Local Temporal Information for Multimodal Scene Classification

Robust video scene classification models should capture the spatial (pixel-wise) and temporal (frame-wise) characteristics of a video effectively. Transformer models with self-attention which are designed to get contextualized…

Computer Vision and Pattern Recognition · Computer Science 2021-10-28 Saurabh Sahu, Palash Goyal

Temporal Perceiving Video-Language Pre-training

Video-Language Pre-training models have recently significantly improved various multi-modal downstream tasks. Previous dominant works mainly adopt contrastive learning to achieve global feature alignment across modalities. However, the…

Computer Vision and Pattern Recognition · Computer Science 2023-01-19 Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang

Unsupervised learning from videos using temporal coherency deep networks

In this work we address the challenging problem of unsupervised learning from videos. Existing methods utilize the spatio-temporal continuity in contiguous video frames as regularization for the learning process. Typically, this temporal…

Computer Vision and Pattern Recognition · Computer Science 2018-10-12 Carolina Redondo-Cabrera, Roberto J. López-Sastre

TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering

Temporal action segmentation in untrimmed videos has gained increased attention recently. However, annotating action classes and frame-wise boundaries is extremely time consuming and cost intensive, especially on large-scale datasets. To…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Wei Lin, Anna Kukleva, Horst Possegger, Hilde Kuehne, Horst Bischof

Unsupervised learning of action classes with continuous temporal embedding

The task of temporally detecting and segmenting actions in untrimmed videos has seen an increased attention recently. One problem in this context arises from the need to define and label action boundaries to create annotations for training…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Anna Kukleva, Hilde Kuehne, Fadime Sener, Juergen Gall

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

Understanding temporal information and how the visual world changes over time is a fundamental ability of intelligent systems. In video understanding, temporal information is at the core of many current challenges, including compression,…

Computer Vision and Pattern Recognition · Computer Science 2019-10-31 Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

Efficient Video Segmentation Models with Per-frame Inference

Most existing real-time deep models trained with each frame independently may produce inconsistent results across the temporal axis when tested on a video sequence. A few methods take the correlations in the video sequence into…

Computer Vision and Pattern Recognition · Computer Science 2022-02-28 Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models

Video-based multimodal large language models (Video-LLMs) possess significant potential for video understanding tasks. However, most Video-LLMs treat videos as a sequential set of individual frames, which results in insufficient…

Computer Vision and Pattern Recognition · Computer Science 2024-10-16 Xiaohan Lan, Yitian Yuan, Zequn Jie, Lin Ma

Multimodal Long Video Modeling Based on Temporal Dynamic Context

Recent advances in Large Language Models (LLMs) have led to significant breakthroughs in video understanding. However, existing models still struggle with long video processing due to the context length constraint of LLMs and the vast…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Haoran Hao, Jiaming Han, Yiyuan Zhang, Xiangyu Yue

Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception

For training a video-based action recognition model that accepts multi-view video, annotating frame-level labels is tedious and difficult. However, it is relatively easy to annotate sequence-level labels. This kind of coarse annotations are…

Computer Vision and Pattern Recognition · Computer Science 2024-03-20 Vijay John, Yasutomo Kawanishi

Deep Multimodal Feature Encoding for Video Ordering

True understanding of videos comes from a joint analysis of all its modalities: the video frames, the audio track, and any accompanying text such as closed captions. We present a way to learn a compact multimodal feature representation that…

Computer Vision and Pattern Recognition · Computer Science 2020-04-07 Vivek Sharma, Makarand Tapaswi, Rainer Stiefelhagen

Temporal Cycle-Consistency Learning

We introduce a self-supervised representation learning method based on the task of temporal alignment between videos. The method trains a network using temporal cycle consistency (TCC), a differentiable cycle-consistency loss that can be…

Computer Vision and Pattern Recognition · Computer Science 2019-04-17 Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman