Related papers: Learning Temporal Embeddings for Complex Video Ana…

200 papers

In video analysis, understanding the temporal context is crucial for recognizing object interactions, event patterns, and contextual changes over time. The proposed model leverages adjacency and semantic similarities between objects from…

Computer Vision and Pattern Recognition · Computer Science 2024-08-26 Ahnaf Farhan, M. Shahriar Hossain

Understanding the structure of complex activities in untrimmed videos is a challenging task in the area of action recognition. One problem here is that this task usually requires a large amount of hand-annotated minute- or even hour-long…

Computer Vision and Pattern Recognition · Computer Science 2020-10-01 Rosaura G. VidalMata, Walter J. Scheirer, Anna Kukleva, David Cox, Hilde Kuehne

In this dissertation, I present my work towards exploring temporal information for better video understanding. Specifically, I have worked on two problems: action recognition and semantic segmentation. For action recognition, I have…

Computer Vision and Pattern Recognition · Computer Science 2019-05-28 Yi Zhu

This paper presents TCE: Temporally Coherent Embeddings for self-supervised video representation learning. The proposed method exploits inherent structure of unlabeled video data to explicitly enforce temporal coherency in the embedding…

Computer Vision and Pattern Recognition · Computer Science 2020-11-18 Joshua Knights, Ben Harwood, Daniel Ward, Anthony Vanderkop, Olivia Mackenzie-Ross, Peyman Moghadam

This thesis explores the central question of how to leverage temporal relations among video elements to advance video understanding. Addressing the limitations of existing methods, the work presents a five-fold contribution: (1) an…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Thong Thanh Nguyen

In recent years, there has been remarkable progress in supervised image segmentation. Video segmentation is less explored, despite the temporal dimension being highly informative. Semantic labels, e.g. that cannot be accurately detected in…

Computer Vision and Pattern Recognition · Computer Science 2019-08-30 Radu Sibechi, Olaf Booij, Nora Baka, Peter Bloem

We present a self-supervised approach for learning video representations using temporal video alignment as a pretext task, while exploiting both frame-level and video-level information. We leverage a novel combination of temporal alignment…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Sanjay Haresh, Sateesh Kumar, Huseyin Coskun, Shahram Najam Syed, Andrey Konin, Muhammad Zeeshan Zia, Quoc-Huy Tran

Deep neural networks are efficient learning machines that leverage large amounts of manually labeled data to learn discriminative features. However, acquiring a substantial amount of supervised data, especially for videos, can be a…

Computer Vision and Pattern Recognition · Computer Science 2018-08-16 Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury

Robust video scene classification models should capture the spatial (pixel-wise) and temporal (frame-wise) characteristics of a video effectively. Transformer models with self-attention which are designed to get contextualized…

Computer Vision and Pattern Recognition · Computer Science 2021-10-28 Saurabh Sahu, Palash Goyal

Video-Language Pre-training models have recently significantly improved various multi-modal downstream tasks. Previous dominant works mainly adopt contrastive learning to achieve global feature alignment across modalities. However, the…

Computer Vision and Pattern Recognition · Computer Science 2023-01-19 Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang

In this work we address the challenging problem of unsupervised learning from videos. Existing methods utilize the spatio-temporal continuity in contiguous video frames as regularization for the learning process. Typically, this temporal…

Computer Vision and Pattern Recognition · Computer Science 2018-10-12 Carolina Redondo-Cabrera, Roberto J. López-Sastre

Temporal action segmentation in untrimmed videos has gained increased attention recently. However, annotating action classes and frame-wise boundaries is extremely time consuming and cost intensive, especially on large-scale datasets. To…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Wei Lin, Anna Kukleva, Horst Possegger, Hilde Kuehne, Horst Bischof

The task of temporally detecting and segmenting actions in untrimmed videos has seen increased attention recently. One problem in this context arises from the need to define and label action boundaries to create annotations for training…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Anna Kukleva, Hilde Kuehne, Fadime Sener, Juergen Gall

Understanding temporal information and how the visual world changes over time is a fundamental ability of intelligent systems. In video understanding, temporal information is at the core of many current challenges, including compression,…

Computer Vision and Pattern Recognition · Computer Science 2019-10-31 Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

Most existing real-time deep models trained with each frame independently may produce inconsistent results across the temporal axis when tested on a video sequence. A few methods take the correlations in the video sequence into…

Computer Vision and Pattern Recognition · Computer Science 2022-02-28 Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

Video-based multimodal large language models (Video-LLMs) possess significant potential for video understanding tasks. However, most Video-LLMs treat videos as a sequential set of individual frames, which results in insufficient…

Computer Vision and Pattern Recognition · Computer Science 2024-10-16 Xiaohan Lan, Yitian Yuan, Zequn Jie, Lin Ma

Recent advances in Large Language Models (LLMs) have led to significant breakthroughs in video understanding. However, existing models still struggle with long video processing due to the context length constraint of LLMs and the vast…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Haoran Hao, Jiaming Han, Yiyuan Zhang, Xiangyu Yue

For training a video-based action recognition model that accepts multi-view video, annotating frame-level labels is tedious and difficult. However, it is relatively easy to annotate sequence-level labels. This kind of coarse annotations are…

Computer Vision and Pattern Recognition · Computer Science 2024-03-20 Vijay John, Yasutomo Kawanishi

True understanding of videos comes from a joint analysis of all its modalities: the video frames, the audio track, and any accompanying text such as closed captions. We present a way to learn a compact multimodal feature representation that…

Computer Vision and Pattern Recognition · Computer Science 2020-04-07 Vivek Sharma, Makarand Tapaswi, Rainer Stiefelhagen

We introduce a self-supervised representation learning method based on the task of temporal alignment between videos. The method trains a network using temporal cycle consistency (TCC), a differentiable cycle-consistency loss that can be…

Computer Vision and Pattern Recognition · Computer Science 2019-04-17 Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman