Related papers: Controllable Augmentations for Video Representatio…

200 papers

We focus on contrastive methods for self-supervised video representation learning. A common paradigm in contrastive learning is to construct positive pairs by sampling different data views for the same instance, with different data…

Computer Vision and Pattern Recognition · Computer Science 2021-08-23 Chen Sun, Arsha Nagrani, Yonglong Tian, Cordelia Schmid

Contrastive learning of auditory and visual perception has been extremely successful when investigated individually. However, there are still major questions on how we could integrate principles learned from both domains to attain effective…

Computer Vision and Pattern Recognition · Computer Science 2021-10-15 Haider Al-Tahan, Yalda Mohsenzadeh

We present a self-supervised Contrastive Video Representation Learning (CVRL) method to learn spatiotemporal visual representations from unlabeled videos. Our representations are learned using a contrastive loss, where two augmented clips…

Computer Vision and Pattern Recognition · Computer Science 2021-04-07 Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Contrastive learning has revolutionized the field of self-supervised image representation learning, and has recently been adapted to the video domain. One of the greatest advantages of contrastive learning is that it allows us to flexibly define powerful…

Computer Vision and Pattern Recognition · Computer Science 2021-08-06 Haofei Kuang, Yi Zhu, Zhi Zhang, Xinyu Li, Joseph Tighe, Sören Schwertfeger, Cyrill Stachniss, Mu Li

We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also…

Computer Vision and Pattern Recognition · Computer Science 2023-02-16 Simon Jenni, Alexander Black, John Collomosse

Self-supervised approaches for video have shown impressive results in video understanding tasks. However, unlike early works that leverage temporal self-supervision, current state-of-the-art methods primarily rely on tasks from the image…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah

We propose a self-supervised method to learn feature representations from videos. A standard approach in traditional self-supervised methods uses positive-negative data pairs to train with contrastive learning strategy. In such a case,…

Computer Vision and Pattern Recognition · Computer Science 2020-08-13 Li Tao, Xueting Wang, Toshihiko Yamasaki

Contrastive learning has nearly closed the gap between supervised and self-supervised learning of image representations, and has also been explored for videos. However, prior work on contrastive learning for video data has not explored the…

Computer Vision and Pattern Recognition · Computer Science 2022-03-31 Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah

We present a novel technique for self-supervised video representation learning by: (a) decoupling the learning objective into two contrastive subtasks respectively emphasizing spatial and temporal features, and (b) performing it…

Computer Vision and Pattern Recognition · Computer Science 2021-09-02 Zehua Zhang, David Crandall

We propose a self-supervised visual learning method by predicting the variable playback speeds of a video. Without semantic labels, we learn the spatio-temporal visual representation of the video by leveraging the variations in the visual…

Computer Vision and Pattern Recognition · Computer Science 2021-06-02 Hyeon Cho, Taehoon Kim, Hyung Jin Chang, Wonjun Hwang

We introduce a novel self-supervised contrastive learning method to learn representations from unlabelled videos. Existing approaches ignore the specifics of input distortions, e.g., by learning invariance to temporal transformations.…

Computer Vision and Pattern Recognition · Computer Science 2021-12-08 Simon Jenni, Hailin Jin

Robust frame-wise embeddings are essential to perform video analysis and understanding tasks. We present a self-supervised method for representation learning based on aligning temporal video sequences. Our framework uses a transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Keyne Oei, Amr Gomaa, Anna Maria Feit, João Belo

We propose a supervised contrastive learning framework for video representation learning that leverages temporally global context. We introduce a video to image aggregation strategy that spatially arranges multiple frames from each video…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Shaif Chowdhury, Mushfika Rahman, Greg Hamerly

We present a self-supervised approach for learning video representations using temporal video alignment as a pretext task, while exploiting both frame-level and video-level information. We leverage a novel combination of temporal alignment…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Sanjay Haresh, Sateesh Kumar, Huseyin Coskun, Shahram Najam Syed, Andrey Konin, Muhammad Zeeshan Zia, Quoc-Huy Tran

Contrastive learning has delivered impressive results for various tasks in the self-supervised regime. However, existing approaches optimize for learning representations specific to downstream scenarios, i.e., global…

Machine Learning · Computer Science 2021-10-29 Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

The crux of self-supervised video representation learning is to build general features from unlabeled videos. However, most recent works have mainly focused on high-level semantics and neglected lower-level representations and their…

Computer Vision and Pattern Recognition · Computer Science 2021-08-18 Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin

We introduce a weakly supervised method for representation learning based on aligning temporal sequences (e.g., videos) of the same process (e.g., human action). The main idea is to use the global temporal ordering of latent correspondences…

Computer Vision and Pattern Recognition · Computer Science 2021-05-12 Isma Hadji, Konstantinos G. Derpanis, Allan D. Jepson

Understanding temporal dynamics of video is an essential aspect of learning better video representations. Recently, transformer-based architectural designs have been extensively explored for video tasks due to their capability to capture…

Computer Vision and Pattern Recognition · Computer Science 2022-07-20 Sukmin Yun, Jaehyung Kim, Dongyoon Han, Hwanjun Song, Jung-Woo Ha, Jinwoo Shin

To equip artificial intelligence with a comprehensive understanding towards a temporal world, video and 4D panoptic scene graph generation abstracts visual data into nodes to represent entities and edges to capture temporal relations.…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Thong Thanh Nguyen, Xiaobao Wu, Yi Bin, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

We propose a self-supervised contrastive learning approach for facial expression recognition (FER) in videos. We propose a novel temporal sampling-based augmentation scheme to be utilized in addition to standard spatial augmentations used…

Computer Vision and Pattern Recognition · Computer Science 2021-08-09 Shuvendu Roy, Ali Etemad