Related papers: Deep Multimodal Feature Encoding for Video Orderin…

200 papers

The task of retrieving video content relevant to natural language queries plays a critical role in effectively handling internet-scale datasets. Most of the existing methods for this caption-to-video retrieval problem do not fully exploit…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-22 · Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid

Video advertisement content structuring aims to segment a given video advertisement and label each segment on various dimensions, such as presentation form, scene, and style. Different from real-life videos, video advertisements contain…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-15 · Daya Guo, Zhaoyang Zeng

The abundance of instructional videos and their narrations over the Internet offers an exciting avenue for understanding procedural activities. In this work, we propose to learn video representation that encodes both action steps and their…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-03 · Yiwu Zhong, Licheng Yu, Yang Bai, Shangwen Li, Xueting Yan, Yin Li

Learning visual feature representations for video analysis is a daunting task that requires a large amount of training samples and a proper generalization framework. Many of the current state of the art methods for video captioning and…

Machine Learning · Computer Science · 2018-09-20 · Oliver Nina, Washington Garcia, Scott Clouse, Alper Yilmaz

This paper proposes a practical multimodal video summarization task setting and a dataset to train and evaluate the task. The target task involves summarizing a given video into a predefined number of keyframe-caption pairs and displaying…

Computation and Language · Computer Science · 2023-12-05 · Keito Kudo, Haruki Nagasawa, Jun Suzuki, Nobuyuki Shimizu

Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation.…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · G. Thomas Hudson, Dean Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed

In this paper, we propose to learn temporal embeddings of video frames for complex video analysis. Large quantities of unlabeled video data can be easily obtained from the Internet. These videos possess the implicit weak label that they are…

Computer Vision and Pattern Recognition · Computer Science · 2015-05-05 · Vignesh Ramanathan, Kevin Tang, Greg Mori, Li Fei-Fei

How to learn discriminative video representation from unlabeled videos is challenging but crucial for video analysis. The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-24 · Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan

This paper provides a review on representation learning for videos. We classify recent spatiotemporal feature learning methods for sequential visual data and compare their pros and cons for general video analysis. Building effective…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-13 · Elham Ravanbakhsh, Yongqing Liang, J. Ramanujam, Xin Li

Videos are a commonly used type of content in learning during Web search. Many e-learning platforms provide quality content, but sometimes educational videos are long and cover many topics. Humans are good at extracting important sections…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-27 · Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth

There is a growing trend in placing video advertisements on social platforms for online marketing, which demands automatic approaches to understand the contents of advertisements effectively. Taking the 2021 TAAC competition as an…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-31 · Zejia Weng, Lingchen Meng, Rui Wang, Zuxuan Wu, Yu-Gang Jiang

Video action recognition is one of the representative tasks for video understanding. Over the last decade, we have witnessed great advancements in video action recognition thanks to the emergence of deep learning. But we also encountered…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-14 · Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li

Multi-modal Ads Video Understanding Challenge is the first grand challenge aiming to comprehensively understand ads videos. Our challenge includes two tasks: video structuring in the temporal dimension and multi-modal video classification.…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-17 · Zhenzhi Wang, Liyu Wu, Zhimin Li, Jiangfeng Xiong, Qinglin Lu

Recent advances in Large Language Models (LLMs) have led to significant breakthroughs in video understanding. However, existing models still struggle with long video processing due to the context length constraint of LLMs and the vast…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Haoran Hao, Jiaming Han, Yiyuan Zhang, Xiangyu Yue

Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, easily leading to over-fitting of the deep…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-20 · Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond

Dense video captioning is a task of localizing interesting events from an untrimmed video and producing textual description (captions) for each localized event. Most of the previous works in dense video captioning are solely based on visual…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-07 · Vladimir Iashin, Esa Rahtu

Time-aware encoding of frame sequences in a video is a fundamental problem in video understanding. While many attempted to model time in videos, an explicit study on quantifying video time is missing. To fill this lacuna, we aim to evaluate…

Computer Vision and Pattern Recognition · Computer Science · 2018-07-19 · Amir Ghodrati, Efstratios Gavves, Cees G. M. Snoek

We address the problem of text-guided video temporal grounding, which aims to identify the time interval of a certain event based on a natural language description. Different from most existing methods that only consider RGB images as…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-01 · Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Xuzheng Yu, Chen Jiang, Wei Zhang, Tian Gan, Linlin Chao, Jianan Zhao, Yuan Cheng, Qingpei Guo, Wei Chu

Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding). In this paper, we focus on the…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-09 · Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee