Related papers: Context-Oriented Web Video Tag Recommendation

Video Ads Content Structuring by Combining Scene Confidence Prediction and Tagging

Video ads segmentation and tagging is a challenging task due to two main reasons: (1) the video scene structure is complex and (2) it includes multiple modalities (e.g., visual, audio, text). While previous work focuses mostly on activity…

Computer Vision and Pattern Recognition · Computer Science 2021-08-23 Tomoyuki Suzuki, Antonio Tejero-de-Pablos

TED Talk Recommender Using Speech Transcripts

Nowadays, online video platforms mostly recommend related videos by analyzing user-driven data such as viewing patterns, rather than the content of the videos. However, content is more important than any other element when videos aim to…

Social and Information Networks · Computer Science 2018-10-31 Jaehoon Oh, Injung Lee, Yeon Seonwoo, Simin Sung, Ilbong Kwon, Jae-Gil Lee

R-Rec: A rule-based system for contextual suggestion using tag-description similarity

Contextual Suggestion deals with search techniques for complex information needs that are highly focused on context and user needs. In this paper, we propose R-Rec, a novel rule-based technique to identify and recommend appropriate…

Information Retrieval · Computer Science 2017-07-06 Kshitij Singh, Manajit Chakraborty, C. Ravindranath Chowdary

An Empirical Study of Frame Selection for Text-to-Video Retrieval

Text-to-video retrieval (TVR) aims to find the most relevant video in a large video gallery given a query text. The intricate and abundant context of the video challenges the performance and efficiency of TVR. To handle the serialized video…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Mengxia Wu, Min Cao, Yang Bai, Ziyin Zeng, Chen Chen, Liqiang Nie, Min Zhang

Dynamic inference of user context through social tag embedding for music recommendation

Music listening preferences at a given time depend on a wide range of contextual factors, such as user emotional state, location and activity at listening time, the day of the week, the time of the day, etc. It is therefore of great…

Information Retrieval · Computer Science 2021-09-24 Diego Sánchez-Moreno, Álvaro Lozano Murciego, Vivian F. López Batista, María Dolores Muñoz Vicente, María N. Moreno-García

VTC: Improving Video-Text Retrieval with User Comments

Multi-modal retrieval is an important problem for many applications, such as recommendation and search. Current benchmarks and even datasets are often manually constructed and consist of mostly clean samples where all modalities are…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Laura Hanu, James Thewlis, Yuki M. Asano, Christian Rupprecht

Context Enhanced Transformer for Single Image Object Detection

With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various…

Computer Vision and Pattern Recognition · Computer Science 2023-12-27 Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim

Micro-video Tagging via Jointly Modeling Social Influence and Tag Relation

The last decade has witnessed the proliferation of micro-videos on various user-generated content platforms. According to our statistics, around 85.7% of micro-videos lack annotation. In this paper, we focus on annotating micro-videos with…

Multimedia · Computer Science 2023-03-16 Xiao Wang, Tian Gan, Yinwei Wei, Jianlong Wu, Dai Meng, Liqiang Nie

A Data-Driven Approach for Tag Refinement and Localization in Web Videos

Tagging of visual content is becoming more and more widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of…

Computer Vision and Pattern Recognition · Computer Science 2015-09-09 Lamberto Ballan, Marco Bertini, Giuseppe Serra, Alberto Del Bimbo

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient…

Computer Vision and Pattern Recognition · Computer Science 2023-06-02 Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Strategies for Searching Video Content with Text Queries or Video Examples

The large number of user-generated videos uploaded on to the Internet everyday has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos,…

Information Retrieval · Computer Science 2016-06-21 Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang

Using Contextual Information as Virtual Items on Top-N Recommender Systems

Traditionally, recommender systems for the Web deal with applications that have two dimensions, users and items. Based on access logs that relate these dimensions, a recommendation model can be built and used to identify a set of N items…

Machine Learning · Computer Science 2011-11-16 Marcos A. Domingues, Alipio Mario Jorge, Carlos Soares

Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation

Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence. Most existing methods typically accomplish this task by employing a multi-stage top-down…

Computer Vision and Pattern Recognition · Computer Science 2022-04-25 Jyoti Kini, Mubarak Shah

Learning Video Representations from Textual Web Supervision

Videos on the Internet are paired with pieces of text, such as titles and descriptions. This text typically describes the most important content in the video, such as the objects in the scene and the actions being performed. Based on this…

Computer Vision and Pattern Recognition · Computer Science 2021-08-31 Jonathan C. Stroud, Zhichao Lu, Chen Sun, Jia Deng, Rahul Sukthankar, Cordelia Schmid, David A. Ross

Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval

Video retrieval requires aligning visual content with corresponding natural language descriptions. In this paper, we introduce Modality Auxiliary Concepts for Video Retrieval (MAC-VR), a novel approach that leverages modality-specific tags…

Computer Vision and Pattern Recognition · Computer Science 2025-09-03 Adriano Fragomeni, Dima Damen, Michael Wray

You Only Recognize Once: Towards Fast Video Text Spotting

Video text spotting is still an important research topic due to its various real-applications. Previous approaches usually fall into the four-staged pipeline: text detection in individual images, framewisely recognizing localized text…

Computer Vision and Pattern Recognition · Computer Science 2021-10-26 Zhanzhan Cheng, Jing Lu, Yi Niu, Shiliang Pu, Fei Wu, Shuigeng Zhou

Content-Based Video Browsing by Text Region Localization and Classification

The amount of digital video data is increasing over the world. It highlights the need for efficient algorithms that can index, retrieve and browse this data by content. This can be achieved by identifying semantic description captured…

Multimedia · Computer Science 2013-01-11 Bassem Bouaziz, Walid Mahdi, Tarek Zlitni, Abdelmajid ben Hamadou

Contrastive Graph Multimodal Model for Text Classification in Videos

The extraction of text information in videos serves as a critical step towards semantic understanding of videos. It usually involves two steps: (1) text recognition and (2) text classification. To localize texts in videos, we can resort…

Computer Vision and Pattern Recognition · Computer Science 2022-06-07 Ye Liu, Changchong Lu, Chen Lin, Di Yin, Bo Ren

On the Role of Visual Context in Enriching Music Representations

Human perception and experience of music is highly context-dependent. Contextual variability contributes to differences in how we interpret and interact with music, challenging the design of robust models for information retrieval.…

Sound · Computer Science 2022-10-31 Kleanthis Avramidis, Shanti Stewart, Shrikanth Narayanan

CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval

This paper tackles a recently proposed Video Corpus Moment Retrieval task. This task is essential because advanced video retrieval applications should enable users to retrieve a precise moment from a large video corpus. We propose a novel…

Multimedia · Computer Science 2021-09-22 Zhijian Hou, Chong-Wah Ngo, Wing Kwong Chan