Related papers: Dual Encoding for Zero-Example Video Retrieval

200 papers

This paper attacks the challenging problem of video retrieval by text. In such a retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described exclusively in the form of a natural-language sentence, with no…

Computer Vision and Pattern Recognition · Computer Science 2021-02-19 Jianfeng Dong, Xirong Li, Chaoxi Xu, Xun Yang, Gang Yang, Xun Wang, Meng Wang

The goal of text-to-video retrieval is to search large databases for relevant videos based on text queries. Existing methods have progressed to handling explicit queries where the visual content of interest is described explicitly; however,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Yiqing Shen, Chenxiao Fan, Chenjia Li, Mathias Unberath

Answering queries with semantic concepts has long been the mainstream approach for video search. Only recently has its performance been surpassed by the concept-free approach, which embeds queries in the same joint space as videos. Nevertheless, the…

Computer Vision and Pattern Recognition · Computer Science 2024-02-20 Jiaxin Wu, Chong-Wah Ngo

The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm on retrieval with simple queries, which are usually…

Computer Vision and Pattern Recognition · Computer Science 2020-07-07 Xun Yang, Jianfeng Dong, Yixin Cao, Xun Wang, Meng Wang, Tat-Seng Chua

We describe a protocol to study text-to-video retrieval training with unlabeled videos, where we assume (i) no access to labels for any videos, i.e., no access to the set of ground-truth captions, but (ii) access to labeled images in the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-29 Lucas Ventura, Cordelia Schmid, Gül Varol

Video Retrieval is a challenging task where a text query is matched to a video or vice versa. Most of the existing approaches for addressing such a problem rely on annotations made by the users. Although simple, this approach is not always…

Computer Vision and Pattern Recognition · Computer Science 2021-03-01 Jesús Andrés Portillo-Quintero, José Carlos Ortiz-Bayliss, Hugo Terashima-Marín

Contrastively-trained Vision-Language Models (VLMs), such as CLIP, have become the standard approach for learning discriminative vision-language representations. However, these models often exhibit shallow language understanding,…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Ioanna Ntinou, Alexandros Xenos, Yassine Ouali, Adrian Bulat, Georgios Tzimiropoulos

Recently, with the enormous growth of online videos, fast video retrieval research has received increasing attention. As an extension of image hashing techniques, traditional video hashing methods mainly depend on hand-crafted features and…

Computer Vision and Pattern Recognition · Computer Science 2017-12-04 Yj Dong, JG Li

The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos,…

Visual-semantic embedding is an interesting research topic because it is useful for various tasks, such as visual question answering (VQA), image-text retrieval, image captioning, and scene graph generation. In this paper, we focus on…

Computer Vision and Pattern Recognition · Computer Science 2021-09-29 Kazuya Ueki

The task of retrieving video content relevant to natural language queries plays a critical role in effectively handling internet-scale datasets. Most of the existing methods for this caption-to-video retrieval problem do not fully exploit…

Computer Vision and Pattern Recognition · Computer Science 2020-07-22 Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid

The rapid growth of video on the internet has made searching for video content using natural language queries a significant challenge. Human-generated queries for video datasets "in the wild" vary a lot in terms of degree of specificity,…

Computer Vision and Pattern Recognition · Computer Science 2020-02-17 Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman

Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a core theme in multimedia data management and retrieval. The success of AVS counts on cross-modal representation learning that encodes both query…

Computer Vision and Pattern Recognition · Computer Science 2020-11-25 Xirong Li, Fangming Zhou, Chaoxi Xu, Jiaqi Ji, Gang Yang

Video semantic search in densely crowded scenes remains a challenging task due to visual encoders' tendency to prioritize salient foreground regions while neglecting contextually important background areas. We propose an Inverse Attention…

Computer Vision and Pattern Recognition · Computer Science 2026-05-08 Faisal Aljehrai, Mohammed A. Alkhrashi, Alreem Almuhrij, Sarah Abuhimed, Noorh Aldossary, Abdullah Aldwyish, Raied Aljadaany, Huda Alamri, Muhammad Kamran J Khan

Visual Document Retrieval (VDR) typically operates as text-to-image retrieval using specialized bi-encoders trained to directly embed document images. We revisit a zero-shot generate-and-encode pipeline: a vision-language model first…

Information Retrieval · Computer Science 2025-09-22 Thong Nguyen, Yibin Lei, Jia-Huei Ju, Andrew Yates

Cross-modal retrieval has become popular in recent years, particularly with the rise of multimedia. Generally, the information from each modality exhibits distinct representations and semantic information, which makes feature tends to be in…

Information Retrieval · Computer Science 2023-08-29 Zichen Yuan, Qi Shen, Bingyi Zheng, Yuting Liu, Linying Jiang, Guibing Guo

In recent years, dual-encoder vision-language models (e.g., CLIP) have achieved remarkable text-to-image retrieval performance. However, we discover that these models usually result in very different retrievals for a pair of…

Computer Vision and Pattern Recognition · Computer Science 2024-05-07 Jiacheng Cheng, Hijung Valentina Shin, Nuno Vasconcelos, Bryan Russell, Fabian Caba Heilbron

Handwritten word retrieval is vital for digital archives but remains challenging due to large handwriting variability and cross-lingual semantic gaps. While large vision-language models offer potential solutions, their prohibitive…

Computer Vision and Pattern Recognition · Computer Science 2026-01-19 Fangke Chen, Tianhao Dong, Sirry Chen, Guobin Zhang, Yishu Zhang, Yining Chen

Dense video captioning, which aims to automatically localize and caption all events within an untrimmed video, has received significant research attention. Several studies introduce methods by designing dense video captioning as a…

Computer Vision and Pattern Recognition · Computer Science 2024-04-12 Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon, Jinwoo Choi, Seong Tae Kim

Precise video retrieval requires multi-modal correlations to handle unseen vocabulary and scenes, becoming more complex for lengthy videos where models must perform effectively without prior training on a specific dataset. We introduce a…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Mohamed Eltahir, Osamah Sarraj, Mohammed Bremoo, Mohammed Khurd, Abdulrahman Alfrihidi, Taha Alshatiri, Mohammad Almatrafi, Tanveer Hussain