Related papers: Visual Text Correction

200 papers

This paper strives to find the sentence best describing the content of an image or video. Different from existing works, which rely on a joint subspace for image / video to sentence matching, we propose to do so in a visual space only. We…

Computer Vision and Pattern Recognition · Computer Science 2016-11-28 Jianfeng Dong, Xirong Li, Cees G. M. Snoek

Video-Text Retrieval (VTR) aims to search for the most relevant video related to the semantics in a given sentence, and vice versa. In general, this retrieval task is composed of four successive steps: video and textual feature…

Computer Vision and Pattern Recognition · Computer Science 2023-02-27 Cunjuan Zhu, Qi Jia, Wei Chen, Yanming Guo, Yu Liu

Video Temporal Grounding (VTG) aims to identify visual frames in a video clip that match text queries. Recent studies in VTG employ cross-attention to correlate visual frames and text queries as individual token sequences. However, these…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Jongbhin Woo, Hyeonggon Ryu, Youngjoon Jang, Jae Won Cho, Joon Son Chung

Research in the Vision and Language area encompasses challenging topics that seek to connect visual and textual information. When the visual information is related to videos, this takes us into Video-Text Research, which includes several…

Computer Vision and Pattern Recognition · Computer Science 2021-12-02 Jesus Perez-Martin, Benjamin Bustos, Silvio Jamil F. Guimarães, Ivan Sipiran, Jorge Pérez, Grethel Coello Said

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do…

Computer Vision and Pattern Recognition · Computer Science 2018-07-17 Jianfeng Dong, Xirong Li, Cees G. M. Snoek

We present a method for matching a text sentence from a given corpus to a given video clip and vice versa. Traditionally video and text matching is done by learning a shared embedding space and the encoding of one modality is independent of…

Computer Vision and Pattern Recognition · Computer Science 2021-10-22 Ameen Ali, Idan Schwartz, Tamir Hazan, Lior Wolf

Although deep neural networks (DNNs) enable great progress in video abnormal event detection (VAD), existing solutions typically suffer from two issues: (1) The localization of video events cannot be both precise and comprehensive. (2) The…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Siqi Wang, Guang Yu, Zhiping Cai, Xinwang Liu, En Zhu, Jianping Yin

Enhancing the diversity of sentences to describe video contents is an important problem arising in recent video captioning research. In this paper, we explore this problem from a novel perspective of customizing video captions by imitating…

Computer Vision and Pattern Recognition · Computer Science 2021-12-03 Yitian Yuan, Lin Ma, Wenwu Zhu

Video captioning is a challenging task since it requires generating sentences describing various diverse and complex videos. Existing video captioning models lack adequate visual representation due to the neglect of the existence of gaps…

Computer Vision and Pattern Recognition · Computer Science 2021-10-14 Mingkang Tang, Zhanyu Wang, Zhenhua Liu, Fengyun Rao, Dian Li, Xiu Li

Cross-modal retrieval between videos and texts has gained increasing research interest due to the rapid emergence of videos on the web. Generally, a video contains rich instance and event information and the query text only describes a part…

Computer Vision and Pattern Recognition · Computer Science 2022-09-28 Chengzhi Lin, Ancong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen

Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth highlighting while describing a video in natural language. Owing to…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Iqra Qasim, Alexander Horsch, Dilip K. Prasad

Visual text evokes an image in a person's mind, while non-visual text fails to do so. A method to automatically detect visualness in text will enable text-to-image retrieval and generation models to augment text with relevant images. This…

Computation and Language · Computer Science 2023-10-24 Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova

Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with several temporal event locations in coherent storytelling. Following the human perception process, where the scene is effectively understood…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le

The rapid proliferation of video content across various platforms has highlighted the urgent need for advanced video retrieval systems. Traditional methods, which primarily depend on directly matching textual queries with video metadata,…

Information Retrieval · Computer Science 2025-10-10 Peyang Liu, Xi Wang, Ziqiang Cui, Wei Ye

Visual texts embedded in videos carry rich semantic information, which is crucial for both holistic video understanding and fine-grained reasoning about local human actions. However, existing video understanding benchmarks largely overlook…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Zhoufaran Yang, Yan Shu, Jing Wang, Zhifei Yang, Yan Zhang, Yu Li, Keyang Lu, Gangyan Zeng, Shaohui Liu, Yu Zhou, Nicu Sebe

Suppose that we are given a set of videos, along with natural language descriptions in the form of multiple sentences (e.g., manual annotations, movie scripts, sport summaries etc.), and that these sentences appear in the same temporal…

Computer Vision and Pattern Recognition · Computer Science 2015-12-22 Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid

Image-text matching has been a hot research topic bridging the vision and language areas. It remains challenging because the current representation of image usually lacks global semantic concepts as in its corresponding text caption. To…

Computer Vision and Pattern Recognition · Computer Science 2019-09-09 Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu

Describing visual data into natural language is a very challenging task, at the intersection of computer vision, natural language processing and machine learning. Language goes well beyond the description of physical objects and their…

Computer Vision and Pattern Recognition · Computer Science 2020-05-26 Iulia Duta, Andrei Liviu Nicolicioiu, Simion-Vlad Bogolin, Marius Leordeanu

Sequential video understanding, an emerging video understanding task, has drawn considerable attention from researchers because of its goal-oriented nature. This paper studies weakly supervised sequential video understanding where the accurate…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Sixun Dong, Huazhang Hu, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao

Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are…

Machine Learning · Computer Science 2018-09-05 Zhengyang Wang, Shuiwang Ji