Related papers: Sentence Directed Video Object Codetection

Object Detection in Videos by High Quality Object Linking

Compared with object detection in static images, object detection in videos is more challenging due to degraded image quality. An effective way to address this problem is to exploit temporal contexts by linking the same object across…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Peng Tang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Wenjun Zeng, Jingdong Wang

Learning to Detect and Retrieve Objects from Unlabeled Videos

Learning an object detector or retrieval requires a large data set with manual annotations. Such data sets are expensive and time consuming to create and therefore difficult to obtain on a large scale. In this work, we propose to exploit…

Computer Vision and Pattern Recognition · Computer Science 2019-10-22 Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein

Discover and Learn New Objects from Documentaries

Despite the remarkable progress in recent years, detecting objects in a new context remains a challenging task. Detectors learned from a public dataset can only work with a fixed list of categories, while training from scratch usually…

Computer Vision and Pattern Recognition · Computer Science 2017-08-01 Kai Chen, Hang Song, Chen Change Loy, Dahua Lin

Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction

We study weakly-supervised video object grounding: given a video segment and a corresponding descriptive sentence, the goal is to localize objects that are mentioned in the sentence in the video. During training, no object bounding boxes…

Computer Vision and Pattern Recognition · Computer Science 2018-07-23 Luowei Zhou, Nathan Louis, Jason J. Corso

Submodular video object proposal selection for semantic object segmentation

Learning a data-driven spatio-temporal semantic representation of the objects is the key to coherent and consistent labelling in video. This paper proposes to achieve semantic video object segmentation by learning a data-driven…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Tinghuai Wang

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

We introduce the task of weakly supervised learning for detecting human and object interactions in videos. Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the…

Computer Vision and Pattern Recognition · Computer Science 2021-10-08 Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell

Learning Object Detection from Captions via Textual Scene Attributes

Object detection is a fundamental task in computer vision, requiring large annotated datasets that are difficult to collect, as annotators need to label objects and their bounding boxes. Thus, it is a significant challenge to use cheaper…

Computer Vision and Pattern Recognition · Computer Science 2020-10-01 Achiya Jerbi, Roei Herzig, Jonathan Berant, Gal Chechik, Amir Globerson

Small Object Detection using Context and Attention

Applying object detection algorithms in various environments has many limitations. Detecting small objects in particular is still challenging because they have low resolution and limited information. We propose an object detection method…

Computer Vision and Pattern Recognition · Computer Science 2019-12-17 Jeong-Seon Lim, Marcella Astrid, Hyun-Jin Yoon, Seung-Ik Lee

Image Conditioned Keyframe-Based Video Summarization Using Object Detection

Video summarization plays an important role in selecting keyframes for understanding a video. Traditionally, it aims to find the most representative and diverse contents (or frames) in a video for short summaries. Recently, query-conditioned…

Computer Vision and Pattern Recognition · Computer Science 2020-09-14 Neeraj Baghel, Suresh C. Raikwar, Charul Bhatnagar

Contrastive Video-Language Segmentation

We focus on the problem of segmenting a certain object referred to by a natural language sentence in video content, at the core of formulating a pinpoint vision-language relation. While existing attempts mainly construct such relation in an…

Computer Vision and Pattern Recognition · Computer Science 2021-09-30 Chen Liang, Yawei Luo, Yu Wu, Yi Yang

From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection

The eye-tracking video saliency prediction (VSP) task and video salient object detection (VSOD) task both focus on the most attractive objects in video and show the result in the form of predictive heatmaps and pixel-level saliency masks,…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Qi Qin, Runmin Cong, Gen Zhan, Yiting Liao, Sam Kwong

Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation

Video object detection is a fundamental problem in computer vision and has a wide spectrum of applications. Based on deep networks, video object detection is actively studied for pushing the limits of detection speed and accuracy. To reduce…

Computer Vision and Pattern Recognition · Computer Science 2021-03-29 Xinggang Wang, Zhaojin Huang, Bencheng Liao, Lichao Huang, Yongchao Gong, Chang Huang

Attend and Interact: Higher-Order Object Interactions for Video Understanding

Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single object representation…

Computer Vision and Pattern Recognition · Computer Science 2018-03-22 Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf

Meta Learning Deep Visual Words for Fast Video Object Segmentation

Personal robots and driverless cars need to be able to operate in novel environments and thus quickly and efficiently learn to recognise new object classes. We address this problem by considering the task of video object segmentation.…

Computer Vision and Pattern Recognition · Computer Science 2020-08-18 Harkirat Singh Behl, Mohammad Najafi, Anurag Arnab, Philip H. S. Torr

Video Captioning Using Weak Annotation

Video captioning has shown impressive progress in recent years. One key reason for the performance improvements made by existing methods lies in massive paired video-sentence data, but collecting such strong annotation, i.e., high-quality…

Computer Vision and Pattern Recognition · Computer Science 2020-09-03 Jingyi Hou, Yunde Jia, Xinxiao Wu, Yayun Qi

Context Matters: Refining Object Detection in Video with Recurrent Neural Networks

Given the vast amounts of video available online, and recent breakthroughs in object detection with static images, object detection in video offers a promising new frontier. However, motion blur and compression artifacts cause substantial…

Computer Vision and Pattern Recognition · Computer Science 2016-07-20 Subarna Tripathi, Zachary C. Lipton, Serge Belongie, Truong Nguyen

Activity Driven Weakly Supervised Object Detection

Weakly supervised object detection aims at reducing the amount of supervision required to train detection models. Such models are traditionally learned from images/videos labelled only with the object class and not the object bounding box.…

Computer Vision and Pattern Recognition · Computer Science 2019-04-04 Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan

Enhancing Embodied Object Detection through Language-Image Pre-training and Implicit Object Memory

Deep-learning and large scale language-image training have produced image object detectors that generalise well to diverse environments and semantic classes. However, single-image object detectors trained on internet data are not optimally…

Robotics · Computer Science 2024-02-07 Nicolas Harvey Chapman, Feras Dayoub, Will Browne, Chris Lehnert

Self-supervised Object-Centric Learning for Videos

Unsupervised multi-object segmentation has shown impressive results on images by utilizing powerful semantics learned from self-supervised pretraining. An additional modality such as depth or motion is often used to facilitate the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-12 Görkay Aydemir, Weidi Xie, Fatma Güney

When Few-Shot Learning Meets Video Object Detection

Different from static images, videos contain additional temporal and spatial information for better object detection. However, it is costly to obtain a large number of videos with bounding box annotations that are required for supervised…

Computer Vision and Pattern Recognition · Computer Science 2022-08-19 Zhongjie Yu, Gaoang Wang, Lin Chen, Sebastian Raschka, Jiebo Luo