Related papers: Sequence Level Semantics Aggregation for Video Obj…

Practical Video Object Detection via Feature Selection and Aggregation

Compared with still image object detection, video object detection (VOD) needs to particularly concern the high across-frame variation in object appearance, and the diverse deterioration in some frames. In principle, the detection in a…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-30 · Yuheng Shi, Tong Zhang, Xiaojie Guo

DFA: Dynamic Feature Aggregation for Efficient Video Object Detection

Video object detection is a fundamental yet challenging task in computer vision. One practical solution is to take advantage of temporal information from the video and apply feature aggregation to enhance the object features in each frame.…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-04 · Yiming Cui

Object Detection in Video with Spatial-temporal Context Aggregation

Recent cutting-edge feature aggregation paradigms for video object detection rely on inferring feature correspondence. The feature correspondence estimation problem is fundamentally difficult due to poor image quality, motion blur, etc., and…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-12 · Hao Luo, Lichao Huang, Han Shen, Yuan Li, Chang Huang, Xinggang Wang

Flow-Guided Feature Aggregation for Video Object Detection

Extending state-of-the-art object detectors from image to video is challenging. The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc. Existing work attempts to…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-21 · Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei

Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation

Video object detection is a fundamental problem in computer vision and has a wide spectrum of applications. Based on deep networks, video object detection is actively studied for pushing the limits of detection speed and accuracy. To reduce…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-29 · Xinggang Wang, Zhaojin Huang, Bencheng Liao, Lichao Huang, Yongchao Gong, Chang Huang

Object-aware Feature Aggregation for Video Object Detection

We present an Object-aware Feature Aggregation (OFA) module for video object detection (VID). Our approach is motivated by the intriguing property that video-level object-aware knowledge can be employed as a powerful semantic prior to help…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-26 · Qichuan Geng, Hong Zhang, Na Jiang, Xiaojuan Qi, Liangjun Zhang, Zhong Zhou

SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving

Visual-based perception is the key module for autonomous driving. Among those visual perception tasks, video object detection is a primary yet challenging one because of feature degradation caused by fast motion or multiple poses. Current…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-30 · Yiming Cui, Cheng Han, Dongfang Liu

Identity-Consistent Aggregation for Video Object Detection

In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame. Existing methods treat the temporal contexts obtained from different objects…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-16 · Chaorui Deng, Da Chen, Qi Wu

Surgical Skill Assessment via Video Semantic Aggregation

Automated video-based assessment of surgical skills is a promising task in assisting young surgical trainees, especially in poor-resource areas. Existing works often resort to a CNN-LSTM joint framework that models long-term relationships…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-05 · Zhenqiang Li, Lin Gu, Weimin Wang, Ryosuke Nakamura, Yoichi Sato

FAQ: Feature Aggregated Queries for Transformer-based Video Object Detectors

Video object detection needs to solve feature degradation situations that rarely happen in the image domain. One solution is to use the temporal information and fuse the features from the neighboring frames. With Transformer-based object…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-21 · Yiming Cui, Linjie Yang

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video. Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-08 · Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection

Current video object detection (VOD) models often encounter issues with over-aggregation due to redundant aggregation strategies, which perform feature aggregation on every frame. This results in suboptimal performance and increased…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-23 · Bingqing Zhang, Sen Wang, Yifan Liu, Brano Kusy, Xue Li, Jiajun Liu

Single Shot Video Object Detector

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos. Nevertheless, the extension of such object detectors from image to video is not trivial…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-08 · Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Learning Where to Focus for Efficient Video Object Detection

Transferring existing image-based detectors to the video is non-trivial since the quality of frames is always deteriorated by part occlusion, rare pose, and motion blur. Previous approaches exploit to propagate and aggregate features across…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-17 · Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan

Context Sensing Attention Network for Video-based Person Re-identification

Video-based person re-identification (ReID) is challenging due to the presence of various interferences in video frames. Recent approaches handle this problem using temporal aggregation strategies. In this work, we propose a novel Context…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-07 · Kan Wang, Changxing Ding, Jianxin Pang, Xiangmin Xu

Seq-NMS for Video Object Detection

Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip. Recently, there have been major advances for doing object detection in a single…

Computer Vision and Pattern Recognition · Computer Science · 2016-08-24 · Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang

YOLOV: Making Still Image Object Detectors Great at Video Object Detection

Video object detection (VID) is challenging because of the high variation of object appearance as well as the diverse deterioration in some frames. On the positive side, the detection in a certain frame of a video, compared with that in a…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-07 · Yuheng Shi, Naiyan Wang, Xiaojie Guo

Feature Aggregation Network for Video Face Recognition

This paper aims to learn a compact representation of a video for video face recognition task. We make the following contributions: first, we propose a meta attention-based aggregation scheme which adaptively and fine-grained weighs the…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-13 · Zhaoxiang Liu, Huan Hu, Jinqiang Bai, Shaohua Li, Shiguo Lian

RN-VID: A Feature Fusion Architecture for Video Object Detection

Consecutive frames in a video are highly redundant. Therefore, to perform the task of video object detection, executing single frame detectors on every frame without reusing any information is quite wasteful. It is with this idea in mind…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-03 · Hughes Perreault, Maguelonne Héritier, Pierre Gravel, Guillaume-Alexandre Bilodeau, Nicolas Saunier

Memory Enhanced Global-Local Aggregation for Video Object Detection

How do humans recognize an object in a piece of video? Due to the deteriorated quality of single frame, it may be hard for people to identify an occluded object in this frame by just utilizing information within one image. We argue that…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-27 · Yihong Chen, Yue Cao, Han Hu, Liwei Wang
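Several of the abstracts above (DFA, FAQ, Identity-Consistent Aggregation) describe a shared idea: enhance the object features in each frame by fusing features from neighboring frames. The following is a minimal, generic sketch of that idea. It is not the method of any listed paper. It assumes per-object feature vectors that are already aligned across frames and weights each support feature by a softmax over its cosine similarity to the reference feature. The function name `aggregate_features` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def aggregate_features(ref_feat, support_feats, temperature=1.0):
    """Generic similarity-weighted temporal feature aggregation (illustrative sketch).

    ref_feat:      (D,) feature of an object in the current (reference) frame.
    support_feats: (T, D) features of that object from T neighboring frames.
                   Assumption: these are already aligned to the reference frame;
                   alignment (e.g. by flow or proposal matching) is out of scope.
    temperature:   hypothetical softmax temperature; lower values sharpen the weights.

    Returns (aggregated (D,) feature, (T,) weights that sum to 1).
    """
    # L2-normalise so the dot product below is a cosine similarity.
    ref = ref_feat / (np.linalg.norm(ref_feat) + 1e-8)
    sup = support_feats / (np.linalg.norm(support_feats, axis=1, keepdims=True) + 1e-8)

    sims = sup @ ref                  # (T,) cosine similarity of each support frame
    logits = sims / temperature
    logits = logits - logits.max()    # subtract max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum() # softmax: non-negative, sums to 1

    # Weighted average of the original (unnormalised) support features.
    return weights @ support_feats, weights
```

In such schemes the reference feature is commonly included in the support set, so the current frame also contributes to its own enhanced feature, while frames whose features look dissimilar (for example, blurred or occluded ones) receive smaller weights.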