Related papers: Efficient Video Classification Using Fewer Frames

I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames

Over the past few years, various tasks involving videos such as classification, description, summarization and question answering have received a lot of attention. Current models for these tasks compute an encoding of the video by treating…

Computer Vision and Pattern Recognition · Computer Science 2018-05-15 Shweta Bhardwaj, Mitesh M. Khapra

FASTER Recurrent Networks for Efficient Video Classification

Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips…

Computer Vision and Pattern Recognition · Computer Science 2019-09-10 Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

Recent incremental learning for action recognition usually stores representative videos to mitigate catastrophic forgetting. However, only a few bulky videos can be stored due to the limited memory. To address this problem, we propose…

Computer Vision and Pattern Recognition · Computer Science 2022-11-03 Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong Wang, Mingqian Tang, Nong Sang, Xueming Qian

OCSampler: Compressing Videos to One Clip with Single-step Sampling

In this paper, we propose a framework named OCSampler to explore a compact yet effective video representation with one short clip for efficient video recognition. Recent works prefer to formulate frame sampling as a sequential decision task…

Computer Vision and Pattern Recognition · Computer Science 2022-01-13 Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, Limin Wang

Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval

Previous Knowledge Distillation based efficient image retrieval methods employ a lightweight network as the student model for fast inference. However, the lightweight student model lacks adequate representation capacity for effective…

Computer Vision and Pattern Recognition · Computer Science 2023-10-06 Yi Xie, Huaidong Zhang, Xuemiao Xu, Jianqing Zhu, Shengfeng He

Training compact deep learning models for video classification using circulant matrices

In real world scenarios, model accuracy is hardly the only factor to consider. Large models consume more memory and are computationally more intensive, which makes them difficult to train and to deploy, especially on mobile devices. In this…

Computer Vision and Pattern Recognition · Computer Science 2018-10-09 Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif

More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation

Current state-of-the-art models for video action recognition are mostly based on expensive 3D ConvNets. This results in a need for large GPU clusters to train and evaluate such architectures. To address this problem, we present a…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Quanfu Fan, Chun-Fu Chen, Hilde Kuehne, Marco Pistoia, David Cox

Efficient Large Scale Video Classification

Video classification has advanced tremendously over the recent years. A large part of the improvements in video classification had to do with the work done by the image classification community and the use of deep convolutional networks…

Computer Vision and Pattern Recognition · Computer Science 2015-05-26 Balakrishnan Varadarajan, George Toderici, Sudheendra Vijayanarasimhan, Apostol Natsev

Feature Aggregation Network for Video Face Recognition

This paper aims to learn a compact representation of a video for video face recognition task. We make the following contributions: first, we propose a meta attention-based aggregation scheme which adaptively and fine-grained weighs the…

Computer Vision and Pattern Recognition · Computer Science 2019-09-13 Zhaoxiang Liu, Huan Hu, Jinqiang Bai, Shaohua Li, Shiguo Lian

Online Model Distillation for Efficient Video Inference

High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of…

Computer Vision and Pattern Recognition · Computer Science 2020-01-29 Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian

CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers

Contrastive Language-Image Pre-training (CLIP) has been shown to improve zero-shot generalization capabilities of language and vision models. In this paper, we extend CLIP for efficient knowledge distillation, by utilizing embeddings as…

Machine Learning · Computer Science 2024-09-02 Lakshmi Nair

Learning Metrics from Teachers: Compact Networks for Image Embedding

Metric learning networks are used to compute image embeddings, which are widely used in many applications such as image retrieval and face recognition. In this paper, we propose to use network distillation to efficiently compute image…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa

Camera clustering for scalable stream-based active distillation

We present a scalable framework designed to craft efficient lightweight models for video object detection utilizing self-training and knowledge distillation techniques. We scrutinize methodologies for the ideal selection of training images…

Computer Vision and Pattern Recognition · Computer Science 2024-04-17 Dani Manjah, Davide Cacciarelli, Christophe De Vleeschouwer, Benoit Macq

I Spy With My Little Eye: A Minimum Cost Multicut Investigation of Dataset Frames

Visual framing analysis is a key method in social sciences for determining common themes and concepts in a given discourse. To reduce manual effort, image clustering can significantly speed up the annotation process. In this work, we phrase…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Katharina Prasse, Isaac Bravo, Stefanie Walter, Margret Keuper

CenterCLIP: Token Clustering for Efficient Text-Video Retrieval

Recently, large-scale pre-training methods like CLIP have made great progress in multi-modal research such as text-video retrieval. In CLIP, transformers are vital for modeling complex multi-modal relations. However, in the vision…

Computer Vision and Pattern Recognition · Computer Science 2022-05-03 Shuai Zhao, Linchao Zhu, Xiaohan Wang, Yi Yang

TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval

For text-to-video retrieval (T2VR), which aims to retrieve unlabeled videos by ad-hoc textual queries, CLIP-based methods are dominating. Compared to CLIP4Clip which is efficient and compact, the state-of-the-art models tend to compute…

Computer Vision and Pattern Recognition · Computer Science 2023-08-03 Kaibin Tian, Ruixiang Zhao, Hu Hu, Runquan Xie, Fengzong Lian, Zhanhui Kang, Xirong Li

Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification

Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially when an immense volume of video content is being constantly generated. Traditional methods require…

Computer Vision and Pattern Recognition · Computer Science 2024-03-14 Yuxing Han, Yunan Ding, Chen Ye Gan, Jiangtao Wen

Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling

The majority of online continual learning (CL) advocates single-epoch training and imposes restrictions on the size of replay memory. However, single-epoch training would incur a different amount of computations per CL algorithm, and the…

Machine Learning · Computer Science 2025-03-18 Minhyuk Seo, Hyunseo Koh, Jonghyun Choi

FrameExit: Conditional Early Exiting for Efficient Video Recognition

In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce the computation costs, we propose to use a simple sampling…

Computer Vision and Pattern Recognition · Computer Science 2021-04-29 Amir Ghodrati, Babak Ehteshami Bejnordi, Amirhossein Habibian

Efficient Adaptive Ensembling for Image Classification

In recent times, with the exception of sporadic cases, the trend in Computer Vision is to achieve minor improvements compared to considerable increases in complexity. To reverse this trend, we propose a novel method to boost image…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Antonio Bruno, Davide Moroni, Massimo Martinelli