Related papers: Deep Learning Methods for Efficient Large Scale Vi…

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge that ranked the 3rd place. Because the challenge provides pre-extracted visual and audio features instead of…

Computer Vision and Pattern Recognition · Computer Science 2017-07-17 Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie Zhou, Shilei Wen

Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset

YouTube-8M is the largest video dataset for multi-label video classification. In order to tackle the multi-label classification on this challenging dataset, it is necessary to solve several issues such as temporal modeling of videos, label…

Computer Vision and Pattern Recognition · Computer Science 2017-07-13 Seil Na, Youngjae Yu, Sangho Lee, Jisung Kim, Gunhee Kim

The YouTube-8M Kaggle Competition: Challenges and Methods

We took part in the YouTube-8M Video Understanding Challenge hosted on Kaggle, and achieved the 10th place within less than one month's time. In this paper, we present an extensive analysis and solution to the underlying machine-learning…

Computer Vision and Pattern Recognition · Computer Science 2017-07-14 Haosheng Zou, Kun Xu, Jialian Li, Jun Zhu

Aggregating Frame-level Features for Large-Scale Video Classification

This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset. We employ a…

Computer Vision and Pattern Recognition · Computer Science 2017-07-05 Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, Yu-Gang Jiang

An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform

Large-scale datasets have played a significant role in progress of neural network and deep learning areas. YouTube-8M is such a benchmark dataset for general multi-label video classification. It was created from over 7 million YouTube…

Machine Learning · Statistics 2017-06-27 Zhenzhen Zhong, Shujiao Huang, Cheng Zhan, Licheng Zhang, Zhiwei Xiao, Chang-Chun Wang, Pei Yang

Large-Scale YouTube-8M Video Understanding with Deep Neural Networks

Video classification problem has been studied many years. The success of Convolutional Neural Networks (CNN) in image recognition tasks gives a powerful incentive for researchers to create more advanced video classification approaches. As…

Computer Vision and Pattern Recognition · Computer Science 2017-06-15 Manuk Akopyan, Eshsou Khashba

Multi-attention Networks for Temporal Localization of Video-level Labels

Temporal localization remains an important challenge in video understanding. In this work, we present our solution to the 3rd YouTube-8M Video Understanding Challenge organized by Google Research. Participants were required to build a…

Computer Vision and Pattern Recognition · Computer Science 2019-11-19 Lijun Zhang, Srinath Nizampatnam, Ahana Gangopadhyay, Marcos V. Conde

Label Denoising with Large Ensembles of Heterogeneous Neural Networks

Despite recent advances in computer vision based on various convolutional architectures, video understanding remains an important challenge. In this work, we present and discuss a top solution for the large-scale video classification…

Computer Vision and Pattern Recognition · Computer Science 2019-01-17 Pavel Ostyakov, Elizaveta Logacheva, Roman Suvorov, Vladimir Aliev, Gleb Sterkin, Oleg Khomenko, Sergey I. Nikolenko

The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge

This article describes the final solution of team monkeytyping, who finished in second place in the YouTube-8M video understanding challenge. The dataset used in this challenge is a large-scale benchmark for multi-label video…

Computer Vision and Pattern Recognition · Computer Science 2017-06-19 He-Da Wang, Teng Zhang, Ji Wu

Non-local NetVLAD Encoding for Video Classification

This paper describes our solution for the 2nd YouTube-8M video understanding challenge organized by Google AI. Unlike the video recognition benchmarks, such as Kinetics and Moments, the YouTube-8M challenge provides pre-extracted…

Computer Vision and Pattern Recognition · Computer Science 2018-10-02 Yongyi Tang, Xing Zhang, Jingwen Wang, Shaoxiang Chen, Lin Ma, Yu-Gang Jiang

YouTube-8M: A Large-Scale Video Classification Benchmark

Many recent advancements in Computer Vision are attributed to large datasets. Open-source software packages for Machine Learning and inexpensive commodity hardware have reduced the barrier of entry for exploring novel approaches at scale.…

Computer Vision and Pattern Recognition · Computer Science 2016-09-29 Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan

YouTube-8M Video Understanding Challenge Approach and Applications

This paper introduces the YouTube-8M Video Understanding Challenge hosted as a Kaggle competition and also describes my approach to experimenting with various models. For each of my experiments, I provide the score result as well as…

Machine Learning · Statistics 2017-06-27 Edward Chen

Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification

We report on CMU Informedia Lab's system used in Google's YouTube 8 Million Video Understanding Challenge. In this multi-label video classification task, our pipeline achieved 84.675% and 84.662% GAP on our evaluation split and the official…

Computer Vision and Pattern Recognition · Computer Science 2017-07-26 Po-Yao Huang, Ye Yuan, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann

UTS submission to Google YouTube-8M Challenge 2017

In this paper, we present our solution to Google YouTube-8M Video Classification Challenge 2017. We leveraged both video-level and frame-level features in the submission. For video-level classification, we simply used a 200-mixture Mixture…

Computer Vision and Pattern Recognition · Computer Science 2017-07-14 Linchao Zhu, Yanbin Liu, Yi Yang

Large-scale Video Classification guided by Batch Normalized LSTM Translator

Youtube-8M dataset enhances the development of large-scale video recognition technology as ImageNet dataset has encouraged image classification, recognition and detection of artificial intelligence fields. For this large video dataset, it…

Computer Vision and Pattern Recognition · Computer Science 2017-07-14 Jae Hyeon Yoo

Learning to Localize Temporal Events in Large-scale Video Data

We address temporal localization of events in large-scale video data, in the context of the Youtube-8M Segments dataset. This emerging field within video recognition can enable applications to identify the precise time a specified event…

Computer Vision and Pattern Recognition · Computer Science 2019-10-28 Mikel Bober-Irizar, Miha Skalic, David Austin

Hierarchical Deep Recurrent Architecture for Video Understanding

This paper introduces the system we developed for the Youtube-8M Video Understanding Challenge, in which a large-scale benchmark dataset was used for multi-label video classification. The proposed framework contains hierarchical deep…

Computer Vision and Pattern Recognition · Computer Science 2017-07-12 Luming Tang, Boyang Deng, Haiyu Zhao, Shuai Yi

Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling

This paper presents the Axon AI's solution to the 2nd YouTube-8M Video Understanding Challenge, achieving the final global average precision (GAP) of 88.733% on the private test set (ranked 3rd among 394 teams, not considering the model…

Computer Vision and Pattern Recognition · Computer Science 2018-09-24 Choongyeun Cho, Benjamin Antin, Sanchit Arora, Shwan Ashrafi, Peilin Duan, Dang The Huynh, Lee James, Hang Tuan Nguyen, Mojtaba Solgi, Cuong Van Than

Deep Multimodal Learning: An Effective Method for Video Classification

Videos have become ubiquitous on the Internet. And video analysis can provide lots of information for detecting and recognizing objects as well as help people understand human actions and interactions with the real world. However, facing…

Computer Vision and Pattern Recognition · Computer Science 2018-12-03 Tianqi Zhao

Cultivating DNN Diversity for Large Scale Video Labelling

We investigate factors controlling DNN diversity in the context of the Google Cloud and YouTube-8M Video Understanding Challenge. While it is well-known that ensemble methods improve prediction performance, and that combining accurate but…

Computer Vision and Pattern Recognition · Computer Science 2017-07-17 Mikel Bober-Irizar, Sameed Husain, Eng-Jon Ong, Miroslaw Bober