Related papers: DenseImage Network: Video Spatial-Temporal Evoluti…

Evolving Space-Time Neural Architectures for Videos

We present a new method for finding video CNN architectures that capture rich spatio-temporal information in videos. Previous work, taking advantage of 3D convolutions, obtained promising results by manually designing video CNN…

Computer Vision and Pattern Recognition · Computer Science 2019-08-22 AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

Despite the success of deep learning for static image understanding, it remains unclear what are the most effective network architectures for the spatial-temporal modeling in videos. In this paper, in contrast to the existing CNN+RNN or…

Computer Vision and Pattern Recognition · Computer Science 2018-12-12 Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen

Deep Temporal Linear Encoding Networks

The CNN-encoding of features from entire videos for the representation of human actions has rarely been addressed. Instead, CNN work has focused on approaches to fuse spatial and temporal networks, but these were typically limited to…

Computer Vision and Pattern Recognition · Computer Science 2016-11-22 Ali Diba, Vivek Sharma, Luc Van Gool

Describing Videos by Exploiting Temporal Structure

Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic…

Machine Learning · Statistics 2015-10-02 Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville

Action Recognition with Dynamic Image Networks

We introduce the concept of "dynamic image", a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or…

Computer Vision and Pattern Recognition · Computer Science 2017-08-22 Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi

Dense Interaction Learning for Video-based Person Re-identification

Video-based person re-identification (re-ID) aims at matching the same person across video clips. Efficiently exploiting multi-scale fine-grained features while building the structural interaction among them is pivotal for its success. In…

Computer Vision and Pattern Recognition · Computer Science 2021-08-18 Tianyu He, Xin Jin, Xu Shen, Jianqiang Huang, Zhibo Chen, Xian-Sheng Hua

Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision

Convolutional neural network (CNN) driven by image recognition has been shown to be able to explain cortical responses to static pictures at ventral-stream areas. Here, we further showed that such CNN could reliably predict and decode…

Neurons and Cognition · Quantitative Biology 2017-11-15 Haiguang Wen, Junxing Shi, Yizhen Zhang, Kun-Han Lu, Jiayue Cao, Zhongming Liu

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification. Three main challenges exist including…

Computer Vision and Pattern Recognition · Computer Science 2018-07-30 Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy

Action Recognition for Depth Video using Multi-view Dynamic Images

Dynamic imaging is a recently proposed action description paradigm for simultaneously capturing motion and temporal evolution information, particularly in the context of deep convolutional neural networks (CNNs). Compared with optical flow…

Computer Vision and Pattern Recognition · Computer Science 2018-12-31 Yang Xiao, Jun Chen, Yancheng Wang, Zhiguo Cao, Joey Tianyi Zhou, Xiang Bai

Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation

This paper presents a deep learning framework for medical video segmentation. Convolution neural network (CNN) and transformer-based methods have achieved great milestones in medical image segmentation tasks due to their incredible semantic…

Computer Vision and Pattern Recognition · Computer Science 2024-02-13 Chengxi Zeng, Xinyu Yang, David Smithard, Majid Mirmehdi, Alberto M Gambaruto, Tilo Burghardt

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding

In recent years, the introduction of Multi-modal Large Language Models (MLLMs) into video understanding tasks has become increasingly prevalent. However, how to effectively integrate temporal information remains a critical research focus.…

Computer Vision and Pattern Recognition · Computer Science 2025-07-22 Xiaoyi Bao, Chenwei Xie, Hao Tang, Tingyu Weng, Xiaofeng Wang, Yun Zheng, Xingang Wang

Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks

Facial expressions are one of the most powerful ways for depicting specific patterns in human behavior and describing human emotional state. Despite the impressive advances of affective computing over the last decade, automatic video-based…

Computer Vision and Pattern Recognition · Computer Science 2021-01-18 Thomas Teixeira, Eric Granger, Alessandro Lameiras Koerich

STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very…

Computer Vision and Pattern Recognition · Computer Science 2016-09-05 Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang

Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-24 Mohammadreza Salehi, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

Comparison of Spatiotemporal Networks for Learning Video Related Tasks

Many methods for learning from video sequences involve temporally processing 2D CNN features from the individual frames or directly utilizing 3D convolutions within high-performing 2D CNN architectures. The focus typically remains on how to…

Computer Vision and Pattern Recognition · Computer Science 2020-09-17 Logan Courtney, Ramavarapu Sreenivas

DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking

Deep neural networks, albeit their great success on feature learning in various computer vision tasks, are usually considered as impractical for online visual tracking because they require very long training time and a large number of…

Computer Vision and Pattern Recognition · Computer Science 2016-05-04 Hanxi Li, Yi Li, Fatih Porikli

In Defense of Image Pre-Training for Spatiotemporal Recognition

Image pre-training, the current de-facto paradigm for a wide range of visual tasks, is generally less favored in the field of video recognition. By contrast, a common strategy is to directly train with spatiotemporal convolutional neural…

Computer Vision and Pattern Recognition · Computer Science 2022-08-03 Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie

Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer

Video denoising aims to recover high-quality frames from the noisy video. While most existing approaches adopt convolutional neural networks (CNNs) to separate the noise from the original visual content, however, CNNs focus on local…

Computer Vision and Pattern Recognition · Computer Science 2023-01-18 Wulian Yun, Mengshi Qi, Chuanming Wang, Huiyuan Fu, Huadong Ma

ViDeNN: Deep Blind Video Denoising

We propose ViDeNN: a CNN for Video Denoising without prior knowledge on the noise distribution (blind denoising). The CNN architecture uses a combination of spatial and temporal filtering, learning to spatially denoise the frames first and…

Computer Vision and Pattern Recognition · Computer Science 2019-04-25 Michele Claus, Jan van Gemert

Beyond Short Snippets: Deep Networks for Video Classification

Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep…

Computer Vision and Pattern Recognition · Computer Science 2015-04-14 Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici