Related papers: Universal-to-Specific Framework for Complex Action…

Feature Hallucination for Self-supervised Action Recognition

Understanding human actions in videos requires more than raw pixel analysis; it relies on high-level semantic reasoning and effective integration of multimodal features. We propose a deep translational action recognition framework that…

Computer Vision and Pattern Recognition · Computer Science 2025-06-26 Lei Wang, Piotr Koniusz

Your "Attention" Deserves Attention: A Self-Diversified Multi-Channel Attention for Facial Action Analysis

Visual attention has been extensively studied for learning fine-grained features in both facial expression recognition (FER) and Action Unit (AU) detection. A broad range of previous research has explored how to use attention modules to…

Computer Vision and Pattern Recognition · Computer Science 2022-03-24 Xiaotian Li, Zhihua Li, Huiyuan Yang, Geran Zhao, Lijun Yin

Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion

Action recognition is an important yet challenging task in computer vision. In this paper, we propose a novel deep-based framework for action recognition, which improves the recognition accuracy by: 1) deriving more precise features for…

Computer Vision and Pattern Recognition · Computer Science 2017-11-21 Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong

Towards Universal Object Detection by Domain Attention

Despite increasing efforts on universal representations for visual recognition, few have addressed object detection. In this paper, we develop an effective and efficient universal object detection system that is capable of working on…

Computer Vision and Pattern Recognition · Computer Science 2019-07-09 Xudong Wang, Zhaowei Cai, Dashan Gao, Nuno Vasconcelos

Mask2Anomaly: Mask Transformer for Universal Open-set Segmentation

Segmenting unknown or anomalous object instances is a critical task in autonomous driving applications, and it is approached traditionally as a per-pixel classification problem. However, reasoning individually about each pixel without…

Computer Vision and Pattern Recognition · Computer Science 2023-09-14 Shyam Nandan Rai, Fabio Cermelli, Barbara Caputo, Carlo Masone

Deep Action- and Context-Aware Sequence Learning for Activity Recognition and Anticipation

Action recognition and anticipation are key to the success of many computer vision applications. Existing methods can roughly be grouped into those that extract global, context-aware representations of the entire image or sequence, and…

Computer Vision and Pattern Recognition · Computer Science 2016-11-21 Mohammad Sadegh Aliakbarian, Fatemehsadat Saleh, Basura Fernando, Mathieu Salzmann, Lars Petersson, Lars Andersson

Video-based Contrastive Learning on Decision Trees: from Action Recognition to Autism Diagnosis

How can we teach a computer to recognize 10,000 different actions? Deep learning has evolved from supervised and unsupervised to self-supervised approaches. In this paper, we present a new contrastive learning-based framework for decision…

Computer Vision and Pattern Recognition · Computer Science 2023-04-24 Mindi Ruan, Xiangxu Yu, Na Zhang, Chuanbo Hu, Shuo Wang, Xin Li

Application-Driven AI Paradigm for Human Action Recognition

Human action recognition in computer vision has been widely studied in recent years. However, most algorithms handle only certain specific actions, often at high computational cost. That is not suitable for practical applications with…

Computer Vision and Pattern Recognition · Computer Science 2022-10-03 Zezhou Chen, Yajie Cui, Kaikai Zhao, Zhaoxiang Liu, Shiguo Lian

Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training

Recently, much progress has been made for self-supervised action recognition. Most existing approaches emphasize the contrastive relations among videos, including appearance and motion consistency. However, two main issues remain for…

Computer Vision and Pattern Recognition · Computer Science 2022-04-28 Guanhong Wang, Keyu Lu, Yang Zhou, Zhanhao He, Gaoang Wang

Two-Stream Convolutional Networks for Action Recognition in Videos

We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between…

Computer Vision and Pattern Recognition · Computer Science 2014-11-13 Karen Simonyan, Andrew Zisserman

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

Vision-and-language pre-training has achieved impressive success in learning multimodal representations between vision and language. To generalize this success to non-English languages, we introduce UC2, the first machine…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu, Jingjing Liu

Facial Action Unit Detection Using Attention and Relation Learning

The attention mechanism has recently attracted increasing attention in the field of facial action unit (AU) detection. By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured. Most…

Computer Vision and Pattern Recognition · Computer Science 2019-10-21 Zhiwen Shao, Zhilei Liu, Jianfei Cai, Yunsheng Wu, Lizhuang Ma

Action Recognition using Visual Attention

We propose a soft attention based model for the task of action recognition in videos. We use multi-layered Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units which are deep both spatially and temporally. Our model…

Machine Learning · Computer Science 2016-02-16 Shikhar Sharma, Ryan Kiros, Ruslan Salakhutdinov

Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

Video recognition remains an open challenge, requiring the identification of diverse content categories within videos. Mainstream approaches often perform flat classification, overlooking the intrinsic hierarchical structure relating…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan

An end-to-end multi-scale network for action prediction in videos

In this paper, we develop an efficient multi-scale network to predict action classes in partial videos in an end-to-end manner. Unlike most existing methods with offline feature generation, our method directly takes frames as input and…

Computer Vision and Pattern Recognition · Computer Science 2023-01-04 Xiaofa Liu, Jianqin Yin, Yuan Sun, Zhicheng Zhang, Jin Tang

Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos

We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all "object-like" regions, even for object categories never…

Computer Vision and Pattern Recognition · Computer Science 2018-12-19 Bo Xiong, Suyog Dutt Jain, Kristen Grauman

Action Recognition based on Subdivision-Fusion Model

This paper proposes a novel Subdivision-Fusion Model (SFM) to recognize human actions. In most action recognition tasks, overlapping feature distribution is a common problem leading to overfitting. In the subdivision stage of the proposed…

Computer Vision and Pattern Recognition · Computer Science 2015-08-19 Hao Zongbo, Lu Linlin, Zhang Qianni, Wu Jie, Izquierdo Ebroul, Yang Juanyu, Zhao Jun

Action Classification and Highlighting in Videos

Inspired by recent advances in neural machine translation that jointly align and translate using encoder-decoder networks equipped with attention, we propose an attention-based LSTM model for human activity recognition. Our model jointly…

Computer Vision and Pattern Recognition · Computer Science 2017-09-01 Atousa Torabi, Leonid Sigal

U2Net: A General Framework with Spatial-Spectral-Integrated Double U-Net for Image Fusion

In image fusion tasks, images obtained from different sources exhibit distinct properties. Consequently, treating them uniformly with a single-branch network can lead to inadequate feature extraction. Additionally, numerous works have…

Image and Video Processing · Electrical Eng. & Systems 2023-10-03 Siran Peng, Chenhao Guo, Xiao Wu, Liang-Jian Deng

Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks

With the development of video understanding, there is a proliferation of tasks for clip-level temporal video analysis, including temporal action detection (TAD), temporal action segmentation (TAS), and generic event boundary detection…

Computer Vision and Pattern Recognition · Computer Science 2024-09-30 Min Yang, Zichen Zhang, Limin Wang