Related papers: Gradient Frequency Modulation for Visually Explain…

Explain EEG-based End-to-end Deep Learning Models in the Frequency Domain

The recent rise of EEG-based end-to-end deep learning models presents a significant challenge in elucidating how these models process raw EEG signals and generate predictions in the frequency domain. This challenge limits the transparency…

Signal Processing · Electrical Eng. & Systems 2024-07-26 Hanqi Wang, Kun Yang, Jingyu Zhang, Tao Chen, Liang Song

Diffusion-aided Extreme Video Compression with Lightweight Semantics Guidance

Modern video codecs and learning-based approaches struggle for semantic reconstruction at extremely low bit-rates due to reliance on low-level spatiotemporal redundancies. Generative models, especially diffusion models, offer a new paradigm…

Image and Video Processing · Electrical Eng. & Systems 2026-02-06 Maojun Zhang, Haotian Wu, Richeng Jin, Deniz Gunduz, Krystian Mikolajczyk

Extreme Video Compression with Pre-trained Diffusion Models

Diffusion models have achieved remarkable success in generating high quality image and video data. More recently, they have also been used for image compression with high perceptual quality. In this paper, we present a novel approach to…

Image and Video Processing · Electrical Eng. & Systems 2024-02-15 Bohan Li, Yiming Liu, Xueyan Niu, Bo Bai, Lei Deng, Deniz Gündüz

Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition

Training robust deep video representations has proven to be computationally challenging due to substantial decoding overheads, the enormous size of raw video streams, and their inherent high temporal redundancy. Different from existing…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Shristi Das Biswas, Efstathia Soufleri, Arani Roy, Kaushik Roy

ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition

It is difficult for people to interpret the decision-making in the inference process of deep neural networks. Visual explanation is one method for interpreting the decision-making of deep learning. It analyzes the decision-making of 2D CNNs…

Computer Vision and Pattern Recognition · Computer Science 2021-11-01 Masahiro Mitsuhara, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi

Fusion-CAM: Integrating Gradient and Region-Based Class Activation Maps for Robust Visual Explanations

Interpreting the decision-making process of deep convolutional neural networks remains a central challenge in achieving trustworthy and transparent artificial intelligence. Explainable AI (XAI) techniques, particularly Class Activation Map…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Hajar Dekdegue, Moncef Garouani, Josiane Mothe, Jordan Bernigaud

Rethinking Saliency Map: An Context-aware Perturbation Method to Explain EEG-based Deep Learning Model

Deep learning is widely used to decode the electroencephalogram (EEG) signal. However, there are few attempts to specifically investigate how to explain the EEG-based deep learning models. We conduct a review to summarize the existing works…

Machine Learning · Computer Science 2022-05-31 Hanqi Wang, Xiaoguang Zhu, Tao Chen, Chengfang Li, Liang Song

Deep Generative Video Compression

The usage of deep generative models for image compression has led to impressive performance gains over classical codecs while neural video compression is still in its infancy. Here, we propose an end-to-end, deep generative modeling…

Computer Vision and Pattern Recognition · Computer Science 2019-11-05 Jun Han, Salvator Lombardo, Christopher Schroers, Stephan Mandt

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

Recent video recognition models utilize Transformer models for long-range spatio-temporal context modeling. Video transformer designs are based on self-attention that can model global context at a high computational cost. In comparison,…

Computer Vision and Pattern Recognition · Computer Science 2023-10-30 Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

Attention Distillation for Learning Video Representations

We address the challenging problem of learning motion representations using deep models for video recognition. To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for…

Computer Vision and Pattern Recognition · Computer Science 2020-08-18 Miao Liu, Xin Chen, Yun Zhang, Yin Li, James M. Rehg

Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos

In this paper, we address the problem of referring expression comprehension in videos, which is challenging due to complex expression and scene dynamics. Unlike previous methods which solve the problem in multiple stages (i.e., tracking,…

Computer Vision and Pattern Recognition · Computer Science 2021-03-24 Sijie Song, Xudong Lin, Jiaying Liu, Zongming Guo, Shih-Fu Chang

Modelling Temporal Information Using Discrete Fourier Transform for Recognizing Emotions in User-generated Videos

With the widespread of user-generated Internet videos, emotion recognition in those videos attracts increasing research efforts. However, most existing works are based on framelevel visual features and/or audio features, which might fail to…

Computer Vision and Pattern Recognition · Computer Science 2016-08-04 Haimin Zhang, Min Xu

Insights from Generative Modeling for Neural Video Compression

While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view…

Image and Video Processing · Electrical Eng. & Systems 2024-10-28 Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt

Deep Time-Frequency Representation and Progressive Decision Fusion for ECG Classification

Early recognition of abnormal rhythms in ECG signals is crucial for monitoring and diagnosing patients' cardiac conditions, increasing the success rate of the treatment. Classifying abnormal rhythms into exact categories is very challenging…

Machine Learning · Computer Science 2019-12-18 Jing Zhang, Jing Tian, Yang Cao, Yuxiang Yang, Xiaobin Xu

Modelling Temporal Information Using Discrete Fourier Transform for Video Classification

Recently, video classification attracts intensive research efforts. However, most existing works are based on framelevel visual features, which might fail to model the temporal information, e.g. characteristics accumulated along time. In…

Computer Vision and Pattern Recognition · Computer Science 2016-08-18 Haimin Zhang

Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis

A variety of methods have been proposed to try to explain how deep neural networks make their decisions. Key to those approaches is the need to sample the pixel space efficiently in order to derive importance maps. However, it has been…

Computer Vision and Pattern Recognition · Computer Science 2023-03-21 Thomas Fel, Melanie Ducoffe, David Vigouroux, Remi Cadene, Mikael Capelle, Claire Nicodeme, Thomas Serre

Towards Visually Explaining Video Understanding Networks with Perturbation

"Making black box models explainable" is a vital problem that accompanies the development of deep learning networks. For networks taking visual information as input, one basic but challenging explanation method is to identify and…

Computer Vision and Pattern Recognition · Computer Science 2020-11-10 Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato

P-TAME: Explain Any Image Classifier with Trained Perturbations

The adoption of Deep Neural Networks (DNNs) in critical fields where predictions need to be accompanied by justifications is hindered by their inherent black-box nature. In this paper, we introduce P-TAME (Perturbation-based Trainable…

Computer Vision and Pattern Recognition · Computer Science 2025-06-04 Mariano V. Ntrougkas, Vasileios Mezaris, Ioannis Patras

GCD-DDPM: A Generative Change Detection Model Based on Difference-Feature Guided DDPM

Deep learning (DL)-based methods have recently shown great promise in bitemporal change detection (CD). Existing discriminative methods based on Convolutional Neural Networks (CNNs) and Transformers rely on discriminative representation…

Computer Vision and Pattern Recognition · Computer Science 2024-03-05 Yihan Wen, Xianping Ma, Xiaokang Zhang, Man-On Pun

FE-Adapter: Adapting Image-based Emotion Classifiers to Videos

Utilizing large pre-trained models for specific tasks has yielded impressive results. However, fully fine-tuning these increasingly large models is becoming prohibitively resource-intensive. This has led to a focus on more…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Shreyank N Gowda, Boyan Gao, David A. Clifton