Related papers: Mutual Modality Learning for Video Action Classifi…

Mixup Helps Understanding Multimodal Video Better

Multimodal video understanding plays a crucial role in tasks such as action recognition and emotion classification by combining information from different modalities. However, multimodal models are prone to overfitting strong modalities,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Xiaoyu Ma, Ding Ding, Hao Chen

Cross-modal Learning for Multi-modal Video Categorization

Multi-modal machine learning (ML) models can process data in multiple modalities (e.g., video, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding, activity…

Computer Vision and Pattern Recognition · Computer Science 2020-06-09 Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee

Improving Multi-Modal Learning with Uni-Modal Teachers

Learning multi-modal representations is an essential step towards real-world robotic applications, and various multi-modal fusion models have been developed for this purpose. However, we observe that existing models, whose objectives are…

Machine Learning · Computer Science 2021-06-22 Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao

Action Recognition Using Temporal Shift Module and Ensemble Learning

This paper presents the first-rank solution for the Multi-Modal Action Recognition Challenge, part of the Multi-Modal Visual Pattern Recognition Workshop at the International Conference on Pattern Recognition (ICPR) 2024. The competition aimed to recognize human actions using a…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Anh-Kiet Duong, Petra Gomez-Krämer

Modality Distillation with Multiple Stream Networks for Action Recognition

Diverse input data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while a (training) dataset could be accurately designed to include a variety of…

Computer Vision and Pattern Recognition · Computer Science 2018-10-30 Nuno Garcia, Pietro Morerio, Vittorio Murino

Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

While the field of multi-modal learning keeps growing fast, the deficiency of the standard joint training paradigm has become clear through recent studies. They attribute the sub-optimal performance of the jointly trained model to the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-16 Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou

Continual Learning for Multiple Modalities

Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were designed to learn a single modality (e.g., image) over time, which…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Hyundong Jin, Eunwoo Kim

PMR: Prototypical Modal Rebalance for Multimodal Learning

Multimodal learning (MML) aims to jointly exploit the common priors of different modalities to compensate for their inherent limitations. However, existing MML methods often optimize a uniform objective for different modalities, leading to…

Machine Learning · Computer Science 2022-11-15 Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junxiao Wang, Song Guo

Visual Integration of Data and Model Space in Ensemble Learning

Ensembles of classifier models typically deliver superior performance and can outperform single classifier models given a dataset and classification task at hand. However, the gain in performance comes together with the lack in…

Human-Computer Interaction · Computer Science 2017-10-23 Bruno Schneider, Dominik Jäckle, Florian Stoffel, Alexandra Diehl, Johannes Fuchs, Daniel Keim

Improving Multimodal Learning with Multi-Loss Gradient Modulation

Learning from multiple modalities, such as audio and video, offers opportunities for leveraging complementary information, enhancing robustness, and improving contextual understanding and performance. However, combining such modalities…

Multimedia · Computer Science 2024-10-15 Konstantinos Kontras, Christos Chatzichristos, Matthew Blaschko, Maarten De Vos

Towards Modality Generalization: A Benchmark and Prospective Analysis

Multi-modal learning has achieved remarkable success by integrating information from various modalities, achieving superior performance in tasks like recognition and retrieval compared to uni-modal approaches. However, real-world scenarios…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Xiaohao Liu, Xiaobo Xia, Zhuo Huang, See-Kiong Ng, Tat-Seng Chua

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

Multi-modal learning from video data has seen increased attention recently as it allows to train semantically meaningful embeddings without human annotation enabling tasks like zero-shot retrieval and classification. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2022-08-19 Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition

In this work, we address the problem of learning an ensemble of specialist networks using multimodal data, while considering the realistic and challenging scenario of possible missing modalities at test time. Our goal is to leverage the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Nuno C. Garcia, Sarah Adel Bargal, Vitaly Ablavsky, Pietro Morerio, Vittorio Murino, Stan Sclaroff

Auxiliary Class Based Multiple Choice Learning

The merit of ensemble learning lies in having different outputs from many individual models on a single input, i.e., the diversity of the base models. The high quality of diversity can be achieved when each model is specialized to different…

Machine Learning · Computer Science 2021-12-09 Sihwan Kim, Dae Yon Jung, Taejang Park

Training Multimodal Systems for Classification with Multiple Objectives

We learn about the world from a diverse range of sensory information. Automated systems lack this ability as investigation has centred on processing information presented in a single form. Adapting architectures to learn from multiple…

Machine Learning · Computer Science 2020-10-27 Jason Armitage, Shramana Thakur, Rishi Tripathi, Jens Lehmann, Maria Maleshkova

Enhancing multimodal cooperation via sample-level modality valuation

One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However most models often suffer from unsatisfactory multimodal cooperation which cannot jointly utilize all modalities…

Computer Vision and Pattern Recognition · Computer Science 2024-06-17 Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu

Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos

Single modality action recognition on RGB or depth sequences has been extensively explored recently. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition.…

Computer Vision and Pattern Recognition · Computer Science 2016-12-28 Amir Shahroudy, Tian-Tsong Ng, Yihong Gong, Gang Wang

Modality Mixer for Multi-modal Action Recognition

In multi-modal action recognition, it is important to consider not only the complementary nature of different modalities but also global action content. In this paper, we propose a novel network, named Modality Mixer (M-Mixer) network, to…

Computer Vision and Pattern Recognition · Computer Science 2023-02-22 Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

What Makes Training Multi-Modal Classification Networks Hard?

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart. In our…

Computer Vision and Pattern Recognition · Computer Science 2020-04-06 Weiyao Wang, Du Tran, Matt Feiszli

Multimodal Guidance Network for Missing-Modality Inference in Content Moderation

Multimodal deep learning, especially vision-language models, have gained significant traction in recent years, greatly improving performance on many downstream tasks, including content moderation and violence detection. However, standard…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Zhuokai Zhao, Harish Palani, Tianyi Liu, Lena Evans, Ruth Toner