Related papers: Training Multimodal Systems for Classification wit…

A Theory of Multimodal Learning

Human perception of the empirical world involves recognizing the diverse appearances, or 'modalities', of underlying objects. Despite the longstanding consideration of this perspective in philosophy and cognitive science, the study of…

Machine Learning · Computer Science 2023-12-19 Zhou Lu

Generalization in Multimodal Language Learning from Simulation

Neural networks can be powerful function approximators, which are able to model high-dimensional feature distributions from a subset of examples drawn from the target distribution. Naturally, they perform well at generalizing within the…

Machine Learning · Computer Science 2021-08-06 Aaron Eisermann, Jae Hee Lee, Cornelius Weber, Stefan Wermter

Towards Robust Multimodal Learning in the Open World

The rapid evolution of machine learning has propelled neural networks to unprecedented success across diverse domains. In particular, multimodal learning has emerged as a transformative paradigm, leveraging complementary information from…

Machine Learning · Computer Science 2025-11-14 Fushuo Huo

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

Multimodal Understanding Through Correlation Maximization and Minimization

Multimodal learning has mainly focused on learning large models on, and fusing feature representations from, different modalities for better performances on downstream tasks. In this work, we take a detour from this trend and study the…

Computer Vision and Pattern Recognition · Computer Science 2023-05-08 Yifeng Shi, Marc Niethammer

Multi-Modal Machine Learning Framework for Automated Seizure Detection in Laboratory Rats

A multi-modal machine learning system uses multiple unique data sources and types to improve its performance. This article proposes a system that combines results from several types of models, all of which are trained on different data…

Machine Learning · Computer Science 2024-02-05 Aaron Mullen, Samuel E. Armstrong, Jasmine Perdeh, Bjorn Bauer, Jeffrey Talbert, V. K. Cody Bumgardner

Revisit Modality Imbalance at the Decision Layer

Multimodal learning integrates information from different modalities to enhance model performance, yet it often suffers from modality imbalance, where dominant modalities overshadow weaker ones during joint optimization. This paper reveals…

Machine Learning · Computer Science 2025-10-17 Xiaoyu Ma, Hao Chen

What Makes Training Multi-Modal Classification Networks Hard?

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart. In our…

Computer Vision and Pattern Recognition · Computer Science 2020-04-06 Weiyao Wang, Du Tran, Matt Feiszli

Balancing Multi-modal Sensor Learning via Multi-objective Optimization

Learning-enabled control systems increasingly rely on multiple sensing modalities (e.g., vision, audio, language, etc.) for perception and decision support. A key challenge is that multi-modal sensor training dynamics are often imbalanced:…

Machine Learning · Computer Science 2026-04-01 Heshan Fernando, Quan Xiao, Parikshit Ram, Yi Zhou, Horst Samulowitz, Nathalie Baracaldo, Tianyi Chen

Modular meta-learning

Many prediction problems, such as those that arise in the context of robotics, have a simplifying underlying structure that, if known, could accelerate learning. In this paper, we present a strategy for learning a set of neural network…

Machine Learning · Computer Science 2019-05-06 Ferran Alet, Tomás Lozano-Pérez, Leslie P. Kaelbling

Multimodal Machine Learning: A Survey and Taxonomy

Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as…

Machine Learning · Computer Science 2017-08-02 Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Multimodal learning helps to comprehensively understand the world, by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we actually find that they are not fully exploited even…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

Attribution Regularization for Multimodal Paradigms

Multimodal machine learning has gained significant attention in recent years due to its potential for integrating information from multiple modalities to enhance learning and decision-making processes. However, it is commonly observed that…

Machine Learning · Computer Science 2025-09-12 Sahiti Yerramilli, Jayant Sravan Tamarapalli, Jonathan Francis, Eric Nyberg

Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on…

Sound · Computer Science 2019-11-15 Aren Jansen, Daniel P. W. Ellis, Shawn Hershey, R. Channing Moore, Manoj Plakal, Ashok C. Popat, Rif A. Saurous

Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

Recent technological advancements in multimodal machine learning--including the rise of large language models (LLMs)--have improved our ability to collect, process, and analyze diverse multimodal data such as speech, video, and eye gaze in…

Machine Learning · Computer Science 2025-12-19 Clayton Cohn, Eduardo Davalos, Caleb Vatral, Joyce Horn Fonteles, Hanchen David Wang, Austin Coursey, Surya Rayala, Ashwin T S, Meiyi Ma, Gautam Biswas

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples…

Computer Vision and Pattern Recognition · Computer Science 2024-08-29 Zhiqiu Lin, Samuel Yu, Zhiyi Kuang, Deepak Pathak, Deva Ramanan

Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning

One of the key factors of enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised…

Computer Vision and Pattern Recognition · Computer Science 2020-12-11 Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein

Unbiased Dynamic Multimodal Fusion

Traditional multimodal methods often assume static modality quality, which limits their adaptability in dynamic real-world scenarios. Thus, dynamical multimodal methods are proposed to assess modality quality and adjust their contribution…

Computer Vision and Pattern Recognition · Computer Science 2026-03-23 Shicai Wei, Kaijie Zhang, Luyi Chen, Tao He, Guiduo Duan

Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment

The natural world is abundant with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities…

Machine Learning · Computer Science 2020-12-08 Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov

Modular Continual Learning in a Unified Visual Environment

A core aspect of human intelligence is the ability to learn new tasks quickly and switch between them flexibly. Here, we describe a modular continual reinforcement learning paradigm inspired by these abilities. We first introduce a visual…

Machine Learning · Computer Science 2017-12-13 Kevin T. Feigelis, Blue Sheffer, Daniel L. K. Yamins