Related papers: Multimodal Deep Learning

Recent Advances and Trends in Multimodal Deep Learning: A Review

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information using various modalities. Despite…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Jabbar Abdul

A Review on Methods and Applications in Multimodal Deep Learning

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities.…

Machine Learning · Computer Science 2022-02-21 Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Jabbar Abdul

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA),…

Artificial Intelligence · Computer Science 2024-03-04 Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

Survey on Self-Supervised Multimodal Representation Learning and Foundation Models

Deep learning has been the subject of growing interest in recent years. Specifically, a specific type called Multimodal learning has shown great promise for solving a wide range of problems in domains such as language, vision, audio, etc.…

Machine Learning · Computer Science 2022-11-30 Sushil Thapa

New Ideas and Trends in Deep Multimodal Content Understanding: A Review

The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text. Unlike classic reviews of deep learning where monomodal image classifiers such as VGG, ResNet and Inception module are central…

Computer Vision and Pattern Recognition · Computer Science 2020-10-19 Wei Chen, Weiping Wang, Li Liu, Michael S. Lew

Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities

Multimodal models are expected to be a critical component to future advances in artificial intelligence. This field is starting to grow rapidly with a surge of new design elements motivated by the success of foundation models in natural…

Computation and Language · Computer Science 2024-06-11 Sai Munikoti, Ian Stewart, Sameera Horawalavithana, Henry Kvinge, Tegan Emerson, Sandra E Thompson, Karl Pazdernik

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in their input signals. However, many applications in the artificial…

Artificial Intelligence · Computer Science 2020-07-15 Chao Zhang, Zichao Yang, Xiaodong He, Li Deng

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications

Deep learning has revolutionized biomedical research by providing sophisticated methods to handle complex, high-dimensional data. Multimodal deep learning (MDL) further enhances this capability by integrating diverse data types such as…

Machine Learning · Computer Science 2026-03-13 Valerio Guarrasi, Fatih Aksu, Camillo Maria Caruso, Francesco Di Feola, Aurora Rofena, Filippo Ruffini, Paolo Soda

A survey on knowledge-enhanced multimodal learning

Multimodal learning has been a field of increasing interest, aiming to combine various modalities in a single joint representation. Especially in the area of visiolinguistic (VL) learning multiple models and techniques have been developed,…

Machine Learning · Computer Science 2024-03-26 Maria Lymperaiou, Giorgos Stamou

A survey of multimodal deep generative models

Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and…

Machine Learning · Computer Science 2022-07-06 Masahiro Suzuki, Yutaka Matsuo

A Survey on State-of-the-art Deep Learning Applications and Challenges

Deep learning, a branch of artificial intelligence, is a data-driven method that uses multiple layers of interconnected units or neurons to learn intricate patterns and representations directly from raw input data. Empowered by this…

Machine Learning · Computer Science 2025-07-28 Mohd Halim Mohd Noor, Ayokunle Olalekan Ige

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves…

Machine Learning · Computer Science 2022-01-19 Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

Multimodal Large Language Models: A Survey

The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneity. While the latest large language models excel in text-based tasks, they often struggle to…

Artificial Intelligence · Computer Science 2023-11-23 Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu

DeepSuM: Deep Sufficient Modality Learning Framework

Multimodal learning has become a pivotal approach in developing robust learning models with applications spanning multimedia, robotics, large language models, and healthcare. The efficiency of multimodal systems is a critical concern, given…

Machine Learning · Computer Science 2025-03-04 Zhe Gao, Jian Huang, Ting Li, Xueqin Wang

A Theory of Multimodal Learning

Human perception of the empirical world involves recognizing the diverse appearances, or 'modalities', of underlying objects. Despite the longstanding consideration of this perspective in philosophy and cognitive science, the study of…

Machine Learning · Computer Science 2023-12-19 Zhou Lu

Continual Learning for Multiple Modalities

Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were designed to learn a single modality (e.g., image) over time, which…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Hyundong Jin, Eunwoo Kim

Quantifying Multimodal Capabilities: Formal Generalization Guarantees in Pairwise Metric Learning

Multimodal learning leverages the integration of diverse data modalities to enhance performance in complex tasks. Yet, it frequently encounters incomplete or redundant modality data in real-world scenarios. This paper presents a…

Machine Learning · Computer Science 2026-05-05 Richeng Zhou, Xuelin Zhang, Liyuan Liu

Semi-supervised Multimodal Representation Learning through a Global Workspace

Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations, or to translate signals from one domain to another (as in image captioning, or…

Artificial Intelligence · Computer Science 2025-11-27 Benjamin Devillers, Léopold Maytié, Rufin VanRullen

Design Perspectives of Multitask Deep Learning Models and Applications

In recent years, multi-task learning has turned out to be of great success in various applications. Though single model training has promised great results throughout these years, it ignores valuable information that might help us estimate…

Machine Learning · Computer Science 2022-09-28 Yeshwant Singh, Anupam Biswas, Angshuman Bora, Debashish Malakar, Subham Chakraborty, Suman Bera