Related papers: What is Multimodality?

Multimodal Machine Translation through Visuals and Speech

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area…

Computation and Language · Computer Science 2019-12-02 Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann

Multimodal Large Language Models: A Survey

The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneity. While the latest large language models excel in text-based tasks, they often struggle to…

Artificial Intelligence · Computer Science 2023-11-23 Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA),…

Artificial Intelligence · Computer Science 2024-03-04 Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

In recent years, multi-modal machine translation has attracted significant interest in both academia and industry due to its superior performance. It takes both textual and visual modalities as inputs, leveraging visual context to tackle…

Computation and Language · Computer Science 2024-05-24 Huangjun Shen, Liangying Shao, Wenbo Li, Zhibin Lan, Zhanyu Liu, Jinsong Su

Multimodal Machine Learning: A Survey and Taxonomy

Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as…

Machine Learning · Computer Science 2017-08-02 Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency

Multimodality in Meta-Learning: A Comprehensive Survey

Meta-learning has gained wide popularity as a training framework that is more data-efficient than traditional machine learning methods. However, its generalization ability in complex task distributions, such as multimodal tasks, has not…

Machine Learning · Computer Science 2022-05-10 Yao Ma, Shilin Zhao, Weixiao Wang, Yaoman Li, Irwin King

Mind with Eyes: from Language Reasoning to Multimodal Reasoning

Language models have recently advanced into the realm of reasoning, yet it is through multimodal reasoning that we can fully unlock the potential to achieve more comprehensive, human-like cognitive capabilities. This survey provides a…

Computation and Language · Computer Science 2025-03-25 Zhiyu Lin, Yifei Gao, Xian Zhao, Yunfan Yang, Jitao Sang

Recent Advances and Trends in Multimodal Deep Learning: A Review

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information using various modalities. Despite…

Computer Vision and Pattern Recognition · Computer Science 2021-05-25 Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Jabbar Abdul

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative…

Machine Learning · Computer Science 2023-02-21 Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency

A Review on Methods and Applications in Multimodal Deep Learning

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities.…

Machine Learning · Computer Science 2022-02-21 Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Jabbar Abdul

Multilingual Multimodality: A Taxonomical Survey of Datasets, Techniques, Challenges and Opportunities

Contextualizing language technologies beyond a single language kindled embracing multiple modalities and languages. Individually, each of these directions undoubtedly proliferated into several NLP tasks. Despite this momentum, most of the…

Computation and Language · Computer Science 2022-11-01 Khyathi Raghavi Chandu, Alborz Geramifard

Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy

Multimodal learning, a rapidly evolving field in artificial intelligence, seeks to construct more versatile and robust systems by integrating and analyzing diverse types of data, including text, images, audio, and video. Inspired by the…

Artificial Intelligence · Computer Science 2024-12-24 Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Bhargava Kumar, Amit Agarwal, Ishan Banerjee, Srikant Panda, Tejaswini Kumar

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves…

Machine Learning · Computer Science 2022-01-19 Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in their input signals. However, many applications in the artificial…

Artificial Intelligence · Computer Science 2020-07-15 Chao Zhang, Zichao Yang, Xiaodong He, Li Deng

Multimodal Grounding for Language Processing

This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing…

Computation and Language · Computer Science 2019-07-04 Lisa Beinborn, Teresa Botschen, Iryna Gurevych

Multimodal Conversational AI: A Survey of Datasets and Approaches

As humans, we experience the world with all our senses or modalities (sound, sight, touch, smell, and taste). We use these modalities, particularly sight and touch, to convey and interpret specific meanings. Multimodal expressions are…

Machine Learning · Computer Science 2022-05-17 Anirudh Sundar, Larry Heck

Revisit Multimodal Meta-Learning through the Lens of Multi-Task Learning

Multimodal meta-learning is a recent problem that extends conventional few-shot meta-learning by generalizing its setup to diverse multimodal task distributions. This setup makes a step towards mimicking how humans make use of a diverse set…

Machine Learning · Computer Science 2021-10-28 Milad Abdollahzadeh, Touba Malekzadeh, Ngai-Man Cheung

Vision+X: A Survey on Multimodal Learning in the Light of Data

We are perceiving and communicating with the world in a multisensory manner, where different information sources are sophisticatedly processed and interpreted by separate parts of the human brain to constitute a complex, yet harmonious and…

Computer Vision and Pattern Recognition · Computer Science 2024-06-12 Ye Zhu, Yu Wu, Nicu Sebe, Yan Yan

Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

Recent technological advancements in multimodal machine learning--including the rise of large language models (LLMs)--have improved our ability to collect, process, and analyze diverse multimodal data such as speech, video, and eye gaze in…

Machine Learning · Computer Science 2025-12-19 Clayton Cohn, Eduardo Davalos, Caleb Vatral, Joyce Horn Fonteles, Hanchen David Wang, Austin Coursey, Surya Rayala, Ashwin T S, Meiyi Ma, Gautam Biswas