Related papers: Deep Multi-Modal Sets

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast-growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

What Makes Multi-modal Learning Better than Single (Provably)

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, joining the success of deep learning,…

Machine Learning · Computer Science 2021-10-27 Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang

Dense Multimodal Fusion for Hierarchically Joint Representation

Multiple modalities can provide more valuable information than a single one by describing the same contents in various ways. Hence, it is highly expected to learn effective joint representation by fusing the features of different modalities.…

Computer Vision and Pattern Recognition · Computer Science 2018-10-09 Di Hu, Feiping Nie, Xuelong Li

Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

The goal of multi-modal learning is to use complementary information on the relevant task provided by the multiple modalities to achieve reliable and robust performance. Recently, deep learning has led to significant improvement in multi-modal…

Computer Vision and Pattern Recognition · Computer Science 2018-11-05 Jaekyum Kim, Junho Koh, Yecheol Kim, Jaehyung Choi, Youngbae Hwang, Jun Won Choi

Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion

We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network. The framework consists of two innovative fusion schemes. Firstly, unlike existing multimodal methods that necessitate…

Computer Vision and Pattern Recognition · Computer Science 2021-08-12 Yikai Wang, Fuchun Sun, Ming Lu, Anbang Yao

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in their input signals. However, many applications in the artificial…

Artificial Intelligence · Computer Science 2020-07-15 Chao Zhang, Zichao Yang, Xiaodong He, Li Deng

Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning

Developing effective multimodal data fusion strategies has become increasingly essential for improving the predictive power of statistical machine learning methods across a wide range of applications, from autonomous driving to medical…

Machine Learning · Computer Science 2025-07-29 Ziyi Liang, Annie Qu, Babak Shahbaba

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

We abstract the features (i.e. learned representations) of multi-modal data into 1) uni-modal features, which can be learned from uni-modal training, and 2) paired features, which can only be learned from cross-modal interactions.…

Computer Vision and Pattern Recognition · Computer Science 2023-06-26 Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Tianyuan Yuan, Yue Wang, Yang Yuan, Hang Zhao

Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models

Multimodal learning typically relies on the assumption that all modalities are fully available during both the training and inference phases. However, in real-world scenarios, consistently acquiring complete multimodal data presents…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Donggeun Kim, Taesup Kim

Improving Multi-Modal Learning with Uni-Modal Teachers

Learning multi-modal representations is an essential step towards real-world robotic applications, and various multi-modal fusion models have been developed for this purpose. However, we observe that existing models, whose objectives are…

Machine Learning · Computer Science 2021-06-22 Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao

Cross-Modal Discrete Representation Learning

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector. In this work we present a self-supervised…

Computer Vision and Pattern Recognition · Computer Science 2021-06-11 Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion

With the development of web technology, multi-modal or multi-view data has surged as a major stream for big data, where each modality/view encodes an individual property of data objects. Often, different modalities are complementary to each…

Computer Vision and Pattern Recognition · Computer Science 2020-06-16 Yang Wang

A Closer Look at Multimodal Representation Collapse

We aim to develop a fundamental understanding of modality collapse, a recently observed empirical phenomenon wherein models trained for multimodal fusion tend to rely only on a subset of the modalities, ignoring the rest. We show that…

Machine Learning · Computer Science 2025-08-18 Abhra Chaudhuri, Anjan Dutta, Tu Bui, Serban Georgescu

Does a Technique for Building Multimodal Representation Matter? -- Comparative Analysis

Creating a meaningful representation by fusing single modalities (e.g., text, images, or audio) is the core concept of multimodal learning. Although several techniques for building multimodal representations have been proven successful,…

Machine Learning · Computer Science 2025-08-08 Maciej Pawłowski, Anna Wróblewska, Sylwia Sysko-Romańczuk

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

Developing effective multimodal fusion approaches has become increasingly essential in many real-world scenarios, such as health care and finance. The key challenge is how to preserve the feature expressiveness in each modality while…

Machine Learning · Computer Science 2025-10-24 Tsai Hor Chan, Feng Wu, Yihang Chen, Guosheng Yin, Lequan Yu

Learning to Represent and Predict Sets with Deep Neural Networks

In this thesis, we develop various techniques for working with sets in machine learning. Each input or output is not an image or a sequence, but a set: an unordered collection of multiple objects, each object described by a feature vector.…

Machine Learning · Computer Science 2021-03-09 Yan Zhang

UniMat: Unifying Materials Embeddings through Multi-modal Learning

Materials science datasets are inherently heterogeneous and are available in different modalities such as characterization spectra, atomic structures, microscopic images, and text-based synthesis conditions. The advancements in multi-modal…

Machine Learning · Computer Science 2024-11-14 Janghoon Ock, Joseph Montoya, Daniel Schweigert, Linda Hung, Santosh K. Suram, Weike Ye

Multimodal Prediction based on Graph Representations

This paper proposes a learning model, based on rank-fusion graphs, for general applicability in multimodal prediction tasks, such as multimodal regression and image classification. Rank-fusion graphs encode information from multiple…

Computer Vision and Pattern Recognition · Computer Science 2020-07-06 Icaro Cavalcante Dourado, Salvatore Tabbone, Ricardo da Silva Torres

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA),…

Artificial Intelligence · Computer Science 2024-03-04 Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

Continual Learning for Multiple Modalities

Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were designed to learn a single modality (e.g., image) over time, which…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Hyundong Jin, Eunwoo Kim