Related papers: Efficient Large-Scale Multi-Modal Classification

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion

With the development of web technology, multi-modal or multi-view data has surged as a major stream for big data, where each modal/view encodes individual property of data objects. Often, different modalities are complementary to each…

Computer Vision and Pattern Recognition · Computer Science 2020-06-16 Yang Wang

Image and Encoded Text Fusion for Multi-Modal Classification

Multi-modal approaches employ data from multiple input streams such as textual and visual domains. Deep neural networks have been successfully employed for these approaches. In this paper, we present a novel multi-modal approach that fuses…

Computer Vision and Pattern Recognition · Computer Science 2018-10-05 Ignazio Gallo, Alessandro Calefati, Shah Nawaz, Muhammad Kamran Janjua

Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions

As multimodal learning finds applications in a wide variety of high-stakes societal tasks, investigating their robustness becomes important. Existing work has focused on understanding the robustness of vision-and-language models to…

Machine Learning · Computer Science 2022-11-07 Gaurav Verma, Vishwa Vinay, Ryan A. Rossi, Srijan Kumar

Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce. The high traffic of new products uploaded daily and the dynamic nature of the categories raise the need for machine learning models…

Computer Vision and Pattern Recognition · Computer Science 2016-11-30 Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor

Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches

Information fusion is used widely to improve document classification by the integration of multiple data sources (multimodal) or representations (multiview). However, the field lacks a unified framework, a quantitative synthesis of its…

Computation and Language · Computer Science 2026-05-27 Marcin Michał Mirończuk

Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification

This study introduces a novel multimodal food recognition framework that effectively combines visual and textual modalities to enhance classification accuracy and robustness. The proposed approach employs a dynamic multimodal fusion…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Prateek Mittal, Puneet Goyal, Joohi Chauhan

Quality-based Multimodal Classification Using Tree-Structured Sparsity

Recent studies have demonstrated advantages of information fusion based on sparsity models for multimodal classification. Among several sparsity models, tree-structured sparsity provides a flexible framework for extraction of…

Computer Vision and Pattern Recognition · Computer Science 2015-02-04 Soheil Bahrampour, Asok Ray, Nasser M. Nasrabadi, Kenneth W. Jenkins

Robust Multi-Modal Sensor Fusion: An Adversarial Approach

In recent years, multi-modal fusion has attracted a lot of research interest, both in academia, and in industry. Multimodal fusion entails the combination of information from a set of different types of sensors. Exploiting complementary…

Machine Learning · Computer Science 2020-08-27 Siddharth Roheda, Hamid Krim, Benjamin S. Riggan

Adaptive Fusion Techniques for Multimodal Data

Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data. In this paper, we propose adaptive fusion techniques that aim to model context from…

Computation and Language · Computer Science 2021-01-27 Gaurav Sahu, Olga Vechtomova

Multimodal Classification for Analysing Social Media

Classification of social media data is an important approach in understanding user behavior on the Web. Although information on social media can be of different modalities such as texts, images, audio or videos, traditional approaches in…

Computation and Language · Computer Science 2017-08-08 Chi Thang Duong, Remi Lebret, Karl Aberer

Learning Multimodal Word Representation via Dynamic Fusion Methods

Multimodal models have been proven to outperform text-based models on learning semantic word representations. Almost all previous multimodal models typically treat the representations from different modalities equally. However, it is…

Computation and Language · Computer Science 2018-01-03 Shaonan Wang, Jiajun Zhang, Chengqing Zong

A review of deep learning-based information fusion techniques for multimodal medical image classification

Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Yihao Li, Mostafa El Habib Daho, Pierre-Henri Conze, Rachid Zeghlache, Hugo Le Boité, Ramin Tadayoni, Béatrice Cochener, Mathieu Lamard, Gwenolé Quellec

Multimodal Fusion on Low-quality Data: A Comprehensive Survey

Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical…

Machine Learning · Computer Science 2024-11-04 Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

What Makes Multi-modal Learning Better than Single (Provably)

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, joining the success of deep learning,…

Machine Learning · Computer Science 2021-10-27 Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang

Multi-modal Deep Analysis for Multimedia

With the rapid development of Internet and multimedia services in the past decade, a huge amount of user-generated and service provider-generated multimedia data become available. These data are heterogeneous and multi-modal in nature,…

Multimedia · Computer Science 2020-01-07 Wenwu Zhu, Xin Wang, Hongzhi Li

SynerGraph: An Integrated Graph Convolution Network for Multimodal Recommendation

This article presents a novel approach to multimodal recommendation systems, focusing on integrating and purifying multimodal data. Our methodology starts by developing a filter to remove noise from various types of data, making the…

Information Retrieval · Computer Science 2024-05-30 Mert Burabak, Tevfik Aytekin

Multimodal E-Commerce Product Classification Using Hierarchical Fusion

In this work, we present a multi-modal model for commercial product classification, that combines features extracted by multiple neural network models from textual (CamemBERT and FlauBERT) and visual data (SE-ResNeXt-50), using simple…

Artificial Intelligence · Computer Science 2022-07-12 Tsegaye Misikir Tashu, Sara Fattouh, Peter Kiss, Tomas Horvath

On the Benefits of Early Fusion in Multimodal Representation Learning

Intelligently reasoning about the world often requires integrating data from multiple modalities, as any individual modality may contain unreliable or incomplete information. Prior work in multimodal learning fuses input modalities only…

Machine Learning · Computer Science 2020-11-17 George Barnum, Sabera Talukder, Yisong Yue

Deep Multi-Modal Sets

Many vision-related tasks benefit from reasoning over multiple modalities to leverage complementary views of data in an attempt to learn robust embedding spaces. Most deep learning-based methods rely on a late fusion technique whereby…

Computer Vision and Pattern Recognition · Computer Science 2020-03-04 Austin Reiter, Menglin Jia, Pu Yang, Ser-Nam Lim