Related papers: Deep Mamba Multi-modal Learning

Deep Cross-Modal Hashing

Due to its low storage cost and fast query speed, cross-modal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. However, almost all existing CMH methods are based on hand-crafted features which…

Information Retrieval · Computer Science · 2016-02-16 · Qing-Yuan Jiang, Wu-Jun Li

MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval

Deep image hashing aims to enable effective large-scale image retrieval by mapping the input images into simple binary hash codes through deep neural networks. More recently, Vision Mamba with linear time complexity has attracted extensive…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-23 · Chao He, Hongxi Wei

DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection

Depression is a common mental disorder that affects millions of people worldwide. Although promising, current multimodal methods hinge on aligned or aggregated multimodal fusion, suffering from two significant limitations: (i) inefficient…

Computers and Society · Computer Science · 2024-09-25 · Jiaxin Ye, Junping Zhang, Hongming Shan

End-to-End Multi-Modal Diffusion Mamba

Current end-to-end multi-modal models utilize different encoders and decoders to process input and output information. This separation hinders the joint representation learning of various modalities. To unify multi-modal processing, we…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-20 · Chunhao Lu, Qiang Lu, Meichen Dong, Jake Luo

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

Mamba-based architectures have been shown to be a promising new direction for deep learning models owing to their competitive performance and sub-quadratic deployment speed. However, current Mamba multi-modal large language models (MLLM) are…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Yifei Xing, Xiangyuan Lan, Ruiping Wang, Dongmei Jiang, Wenjun Huang, Qingfang Zheng, Yaowei Wang

MambaGlue: Fast and Robust Local Feature Matching With Mamba

In recent years, robust matching methods using deep learning-based approaches have been actively studied and improved in computer vision tasks. However, there remains a persistent demand for both robust and fast matching techniques. To…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-04 · Kihwan Ryoo, Hyungtae Lim, Hyun Myung

Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals

Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-06 · Lu Jin, Zechao Li, Jinhui Tang

ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2

Multimodal Large Language Models (MLLMs) have attracted much attention for their multifunctionality. However, traditional Transformer architectures incur significant overhead due to their quadratic computational complexity. To address this…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-22 · Wenjun Huang, Jiakai Pan, Jiahao Tang, Yanyu Ding, Yifei Xing, Yuhe Wang, Zhengzhuo Wang, Jianguo Hu

A Discriminative Vectorial Framework for Multi-modal Feature Representation

Due to the rapid advancements of sensory and computing technology, multi-modal data sources that represent the same pattern or phenomenon have attracted growing attention. As a result, finding means to explore useful information from these…

Machine Learning · Computer Science · 2021-03-10 · Lei Gao, Ling Guan

Deep Metric Multi-View Hashing for Multimedia Retrieval

Learning the hash representation of multi-view heterogeneous data is an important task in multimedia retrieval. However, existing methods fail to effectively fuse the multi-view features and utilize the metric information provided by the…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-14 · Jian Zhu, Zhangmin Huang, Xiaohu Ruan, Yu Cui, Yongli Cheng, Lingfang Zeng

Dense Multimodal Fusion for Hierarchically Joint Representation

Multiple modalities can provide more valuable information than a single one by describing the same contents in various ways. Hence, it is highly desirable to learn effective joint representations by fusing the features of different modalities.…

Computer Vision and Pattern Recognition · Computer Science · 2018-10-09 · Di Hu, Feiping Nie, Xuelong Li

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement…

Machine Learning · Computer Science · 2024-06-04 · Sili Huang, Jifeng Hu, Zhejian Yang, Liwei Yang, Tao Luo, Hechang Chen, Lichao Sun, Bo Yang

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Multimodal image fusion aims to integrate information from different imaging techniques to produce a comprehensive, detail-rich single image for downstream vision tasks. Existing methods based on local convolutional neural networks (CNNs)…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-04 · Xinyu Xie, Yawen Cui, Tao Tan, Xubin Zheng, Zitong Yu

Fusion-Mamba for Cross-modality Object Detection

Fusing complementary information from different modalities effectively improves object detection performance, making it more useful and robust for a wider range of applications. Existing fusion strategies combine different…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-23 · Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang

CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification

Multimodal fusion has made great progress in the field of remote sensing image classification due to its ability to exploit the complementary spatial-spectral information. Deep learning methods such as CNN and Transformer have been widely…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-03 · Qingyu Wang, Xue Jiang, Guozheng Xu

Hybrid Mamba-Transformer Decoder for Error-Correcting Codes

We introduce a novel deep learning method for decoding error correction codes based on the Mamba architecture, enhanced with Transformer layers. Our approach proposes a hybrid decoder that leverages Mamba's efficient sequential modeling…

Information Theory · Computer Science · 2025-05-26 · Shy-el Cohen, Yoni Choukroun, Eliya Nachmani

Deep Multimodal Learning: An Effective Method for Video Classification

Videos have become ubiquitous on the Internet, and video analysis can provide rich information for detecting and recognizing objects as well as help people understand human actions and interactions with the real world. However, facing…

Computer Vision and Pattern Recognition · Computer Science · 2018-12-03 · Tianqi Zhao

Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion

Multi-modality image fusion aims to integrate the merits of images from different sources and render high-quality fusion images. However, existing feature extraction and fusion methods are either constrained by inherent local inductive bias…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-06 · Chenguang Zhu, Shan Gao, Huafeng Chen, Guangqian Guo, Chaowei Wang, Yaoxing Wang, Chen Shu Lei, Quanjiang Fan

Learning Social Image Embedding with Deep Multimodal Attention Networks

Learning social media data embeddings with deep models has attracted extensive research interest and given rise to many applications, such as link prediction, classification, and cross-modal search. However, for social images which contain…

Multimedia · Computer Science · 2017-10-19 · Feiran Huang, Xiaoming Zhang, Zhoujun Li, Tao Mei, Yueying He, Zhonghua Zhao

A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications

Deep learning has revolutionized biomedical research by providing sophisticated methods to handle complex, high-dimensional data. Multimodal deep learning (MDL) further enhances this capability by integrating diverse data types such as…

Machine Learning · Computer Science · 2026-03-13 · Valerio Guarrasi, Fatih Aksu, Camillo Maria Caruso, Francesco Di Feola, Aurora Rofena, Filippo Ruffini, Paolo Soda