Related papers: Deep Mamba Multi-modal Learning

200 papers

Due to its low storage cost and fast query speed, cross-modal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. However, almost all existing CMH methods are based on hand-crafted features which…

Information Retrieval · Computer Science 2016-02-16 Qing-Yuan Jiang, Wu-Jun Li

Deep image hashing aims to enable effective large-scale image retrieval by mapping the input images into simple binary hash codes through deep neural networks. More recently, Vision Mamba with linear time complexity has attracted extensive…

Computer Vision and Pattern Recognition · Computer Science 2025-06-23 Chao He, Hongxi Wei

Depression is a common mental disorder that affects millions of people worldwide. Although promising, current multimodal methods hinge on aligned or aggregated multimodal fusion, suffering from two significant limitations: (i) inefficient…

Computers and Society · Computer Science 2024-09-25 Jiaxin Ye, Junping Zhang, Hongming Shan

Current end-to-end multi-modal models utilize different encoders and decoders to process input and output information. This separation hinders the joint representation learning of various modalities. To unify multi-modal processing, we…

Computer Vision and Pattern Recognition · Computer Science 2025-10-20 Chunhao Lu, Qiang Lu, Meichen Dong, Jake Luo

Mamba-based architectures have been shown to be a promising new direction for deep learning models owing to their competitive performance and sub-quadratic deployment speed. However, current Mamba multi-modal large language models (MLLM) are…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Yifei Xing, Xiangyuan Lan, Ruiping Wang, Dongmei Jiang, Wenjun Huang, Qingfang Zheng, Yaowei Wang

In recent years, robust matching methods using deep learning-based approaches have been actively studied and improved in computer vision tasks. However, there remains a persistent demand for both robust and fast matching techniques. To…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Kihwan Ryoo, Hyungtae Lim, Hyun Myung

Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable…

Computer Vision and Pattern Recognition · Computer Science 2022-01-06 Lu Jin, Zechao Li, Jinhui Tang

Multimodal Large Language Models (MLLMs) have attracted much attention for their multifunctionality. However, traditional Transformer architectures incur significant overhead due to their quadratic computational complexity. To address this…

Computer Vision and Pattern Recognition · Computer Science 2024-08-22 Wenjun Huang, Jiakai Pan, Jiahao Tang, Yanyu Ding, Yifei Xing, Yuhe Wang, Zhengzhuo Wang, Jianguo Hu

Due to the rapid advancements of sensory and computing technology, multi-modal data sources that represent the same pattern or phenomenon have attracted growing attention. As a result, finding means to explore useful information from these…

Machine Learning · Computer Science 2021-03-10 Lei Gao, Ling Guan

Learning the hash representation of multi-view heterogeneous data is an important task in multimedia retrieval. However, existing methods fail to effectively fuse the multi-view features and utilize the metric information provided by the…

Computer Vision and Pattern Recognition · Computer Science 2023-04-14 Jian Zhu, Zhangmin Huang, Xiaohu Ruan, Yu Cui, Yongli Cheng, Lingfang Zeng

Multiple modalities can provide more valuable information than a single one by describing the same content in various ways. Hence, it is highly desirable to learn an effective joint representation by fusing the features of different modalities.…

Computer Vision and Pattern Recognition · Computer Science 2018-10-09 Di Hu, Feiping Nie, Xuelong Li

Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement…

Machine Learning · Computer Science 2024-06-04 Sili Huang, Jifeng Hu, Zhejian Yang, Liwei Yang, Tao Luo, Hechang Chen, Lichao Sun, Bo Yang

Multimodal image fusion aims to integrate information from different imaging techniques to produce a comprehensive, detail-rich single image for downstream vision tasks. Existing methods based on local convolutional neural networks (CNNs)…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Xinyu Xie, Yawen Cui, Tao Tan, Xubin Zheng, Zitong Yu

Cross-modality fusion of complementary information from different modalities effectively improves object detection performance, making it more useful and robust for a wider range of applications. Existing fusion strategies combine different…

Computer Vision and Pattern Recognition · Computer Science 2025-07-23 Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang

Multimodal fusion has made great progress in the field of remote sensing image classification due to its ability to exploit the complementary spatial-spectral information. Deep learning methods such as CNN and Transformer have been widely…

Computer Vision and Pattern Recognition · Computer Science 2025-09-03 Qingyu Wang, Xue Jiang, Guozheng Xu

We introduce a novel deep learning method for decoding error correction codes based on the Mamba architecture, enhanced with Transformer layers. Our approach proposes a hybrid decoder that leverages Mamba's efficient sequential modeling…

Information Theory · Computer Science 2025-05-26 Shy-el Cohen, Yoni Choukroun, Eliya Nachmani

Videos have become ubiquitous on the Internet, and video analysis can provide a great deal of information for detecting and recognizing objects, as well as help people understand human actions and interactions with the real world. However, facing…

Computer Vision and Pattern Recognition · Computer Science 2018-12-03 Tianqi Zhao

Multi-modality image fusion aims to integrate the merits of images from different sources and render high-quality fusion images. However, existing feature extraction and fusion methods are either constrained by inherent local reduction bias…

Computer Vision and Pattern Recognition · Computer Science 2024-09-06 Chenguang Zhu, Shan Gao, Huafeng Chen, Guangqian Guo, Chaowei Wang, Yaoxing Wang, Chen Shu Lei, Quanjiang Fan

Learning social media data embeddings with deep models has attracted extensive research interest and spurred many applications, such as link prediction, classification, and cross-modal search. However, for social images which contain…

Multimedia · Computer Science 2017-10-19 Feiran Huang, Xiaoming Zhang, Zhoujun Li, Tao Mei, Yueying He, Zhonghua Zhao

Deep learning has revolutionized biomedical research by providing sophisticated methods to handle complex, high-dimensional data. Multimodal deep learning (MDL) further enhances this capability by integrating diverse data types such as…

Machine Learning · Computer Science 2026-03-13 Valerio Guarrasi, Fatih Aksu, Camillo Maria Caruso, Francesco Di Feola, Aurora Rofena, Filippo Ruffini, Paolo Soda