Related papers: Multi-modal Image Processing based on Coupled Dict…

Multimodal Image Super-resolution via Joint Sparse Representations induced by Coupled Dictionaries

Real-world data processing problems often involve various image modalities associated with a certain scene, including RGB images, infrared images or multi-spectral images. The fact that different image modalities often share certain…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Pingfan Song, Xin Deng, João F. C. Mota, Nikos Deligiannis, Pier Luigi Dragotti, Miguel R. D. Rodrigues

Multi-Focus Image Fusion Using Sparse Representation and Coupled Dictionary Learning

We address the multi-focus image fusion problem, where multiple images captured with different focal settings are to be fused into an all-in-focus image of higher quality. Algorithms for this problem necessarily admit the source image…

Computer Vision and Pattern Recognition · Computer Science 2019-05-06 Farshad G. Veshki, Sergiy A. Vorobyov

Coupled Feature Learning for Multimodal Medical Image Fusion

Multimodal image fusion aims to combine relevant information from images acquired with different sensors. In medical imaging, fused images play an essential role in both standard and automated diagnosis. In this paper, we propose a novel…

Computer Vision and Pattern Recognition · Computer Science 2021-02-18 Farshad G. Veshki, Nora Ouzir, Sergiy A. Vorobyov, Esa Ollila

Multi-modal dictionary learning for image separation with application in art investigation

In support of art investigation, we propose a new source separation method that unmixes a single X-ray scan acquired from double-sided paintings. In this problem, the X-ray signals to be separated have similar morphological characteristics,…

Computer Vision and Pattern Recognition · Computer Science 2016-11-15 Nikos Deligiannis, Joao F. C. Mota, Bruno Cornelis, Miguel R. D. Rodrigues, Ingrid Daubechies

MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training

Image matching, which aims to identify corresponding pixel locations between images, is crucial in a wide range of scientific disciplines, aiding in image registration, fusion, and analysis. In recent years, deep learning-based image…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Xingyi He, Hao Yu, Sida Peng, Dongli Tan, Zehong Shen, Hujun Bao, Xiaowei Zhou

Multimodal sparse representation learning and applications

Unsupervised methods have proven effective for discriminative tasks in a single-modality scenario. In this paper, we present a multimodal framework for learning sparse representations that can capture semantic correlation between…

Machine Learning · Computer Science 2016-03-03 Miriam Cha, Youngjune Gwon, H. T. Kung

Multimodal Image Denoising based on Coupled Dictionary Learning

In this paper, we propose a new multimodal image denoising approach to attenuate white Gaussian additive noise in a given image modality under the aid of a guidance image modality. The proposed coupled image denoising approach consists of…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Pingfan Song, Miguel R. D. Rodrigues

Online Convolutional Dictionary Learning for Multimodal Imaging

Computational imaging methods that can exploit multiple modalities have the potential to enhance the capabilities of traditional sensing systems. In this paper, we propose a new method that reconstructs multimodal images from their linear…

Computer Vision and Pattern Recognition · Computer Science 2017-06-15 Kevin Degraux, Ulugbek S. Kamilov, Petros T. Boufounos, Dehong Liu

Image Pivoting for Learning Multilingual Multimodal Representations

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model…

Computation and Language · Computer Science 2017-07-25 Spandana Gella, Rico Sennrich, Frank Keller, Mirella Lapata

A Bimodal Co-Sparse Analysis Model for Image Processing

The success of many computer vision tasks lies in the ability to exploit the interdependency between different image modalities such as intensity and depth. Fusing corresponding information can be achieved on several levels, and one…

Computer Vision and Pattern Recognition · Computer Science 2014-06-26 Martin Kiechle, Tim Habigt, Simon Hawe, Martin Kleinsteuber

Using Multiple Instance Learning to Build Multimodal Representations

Image-text multimodal representation learning aligns data across modalities and enables important medical applications, e.g., image classification, visual grounding, and cross-modal retrieval. In this work, we establish a connection between…

Computer Vision and Pattern Recognition · Computer Science 2023-06-14 Peiqi Wang, William M. Wells, Seth Berkowitz, Steven Horng, Polina Golland

Learning Multi-modal Similarity

In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits…

Artificial Intelligence · Computer Science 2010-09-01 Brian McFee, Gert Lanckriet

A Computational Acquisition Model for Multimodal Word Categorization

Recent advances in self-supervised modeling of text and images open new opportunities for computational models of child language acquisition, which is believed to rely heavily on cross-modal signals. However, prior studies have been limited…

Computation and Language · Computer Science 2022-05-13 Uri Berger, Gabriel Stanovsky, Omri Abend, Lea Frermann

Multimodal Task-Driven Dictionary Learning for Image Classification

Dictionary learning algorithms have been successfully used for both reconstructive and discriminative tasks, where an input signal is represented with a sparse linear combination of dictionary atoms. While these methods are mostly developed…

Machine Learning · Statistics 2016-01-20 Soheil Bahrampour, Nasser M. Nasrabadi, Asok Ray, W. Kenneth Jenkins

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Multi-modal visual understanding of images with prompts involves using various visual and textual cues to enhance the semantic understanding of images. This approach combines both vision and language processing to generate more accurate…

Computer Vision and Pattern Recognition · Computer Science 2023-05-17 Yuzhou Peng

Joint Learning of Distributed Representations for Images and Texts

This technical report provides extra details of the deep multimodal similarity model (DMSM) which was proposed in (Fang et al. 2015, arXiv:1411.4952). The model is trained via maximizing global semantic similarity between images and their…

Computer Vision and Pattern Recognition · Computer Science 2015-04-29 Xiaodong He, Rupesh Srivastava, Jianfeng Gao, Li Deng

Show, Translate and Tell

Humans have an incredible ability to process and understand information from multiple sources such as images, video, text, and speech. Recent success of deep neural networks has enabled us to develop algorithms which give machines the…

Computer Vision and Pattern Recognition · Computer Science 2019-03-18 Dheeraj Peri, Shagan Sah, Raymond Ptucha

Coupled dictionary learning for unsupervised change detection between multi-sensor remote sensing images

Archetypal scenarios for change detection generally consider two images acquired through sensors of the same modality. However, in some specific cases such as emergency situations, the only images available may be those acquired through…

Image and Video Processing · Electrical Eng. & Systems 2019-09-04 Vinicius Ferraris, Nicolas Dobigeon, Yanna Cavalcanti, Thomas Oberlin, Marie Chabert

Dictionary-Based Deblurring for Unpaired Data

Effective image deblurring typically relies on large and fully paired datasets of blurred and corresponding sharp images. However, obtaining such accurately aligned data in the real world poses a number of difficulties, limiting the…

Image and Video Processing · Electrical Eng. & Systems 2025-10-21 Alok Panigrahi, Jayaprakash Katual, Satish Mulleti

Learning Multimodal Affinities for Textual Editing in Images

Nowadays, as cameras are rapidly adopted in our daily routine, images of documents are becoming both abundant and prevalent. Unlike natural images that capture physical objects, document-images contain a significant amount of text with…

Computer Vision and Pattern Recognition · Computer Science 2021-03-19 Or Perel, Oron Anschel, Omri Ben-Eliezer, Shai Mazor, Hadar Averbuch-Elor