Related papers: Deep Multimodal Representation Learning from Tempo…

Latent Variable Algorithms for Multimodal Learning and Sensor Fusion

Multimodal learning has been lacking principled ways of combining information from different modalities and learning a low-dimensional manifold of meaningful representations. We study multimodal learning and sensor fusion from a latent…

Machine Learning · Computer Science 2019-04-24 Lijiang Guo

CARRNN: A Continuous Autoregressive Recurrent Neural Network for Deep Representation Learning from Sporadic Temporal Data

Learning temporal patterns from multivariate longitudinal data is challenging especially in cases when data is sporadic, as often seen in, e.g., healthcare applications where the data can suffer from irregularity and asynchronicity as the…

Machine Learning · Computer Science 2021-04-09 Mostafa Mehdipour Ghazi, Lauge Sørensen, Sébastien Ourselin, Mads Nielsen

GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion

Data generated from real world events are usually temporal and contain multimodal information such as audio, visual, depth, sensor etc. which are required to be intelligently combined for classification tasks. In this paper, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2016-09-20 Ankit Gandhi, Arjun Sharma, Arijit Biswas, Om Deshmukh

Application of Multimodal Fusion Deep Learning Model in Disease Recognition

This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Xiaoyi Liu, Hongjie Qiu, Muqing Li, Zhou Yu, Yutian Yang, Yafeng Yan

A Temporal Kernel Approach for Deep Learning with Continuous-time Information

Sequential deep learning models such as RNN, causal CNN and attention mechanism do not readily consume continuous-time information. Discretizing the temporal data, as we show, causes inconsistency even for simple continuous-time processes.…

Machine Learning · Computer Science 2021-03-30 Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

Common Representation Learning Using Step-based Correlation Multi-Modal CNN

Deep learning techniques have been successfully used in learning a common representation for multi-view data, wherein the different modalities are projected onto a common subspace. In a broader perspective, the techniques used to…

Computer Vision and Pattern Recognition · Computer Science 2017-11-02 Gaurav Bhatt, Piyush Jha, Balasubramanian Raman

Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition

This paper presents a novel deep neural network (DNN) for multimodal fusion of audio, video and text modalities for emotion recognition. The proposed DNN architecture has independent and shared layers which aim to learn the representation…

Computer Vision and Pattern Recognition · Computer Science 2019-07-09 Juan D. S. Ortega, Mohammed Senoussaoui, Eric Granger, Marco Pedersoli, Patrick Cardinal, Alessandro L. Koerich

Dense Multimodal Fusion for Hierarchically Joint Representation

Multiple modalities can provide more valuable information than single one by describing the same contents in various ways. Hence, it is highly expected to learn effective joint representation by fusing the features of different modalities.…

Computer Vision and Pattern Recognition · Computer Science 2018-10-09 Di Hu, Feiping Nie, Xuelong Li

Correlation Net: Spatiotemporal multimodal deep learning for action recognition

This paper describes a network that captures multimodal correlations over arbitrary timestamps. The proposed scheme operates as a complementary, extended network over a multimodal convolutional neural network (CNN). Spatial and temporal…

Computer Vision and Pattern Recognition · Computer Science 2019-12-17 Novanto Yudistira, Takio Kurita

Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks

With the rising number of interconnected devices and sensors, modeling distributed sensor networks is of increasing interest. Recurrent neural networks (RNN) are considered particularly well suited for modeling sensory and streaming data.…

Machine Learning · Computer Science 2017-11-15 Stephan Baier, Sigurd Spieckermann, Volker Tresp

Revisiting Temporal Modeling for Video Super-resolution

Video super-resolution plays an important role in surveillance video analysis and ultra-high-definition video display, which has drawn much attention in both the research and industrial communities. Although many deep learning-based VSR…

Image and Video Processing · Electrical Eng. & Systems 2020-08-21 Takashi Isobe, Fang Zhu, Xu Jia, Shengjin Wang

Deep Multimodal Learning: An Effective Method for Video Classification

Videos have become ubiquitous on the Internet. And video analysis can provide lots of information for detecting and recognizing objects as well as help people understand human actions and interactions with the real world. However, facing…

Computer Vision and Pattern Recognition · Computer Science 2018-12-03 Tianqi Zhao

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification

Videos are inherently multimodal. This paper studies the problem of how to fully exploit the abundant multimodal clues for improved video categorization. We introduce a hybrid deep learning framework that integrates useful clues from…

Multimedia · Computer Science 2017-06-15 Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, Xiangyang Xue, Shih-Fu Chang

Deep Structured Neural Network for Event Temporal Relation Extraction

We propose a novel deep structured learning framework for event temporal relation extraction. The model consists of 1) a recurrent neural network (RNN) to learn scoring functions for pair-wise relations, and 2) a structured support vector…

Computation and Language · Computer Science 2019-09-26 Rujun Han, I-Hung Hsu, Mu Yang, Aram Galstyan, Ralph Weischedel, Nanyun Peng

Audio Visual Speech Recognition using Deep Recurrent Neural Networks

In this work, we propose a training algorithm for an audio-visual automatic speech recognition (AV-ASR) system using deep recurrent neural network (RNN). First, we train a deep RNN acoustic model with a Connectionist Temporal Classification…

Computer Vision and Pattern Recognition · Computer Science 2016-11-10 Abhinav Thanda, Shankar M Venkatesan

Multimodal Residual Learning for Visual QA

Deep neural networks continue to advance the state-of-the-art of image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the…

Computer Vision and Pattern Recognition · Computer Science 2016-09-01 Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction

Continuous dimensional emotion prediction is a challenging task where the fusion of various modalities usually achieves state-of-the-art performance such as early fusion or late fusion. In this paper, we propose a novel multi-modal fusion…

Computer Vision and Pattern Recognition · Computer Science 2017-09-08 Shizhe Chen, Qin Jin

Multi-level Attention Fusion Network for Audio-visual Event Recognition

Event classification is inherently sequential and multimodal. Therefore, deep neural models need to dynamically focus on the most relevant time window and/or modality of a video. In this study, we propose the Multi-level Attention Fusion…

Computer Vision and Pattern Recognition · Computer Science 2021-06-15 Mathilde Brousmiche, Jean Rouat, Stéphane Dupont

Deep Learning Approach to Bearing and Induction Motor Fault Diagnosis via Data Fusion

Convolutional Neural Networks (CNNs) are used to evaluate accelerometer and microphone data for bearing and induction motor diagnosis. A Long Short-Term Memory (LSTM) recurrent neural network is used to combine sensor information…

Machine Learning · Computer Science 2025-06-16 Mert Sehri, Merve Ertagrin, Ozal Yildirim, Ahmet Orhan, Patrick Dumond

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise.…

Computer Vision and Pattern Recognition · Computer Science 2016-06-02 Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell