Related papers: Memory Fusion Network for Multi-view Sequential Le…
One of the core tasks in multi-view learning is to capture relations among views. For sequential data, the relations not only span across views, but also extend throughout the view length to form long-term intra-view and inter-view…
Long Short-Term Memory (LSTM) is a prominent recurrent neural network for extracting dependencies from sequential data such as time-series and multi-view data, having achieved impressive results for different visual recognition tasks. A…
In practical applications, multi-view data depicting objectives from assorted perspectives can facilitate the accuracy increase of learning algorithms. However, given multi-view data, there is limited work for learning discriminative node…
Multimodal learning has been lacking principled ways of combining information from different modalities and learning a low-dimensional manifold of meaningful representations. We study multimodal learning and sensor fusion from a latent…
Deep neural networks show great potential for automating various visual quality inspection tasks in manufacturing. However, their applicability is limited in more volatile scenarios, such as remanufacturing, where the inspected products and…
This paper presents a novel neural model - Dynamic Fusion Network (DFN), for machine reading comprehension (MRC). DFNs differ from most state-of-the-art models in their use of a dynamic multi-strategy attention process, in which passages,…
We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. Our architecture adopts a novel recurrent joint learning…
Convolution neural network (CNN) has been widely used in Single Image Super Resolution (SISR) so that SISR has been a great success recently. As the network deepens, the learning ability of network becomes more and more powerful. However,…
In this paper we address the problem of human action recognition from video sequences. Inspired by the exemplary results obtained via automatic feature learning and deep learning approaches in computer vision, we focus our attention towards…
Event classification is inherently sequential and multimodal. Therefore, deep neural models need to dynamically focus on the most relevant time window and/or modality of a video. In this study, we propose the Multi-level Attention Fusion…
Aiming at the limitation that deep long and short-term memory network(DLSTM) algorithm cannot perform parallel computing and cannot obtain global information, in this paper, feature extraction and feature processing are firstly carried out…
Continuous dimensional emotion prediction is a challenging task where the fusion of various modalities usually achieves state-of-the-art performance such as early fusion or late fusion. In this paper, we propose a novel multi-modal fusion…
In this paper, we propose a novel neural network structure, namely \emph{feedforward sequential memory networks (FSMN)}, to model long-term dependency in time series without using recurrent feedback. The proposed FSMN is a standard…
Deep learning and Convolutional Neural Networks (CNNs) have driven major transformations in diverse research areas. However, their limitations in handling low-frequency information present obstacles in certain tasks like interpreting global…
We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback. The proposed FSMN is a standard feedforward neural…
Anticipating the future actions of a human is a widely studied problem in robotics that requires spatio-temporal reasoning. In this work we propose a deep learning approach for anticipation in sensory-rich robotics applications. We…
In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. When the available modalities consist of time series data such…
Deep networks can learn to accurately recognize objects of a category by training on a large number of annotated images. However, a meta-learning challenge known as a low-shot image recognition task comes when only a few images with…
Deep learning architectures are showing great promise in various computer vision domains including image classification, object detection, event detection and action recognition. In this study, we investigate various aspects of…
Human action recognition is one of the challenging tasks in computer vision. The current action recognition methods use computationally expensive models for learning spatio-temporal dependencies of the action. Models utilizing RGB channels…