Related papers: Multiple Object Recognition with Visual Attention
We propose augmenting deep neural networks with an attention mechanism for the visual object detection task. When perceiving a scene, humans can use multiple fixation points, each attending to scene content at different…
The visual system processes a scene using a sequence of selective glimpses, each driven by spatial and object-based attention. These glimpses reflect what is relevant to the ongoing task and are selected through recurrent processing and…
Visual attention, derived from cognitive neuroscience, focuses human perception on the most pertinent subset of the sensory data. Recently, significant efforts have been made to exploit attention schemes to advance computer vision…
Object-based attention is a key component of the visual system, relevant for perception, learning, and memory. Neurons tuned to features of attended objects tend to be more active than those associated with non-attended objects. There is a…
Active vision is inherently attention-driven: the agent actively selects views to attend to in order to accomplish the vision task quickly while improving its internal representation of the scene being observed. Inspired by the recent success of…
Object co-segmentation is the task of segmenting the same objects from multiple images. In this paper, we propose Attention Based Object Co-Segmentation, a method for object co-segmentation that utilizes a novel attention mechanism in the…
Fine-grained visual recognition typically depends on modeling subtle differences in object parts. However, these parts often exhibit dramatic visual variations such as occlusions, viewpoints, and spatial transformations, making it hard to…
Attention-based learning for fine-grained image recognition remains a challenging task, where most of the existing methods treat each object part in isolation, while neglecting the correlations among them. In addition, the multi-stage or…
The demand for accurate food quantification has increased in recent years, driven by the needs of applications in dietary monitoring. At the same time, computer vision approaches have exhibited great potential in automating tasks within…
One of the greatest challenges for detecting moving objects in the solar system from wide-field survey data is determining whether a signal indicates a true object or is due to some other source, like noise. Object verification has relied…
We design the Enriched Deep Recurrent Visual Attention Model (EDRAM), an improved attention-based architecture for multiple object recognition. The proposed model is a fully differentiable unit that can be optimized end-to-end by using…
Humans are very good at directing their visual attention toward relevant areas when they search for different types of objects. For instance, when we search for cars, we will look at the streets, not at the top of buildings. The motivation…
Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of…
Human decision-making often relies on visual information from multiple perspectives or views. In contrast, machine learning-based object recognition utilizes information from a single image of the object. However, the information conveyed…
Fine-grained object recognition concerns the identification of the type of an object among a large number of closely related sub-categories. Multisource data analysis, which aims to leverage the complementary spectral, spatial, and…
Fine-grained classification is challenging because categories can only be discriminated by subtle and local differences. Variations in pose, scale, or rotation usually make the problem more difficult. Most fine-grained classification…
With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations.…
Understanding where humans look in a scene is a problem of great interest in visual perception and computer vision. When eye-tracking devices are not a viable option, models of human attention can be used to predict fixations. In…
We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation, which uses a translation-equivariant attention mechanism to predict the coordinates of the objects present…
We propose a novel attention model that can accurately attend to target objects of various scales and shapes in images. The model is trained to gradually suppress irrelevant regions in an input image via a progressive attentive process…