Related papers: Learning to aggregate feature representations
We present an exploration of machine learning architectures for predicting brain responses to realistic images on occasion of the Algonauts Challenge 2023. Our research involved extensive experimentation with various pretrained models.…
The advent of deep learning has a profound effect on visual neuroscience. It paved the way for new models to predict neural data. Although deep convolutional neural networks are explicitly trained for categorization, they learn a…
There is growing symbiosis between artificial and biological intelligence sciences: neural principles inspire new intelligent machines, which are in turn used to advance our theoretical understanding of the brain. To promote further…
Multimodal MR image synthesis aims to generate missing modality images by effectively fusing and mapping from a subset of available MRI modalities. Most existing methods adopt an image-to-image translation paradigm, treating multiple…
Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real-world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by…
The Algonauts 2025 Challenge called on the community to develop encoding models that predict whole-brain fMRI responses to naturalistic multimodal movies. In this submission, we propose a sequence-to-sequence Transformer that…
Scene parsing from images is a fundamental yet challenging problem in visual content understanding. In this dense prediction task, the parsing model assigns every pixel to a categorical label, which requires the contextual information of…
Human perception is routinely assessing the similarity between images, both for decision making and creative thinking. But the underlying cognitive process is not really well understood yet, hence difficult to be mimicked by computer vision…
In this work, we present a network-specific approach for predicting brain responses to complex multimodal movies, leveraging the Yeo 7-network parcellation of the Schaefer atlas. Rather than treating the brain as a homogeneous system, we…
Deep learning models, such as the fully convolutional network (FCN), have been widely used in 3D biomedical segmentation and achieved state-of-the-art performance. Multiple modalities are often used for disease diagnosis and quantification.…
This paper introduces a novel segmentation framework that integrates a classifier network with a reverse HRNet architecture for efficient image segmentation. Our approach utilizes a ResNet-50 backbone, pretrained in a semi-supervised…
In recent times, with the exception of sporadic cases, the trend in Computer Vision is to achieve minor improvements compared to considerable increases in complexity. To reverse this trend, we propose a novel method to boost image…
This paper aims to learn a compact representation of a video for video face recognition task. We make the following contributions: first, we propose a meta attention-based aggregation scheme which adaptively and fine-grained weighs the…
This paper investigates how to perform robust visual tracking in adverse and challenging conditions using complementary visual and thermal infrared data (RGBT tracking). We propose a novel deep network architecture called qualityaware…
Despite advancements in artificial intelligence, object recognition models still lag behind in emulating visual information processing in human brains. Recent studies have highlighted the potential of using neural data to mimic brain…
While deep learning has achieved phenomenal successes in many AI applications, its enormous model size and intensive computation requirements pose a formidable challenge to the deployment in resource-limited nodes. There has recently been…
We present here, a novel network architecture called MergeNet for discovering small obstacles for on-road scenes in the context of autonomous driving. The basis of the architecture rests on the central consideration of training with less…
Most of the achievements in artificial intelligence so far were accomplished by supervised learning which requires numerous annotated training data and thus costs innumerable manpower for labeling. Unsupervised learning is one of the…
RGB and thermal image fusion have great potential to exhibit improved semantic segmentation in low-illumination conditions. Existing methods typically employ a two-branch encoder framework for multimodal feature extraction and design…
Accurately predicting distributed cortical responses to naturalistic stimuli requires models that integrate visual, auditory and semantic information over time. We present a hierarchical multimodal recurrent ensemble that maps pretrained…