Related papers: Learning to aggregate feature representations

Exploration and Comparison of Deep Learning Architectures to Predict Brain Response to Realistic Pictures

We present an exploration of machine learning architectures for predicting brain responses to realistic images on the occasion of the Algonauts Challenge 2023. Our research involved extensive experimentation with various pretrained models.…

Neurons and Cognition · Quantitative Biology 2023-09-20 Riccardo Chimisso, Sathya Buršić, Paolo Marocco, Giuseppe Vizzari, Dimitri Ognibene

The advent of deep learning has a profound effect on visual neuroscience. It paved the way for new models to predict neural data. Although deep convolutional neural networks are explicitly trained for categorization, they learn a…

Neurons and Cognition · Quantitative Biology 2019-07-08 Aakash Agrawal

The Algonauts Project 2025 Challenge: How the Human Brain Makes Sense of Multimodal Movies

There is growing symbiosis between artificial and biological intelligence sciences: neural principles inspire new intelligent machines, which are in turn used to advance our theoretical understanding of the brain. To promote further…

Neurons and Cognition · Quantitative Biology 2025-01-07 Alessandro T. Gifford, Domenic Bersch, Marie St-Laurent, Basile Pinsard, Julie Boyle, Lune Bellec, Aude Oliva, Gemma Roig, Radoslaw M. Cichy

Learning Modality-Aware Representations: Adaptive Group-wise Interaction Network for Multimodal MRI Synthesis

Multimodal MR image synthesis aims to generate missing modality images by effectively fusing and mapping from a subset of available MRI modalities. Most existing methods adopt an image-to-image translation paradigm, treating multiple…

Image and Video Processing · Electrical Eng. & Systems 2025-04-29 Tao Song, Yicheng Wu, Minhao Hu, Xiangde Luo, Linda Wei, Guotai Wang, Yi Guo, Feng Xu, Shaoting Zhang

Self-Supervised Model Adaptation for Multimodal Semantic Segmentation

Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real-world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by…

Computer Vision and Pattern Recognition · Computer Science 2021-11-05 Abhinav Valada, Rohit Mohan, Wolfram Burgard

A Multimodal Seq2Seq Transformer for Predicting Brain Responses to Naturalistic Stimuli

The Algonauts 2025 Challenge called on the community to develop encoding models that predict whole-brain fMRI responses to naturalistic multimodal movies. In this submission, we propose a sequence-to-sequence Transformer that…

Computer Vision and Pattern Recognition · Computer Science 2025-07-28 Qianyi He, Yuan Chang Leong

Multi-layer Feature Aggregation for Deep Scene Parsing Models

Scene parsing from images is a fundamental yet challenging problem in visual content understanding. In this dense prediction task, the parsing model assigns every pixel to a categorical label, which requires the contextual information of…

Computer Vision and Pattern Recognition · Computer Science 2020-11-06 Litao Yu, Yongsheng Gao, Jun Zhou, Jian Zhang, Qiang Wu

Learning an Adaptation Function to Assess Image Visual Similarities

Human perception is routinely assessing the similarity between images, both for decision making and creative thinking. But the underlying cognitive process is not really well understood yet, hence difficult to be mimicked by computer vision…

Computer Vision and Pattern Recognition · Computer Science 2022-06-06 Olivier Risser-Maroix, Amine Marzouki, Hala Djeghim, Camille Kurtz, Nicolas Lomenie

The ISLab Solution to the Algonauts Challenge 2025: A Multimodal Deep Learning Approach to Brain Response Prediction

In this work, we present a network-specific approach for predicting brain responses to complex multimodal movies, leveraging the Yeo 7-network parcellation of the Schaefer atlas. Rather than treating the brain as a homogeneous system, we…

Neurons and Cognition · Quantitative Biology 2025-10-28 Andrea Corsico, Giorgia Rigamonti, Simone Zini, Luigi Celona, Paolo Napoletano

OctopusNet: A Deep Learning Segmentation Network for Multi-modal Medical Images

Deep learning models, such as the fully convolutional network (FCN), have been widely used in 3D biomedical segmentation and achieved state-of-the-art performance. Multiple modalities are often used for disease diagnosis and quantification.…

Image and Video Processing · Electrical Eng. & Systems 2019-08-23 Yu Chen, Jiawei Chen, Dong Wei, Yuexiang Li, Yefeng Zheng

Early Fusion of Features for Semantic Segmentation

This paper introduces a novel segmentation framework that integrates a classifier network with a reverse HRNet architecture for efficient image segmentation. Our approach utilizes a ResNet-50 backbone, pretrained in a semi-supervised…

Computer Vision and Pattern Recognition · Computer Science 2024-02-12 Anupam Gupta, Ashok Krishnamurthy, Lisa Singh

Efficient Adaptive Ensembling for Image Classification

In recent times, with the exception of sporadic cases, the trend in Computer Vision is to achieve minor improvements compared to considerable increases in complexity. To reverse this trend, we propose a novel method to boost image…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Antonio Bruno, Davide Moroni, Massimo Martinelli

Feature Aggregation Network for Video Face Recognition

This paper aims to learn a compact representation of a video for video face recognition task. We make the following contributions: first, we propose a meta attention-based aggregation scheme which adaptively and fine-grained weighs the…

Computer Vision and Pattern Recognition · Computer Science 2019-09-13 Zhaoxiang Liu, Huan Hu, Jinqiang Bai, Shaohua Li, Shiguo Lian

FANet: Quality-Aware Feature Aggregation Network for Robust RGB-T Tracking

This paper investigates how to perform robust visual tracking in adverse and challenging conditions using complementary visual and thermal infrared data (RGBT tracking). We propose a novel deep network architecture called quality-aware…

Computer Vision and Pattern Recognition · Computer Science 2019-10-15 Yabin Zhu, Chenglong Li, Bin Luo, Jin Tang

Achieving More Human Brain-Like Vision via Human EEG Representational Alignment

Despite advancements in artificial intelligence, object recognition models still lag behind in emulating visual information processing in human brains. Recent studies have highlighted the potential of using neural data to mimic brain…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 Zitong Lu, Yile Wang, Julie D. Golomb

MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning

While deep learning has achieved phenomenal successes in many AI applications, its enormous model size and intensive computation requirements pose a formidable challenge to the deployment in resource-limited nodes. There has recently been…

Machine Learning · Computer Science 2020-12-01 Sen Lin, Li Yang, Zhezhi He, Deliang Fan, Junshan Zhang

MergeNet: A Deep Net Architecture for Small Obstacle Discovery

We present here a novel network architecture called MergeNet for discovering small obstacles for on-road scenes in the context of autonomous driving. The basis of the architecture rests on the central consideration of training with less…

Computer Vision and Pattern Recognition · Computer Science 2018-03-20 Krishnam Gupta, Syed Ashar Javed, Vineet Gandhi, K. Madhava Krishna

AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation

Most of the achievements in artificial intelligence so far were accomplished by supervised learning which requires numerous annotated training data and thus costs innumerable manpower for labeling. Unsupervised learning is one of the…

Computer Vision and Pattern Recognition · Computer Science 2021-06-14 Mingxiang Chen, Zhanguo Chang, Haonan Lu, Bitao Yang, Zhuang Li, Liufang Guo, Zhecheng Wang

Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation

RGB and thermal image fusion have great potential to exhibit improved semantic segmentation in low-illumination conditions. Existing methods typically employ a two-branch encoder framework for multimodal feature extraction and design…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Zhengwen Shen, Yulian Li, Han Zhang, Yuchen Weng, Jun Wang

Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)

Accurately predicting distributed cortical responses to naturalistic stimuli requires models that integrate visual, auditory and semantic information over time. We present a hierarchical multimodal recurrent ensemble that maps pretrained…

Neurons and Cognition · Quantitative Biology 2025-10-30 Semih Eren, Deniz Kucukahmetler, Nico Scherf