
Related papers: Audio Outperforms Text for Visual Decoding

200 papers

Multimodal large language models have fueled progress in image captioning. These models, fine-tuned on vast image datasets, exhibit a deep understanding of semantic concepts. In this work, we show that this ability can be re-purposed for…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-10-10 · Hugo Malard, Michel Olvera, Stéphane Lathuiliere, Slim Essid

Supervised learning methods can solve the given problem in the presence of a large set of labeled data. However, the acquisition of a dataset covering all the target classes typically requires manual labeling which is expensive and…

Sound · Computer Science · 2022-06-13 · Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen

The development of algorithms to accurately decode neural information has long been a research focus in the field of neuroscience. Brain decoding typically involves training machine learning models to map neural data onto a preestablished…

Predicting brain activity in response to naturalistic, multimodal stimuli is a key challenge in computational neuroscience. While encoding models are becoming more powerful, their ability to generalize to truly novel contexts remains a…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-28 · Hamid Abdollahi, Amir Hossein Mansouri Majoumerd, Amir Hossein Bagheri Baboukani, Amir Abolfazl Suratgar, Mohammad Bagher Menhaj

Visual neural decoding aims to extract and interpret original visual experiences directly from human brain activity. Recent studies have demonstrated the feasibility of decoding visual semantic categories from electroencephalography (EEG)…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-02 · Hongzhou Chen, Lianghua He, Yihang Liu, Longzhen Yang, Shaohua Shang, MengChu Zhou

Decoding sensory experiences from neural activity to reconstruct human-perceived visual stimuli and semantic content remains a challenge in neuroscience and artificial intelligence. Despite notable progress in current brain decoding models,…

Neurons and Cognition · Quantitative Biology · 2025-10-13 · Feihan Feng, Jingxin Nie

There has been a long-standing quest for a unified audio-visual-text model to enable various multimodal understanding tasks, which mimics the listening, seeing and reading process of human beings. Humans tend to represent knowledge using…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-02-22 · Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

Brain decoding is a field of computational neuroscience that uses measurable brain activity to infer mental states or internal representations of perceptual inputs. Therefore, we propose a novel approach to brain decoding that also relies…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-23 · Matteo Ferrante, Tommaso Boccato, Nicola Toschi

Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. We introduce a tri-modal contrastive…

Machine Learning · Computer Science · 2026-05-26 · Zexuan Chen, Sichao Liu, Runhao Lu, Huichao Qi, Alexandra Woolgar, Xi Vincent Wang, Lihui Wang

Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-10 · Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

Multi-modal word semantics aims to enhance embeddings with perceptual input, assuming that human meaning representation is grounded in sensory experience. Most research focuses on evaluation involving direct visual input; however, visual…

Computation and Language · Computer Science · 2021-10-07 · Anita L. Verő, Ann Copestake

Neural decoding, the process of understanding how brain activity corresponds to different stimuli, has been a primary objective in cognitive sciences. Over the past three decades, advances in functional Magnetic Resonance Imaging (fMRI) and…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-28 · Yanchen Wang, Adam Turnbull, Tiange Xiang, Yunlong Xu, Sa Zhou, Adnan Masoud, Shekoofeh Azizi, Feng Vankee Lin, Ehsan Adeli

A proper semantic representation for encoding side information is key to the success of zero-shot learning. In this paper, we explore two alternative semantic representations especially for zero-shot human action recognition: textual…

Computer Vision and Pattern Recognition · Computer Science · 2017-06-29 · Qian Wang, Ke Chen

Recent focus in video captioning has been on designing architectures that can consume both video and text modalities, and using large-scale video datasets with text transcripts for pre-training, such as HowTo100M. Though these approaches…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-23 · Yuhan Shen, Linjie Yang, Longyin Wen, Haichao Yu, Ehsan Elhamifar, Heng Wang

Recent work shows that documents from encyclopedias serve as helpful auxiliary information for zero-shot learning. Existing methods align the entire semantics of a document with corresponding images to transfer knowledge. However, they…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-24 · Xiangyan Qu, Jing Yu, Keke Gai, Jiamin Zhuang, Yuanmin Tang, Gang Xiong, Gaopeng Gou, Qi Wu

Biological research has revealed that the verbal semantic information in the brain cortex, as an additional source, participates in nonverbal semantic tasks, such as visual encoding. However, previous visual encoding models did not…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-30 · Shuxiao Ma, Linyuan Wang, Bin Yan

Multi-modal learning, particularly among imaging and linguistic modalities, has made amazing strides in many high-level fundamental visual understanding problems, ranging from language grounding to dense event captioning. However, much of…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-28 · Tanzila Rahman, Bicheng Xu, Leonid Sigal

In this work, we propose an innovative framework that integrates EEG, image, and text data, aiming to decode visual neural representations from low signal-to-noise ratio EEG signals. Specifically, we introduce text modality to enhance the…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-04 · Kaili Sun, Xingyu Miao, Bing Zhai, Haoran Duan, Yang Long

We present SEED (Semantic Evaluation for Visual Brain Decoding), a novel metric for evaluating the semantic decoding performance of visual brain decoding models. It integrates three complementary metrics, each capturing a different aspect…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-25 · Juhyeon Park, Peter Yongho Kim, Jiook Cha, Shinjae Yoo, Taesup Moon

Deep learning is leading to major advances in the realm of brain decoding from functional Magnetic Resonance Imaging (fMRI). However, the large inter-subject variability in brain characteristics has limited most studies to train models on…

Machine Learning · Computer Science · 2023-12-12 · Alexis Thual, Yohann Benchetrit, Felix Geilert, Jérémy Rapin, Iurii Makarov, Hubert Banville, Jean-Rémi King