Related papers: Cross-view Brain Decoding

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

Large-scale pre-trained multi-modal models (e.g., CLIP) demonstrate strong zero-shot transfer capability in many discriminative tasks. Their adaptation to zero-shot image-conditioned text generation tasks has drawn increasing interest.…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Wei Li, Linchao Zhu, Longyin Wen, Yi Yang

Multi-task Learning with Cross Attention for Keyword Spotting

Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-23 Takuya Higuchi, Anmol Gupta, Chandra Dhir

Audio Outperforms Text for Visual Decoding

Decoding visual semantic representations from human brain activity is a significant challenge. While recent zero-shot decoding approaches have improved performance by leveraging aligned image-text datasets, they overlook a fundamental…

Neurons and Cognition · Quantitative Biology 2026-01-21 Zhengdi Zhang, Hao Zhang, Wenjun Xia

Interpreting and Analysing CLIP's Zero-Shot Image Classification via Mutual Knowledge

Contrastive Language-Image Pretraining (CLIP) performs zero-shot image classification by mapping images and textual class representation into a shared embedding space, then retrieving the class closest to the image. This work provides a new…

Computer Vision and Pattern Recognition · Computer Science 2024-12-19 Fawaz Sammani, Nikos Deligiannis

Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing

Building a semantic parser quickly in a new domain is a fundamental challenge for conversational interfaces, as current semantic parsers require expensive supervision and lack the ability to generalize to new domains. In this paper, we…

Computation and Language · Computer Science 2018-09-25 Jonathan Herzig, Jonathan Berant

Brain-aligning of semantic vectors improves neural decoding of visual stimuli

The development of algorithms to accurately decode neural information has long been a research focus in the field of neuroscience. Brain decoding typically involves training machine learning models to map neural data onto a preestablished…

Neurons and Cognition · Quantitative Biology 2025-12-03 Shirin Vafaei, Ryohei Fukuma, Takufumi Yanagisawa, Huixiang Yang, Satoru Oshino, Naoki Tani, Hui Ming Khoo, Hidenori Sugano, Yasushi Iimura, Hiroharu Suzuki, Madoka Nakajima, Kentaro Tamura, Haruhiko Kishima

Cross-View Completion Models are Zero-shot Correspondence Estimators

In this work, we explore new perspectives on cross-view completion learning by drawing an analogy to self-supervised correspondence learning. Through our analysis, we demonstrate that the cross-attention map within cross-view completion…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Honggyu An, Jinhyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim

Cross-Image Attention for Zero-Shot Appearance Transfer

Recent advancements in text-to-image generative models have demonstrated a remarkable ability to capture a deep semantic understanding of images. In this work, we leverage this semantic knowledge to transfer the visual appearance between…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Yuval Alaluf, Daniel Garibi, Or Patashnik, Hadar Averbuch-Elor, Daniel Cohen-Or

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences. While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf

Semantic Brain Decoding: from fMRI to conceptually similar image reconstruction of visual stimuli

Brain decoding is a field of computational neuroscience that uses measurable brain activity to infer mental states or internal representations of perceptual inputs. Therefore, we propose a novel approach to brain decoding that also relies…

Computer Vision and Pattern Recognition · Computer Science 2023-03-23 Matteo Ferrante, Tommaso Boccato, Nicola Toschi

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. A field-wide goal is to achieve…

Machine Learning · Computer Science 2026-04-10 Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo

Cross-Lingual Transfer Learning for Complex Word Identification

Complex Word Identification (CWI) is a task centered on detecting hard-to-understand words, or groups of words, in texts from different areas of expertise. The purpose of CWI is to highlight problematic structures that non-native speakers…

Computation and Language · Computer Science 2020-10-05 George-Eduard Zaharia, Dumitru-Clementin Cercel, Mihai Dascalu

Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments

Zero-shot scene understanding in real-world settings presents major challenges due to the complexity and variability of natural scenes, where models must recognize new objects, actions, and contexts without prior labeled examples. This work…

Computer Vision and Pattern Recognition · Computer Science 2025-10-30 Manjunath Prasad Holenarasipura Rajiv, B. M. Vidyavathi

MeaCap: Memory-Augmented Zero-shot Image Captioning

Zero-shot image captioning (IC) without well-paired image-text data can be divided into two categories, training-free and text-only-training. Generally, these two types of methods realize zero-shot IC by integrating pretrained…

Computer Vision and Pattern Recognition · Computer Science 2024-03-07 Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Zhengjue Wang, Bo Chen

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages

Multilingual neural machine translation systems learn to map sentences of different languages into a common representation space. Intuitively, with a growing number of seen languages the encoder sentence representation grows more flexible…

Computation and Language · Computer Science 2024-08-06 Carlos Mullov, Ngoc-Quan Pham, Alexander Waibel

Towards Zero-shot Cross-lingual Image Retrieval and Tagging

There has been a recent spike in interest in multi-modal Language and Vision problems. On the language side, most of these models primarily focus on English since most multi-modal datasets are monolingual. We try to bridge this gap with a…

Machine Learning · Computer Science 2021-09-17 Pranav Aggarwal, Ritiz Tambi, Ajinkya Kale

Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models

Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks such as translation and multilingual word sense disambiguation (WSD). However, they often struggle at disambiguating…

Computation and Language · Computer Science 2023-04-28 Haoqiang Kang, Terra Blevins, Luke Zettlemoyer

Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features

The many-to-many multilingual neural machine translation can be regarded as the process of integrating semantic features from the source sentences and linguistic features from the target sentences. To enhance zero-shot translation, models…

Computation and Language · Computer Science 2024-08-05 Mengyu Bu, Shuhao Gu, Yang Feng

Visual Concepts Tokenization

Obtaining the human-like perception ability of abstracting visual concepts from concrete pixels has always been a fundamental and important target in machine learning research fields such as disentangled representation learning and scene…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng