Related papers: Decoding fMRI Data into Captions using Prefix Lang…

Brain Captioning: Decoding human brain activity into images and text

Every day, the human brain processes an immense volume of visual information, relying on intricate neural mechanisms to perceive and interpret these stimuli. Recent breakthroughs in functional magnetic resonance imaging (fMRI) have enabled…

Computer Vision and Pattern Recognition · Computer Science 2023-05-22 Matteo Ferrante , Furkan Ozcelik , Tommaso Boccato , Rufin VanRullen , Nicola Toschi

ClipCap: CLIP Prefix for Image Captioning

Image captioning is a fundamental task in vision-language understanding, where the model predicts a textual informative caption to a given input image. In this paper, we present a simple approach to address this task. We use CLIP encoding…

Computer Vision and Pattern Recognition · Computer Science 2021-11-19 Ron Mokady , Amir Hertz , Amit H. Bermano

DreamCatcher: Revealing the Language of the Brain with fMRI using GPT Embedding

The human brain possesses remarkable abilities in visual processing, including image recognition and scene summarization. Efforts have been made to understand the cognitive capacities of the visual brain, but a comprehensive understanding…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Subhrasankar Chatterjee , Debasis Samanta

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Sulabh Katiyar , Samir Kumar Borgohain

fMRI Semantic Category Decoding using Linguistic Encoding of Word Embeddings

The dispute of how the human brain represents conceptual knowledge has been argued in many scientific fields. Brain imaging studies have shown that the spatial patterns of neural activation in the brain are correlated with thinking about…

Neurons and Cognition · Quantitative Biology 2018-06-15 Subba Reddy Oota , Naresh Manwani , Bapi Raju S

A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning

Generating textual descriptions for images has been an attractive problem for the computer vision and natural language processing researchers in recent years. Dozens of models based on deep learning have been proposed to solve this problem.…

Computer Vision and Pattern Recognition · Computer Science 2019-07-01 Ahmad Asadi , Reza Safabakhsh

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science 2017-04-14 Zhou Ren , Xiaoyu Wang , Ning Zhang , Xutao Lv , Li-Jia Li

Transformers in Medicine: Improving Vision-Language Alignment for Medical Image Captioning

We present a transformer-based multimodal framework for generating clinically relevant captions for MRI scans. Our system combines a DEiT-Small vision transformer as an image encoder, MediCareBERT for caption embedding, and a custom…

Image and Video Processing · Electrical Eng. & Systems 2025-11-03 Yogesh Thakku Suresh , Vishwajeet Shivaji Hogale , Luca-Alexandru Zamfira , Anandavardhana Hegde

RefineCap: Concept-Aware Refinement for Image Captioning

Automatically translating images to texts involves image scene understanding and language modeling. In this paper, we propose a novel model, termed RefineCap, that refines the output vocabulary of the language decoder using decoder-guided…

Computation and Language · Computer Science 2021-09-09 Yekun Chai , Shuo Jin , Junliang Xing

Brain2Word: Decoding Brain Activity for Language Generation

Brain decoding, understood as the process of mapping brain activities to the stimuli that generated them, has been an active research area in the last years. In the case of language stimuli, recent studies have shown that it is possible to…

Computation and Language · Computer Science 2020-11-12 Nicolas Affolter , Beni Egressy , Damian Pascual , Roger Wattenhofer

Image Captioning based on Deep Reinforcement Learning

Recently it has shown that the policy-gradient methods for reinforcement learning have been utilized to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-14 Haichao Shi , Peng Li , Bo Wang , Zhenyu Wang

Image Captioning with Deep Bidirectional LSTMs

This work presents an end-to-end trainable deep bidirectional LSTM (Long-Short Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning…

Computer Vision and Pattern Recognition · Computer Science 2016-07-21 Cheng Wang , Haojin Yang , Christian Bartz , Christoph Meinel

Fusion Models for Improved Visual Captioning

Visual captioning aims to generate textual descriptions given images or videos. Traditionally, image captioning models are trained on human annotated datasets such as Flickr30k and MS-COCO, which are limited in size and diversity. This…

Computer Vision and Pattern Recognition · Computer Science 2021-03-01 Marimuthu Kalimuthu , Aditya Mogadala , Marius Mosbach , Dietrich Klakow

Prefix Conditioning Unifies Language and Label Supervision

Image-classification datasets have been used to pretrain image recognition models. Recently, web-scale image-caption datasets have emerged as a source of powerful pretraining alternative. Image-caption datasets are more ``open-domain'',…

Computer Vision and Pattern Recognition · Computer Science 2023-05-17 Kuniaki Saito , Kihyuk Sohn , Xiang Zhang , Chun-Liang Li , Chen-Yu Lee , Kate Saenko , Tomas Pfister

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning

We present Pix2Cap-COCO, the first panoptic pixel-level caption dataset designed to advance fine-grained visual understanding. To achieve this, we carefully design an automated annotation pipeline that prompts GPT-4V to generate…

Computer Vision and Pattern Recognition · Computer Science 2025-01-24 Zuyao You , Junke Wang , Lingyu Kong , Bo He , Zuxuan Wu

Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style

Image captioning is a research hotspot where encoder-decoder models combining convolutional neural network (CNN) and long short-term memory (LSTM) achieve promising results. Despite significant progress, these models generate sentences…

Computer Vision and Pattern Recognition · Computer Science 2019-10-16 Hongwei Ge , Zehang Yan , Kai Zhang , Mingde Zhao , Liang Sun

Visio-Linguistic Brain Encoding

Enabling effective brain-computer interfaces requires understanding how the human brain encodes stimuli across modalities such as visual, language (or text), etc. Brain encoding aims at constructing fMRI brain activity given a stimulus.…

Computer Vision and Pattern Recognition · Computer Science 2022-04-19 Subba Reddy Oota , Jashn Arora , Vijay Rowtula , Manish Gupta , Raju S. Bapi

Improving Image Captioning with Better Use of Captions

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi , Xu Zhou , Xipeng Qiu , Xiaodan Zhu

A Semi-supervised Framework for Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images,…

Computer Vision and Pattern Recognition · Computer Science 2017-06-27 Wenhu Chen , Aurelien Lucchi , Thomas Hofmann

Multimodal Data Augmentation for Image Captioning using Diffusion Models

Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs for learning the underlying alignment between images and texts. In this paper, we proposed a multimodal data…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Changrong Xiao , Sean Xin Xu , Kunpeng Zhang