
Related papers: Compositional Generalization in Image Captioning

200 papers

Image captioning has focused on generalizing to images drawn from the same distribution as the training set, and not to the more challenging problem of generalizing to different distributions of images. Recently, Nikolaus et al. (2019)…

Computation and Language · Computer Science 2021-01-29 Emanuele Bugliarello , Desmond Elliott

Compositional generalization is a key facet of human cognition, but lacking in current AI tools such as vision-language models. Previous work examined whether a compositional tensor-based sentence semantics can overcome the challenge, but…

Artificial Intelligence · Computer Science 2025-09-12 Hala Hawashin , Mina Abbaszadeh , Nicholas Joseph , Beth Pearson , Martha Lewis , Mehrnoosh Sadrzadeh

In image captioning, where fluency is an important factor in evaluation (e.g., n-gram metrics), sequential models are commonly used; however, sequential models generally result in overgeneralized expressions that lack the details that may…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Junjiao Tian , Jean Oh

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models…

Computer Vision and Pattern Recognition · Computer Science 2021-11-12 Edwin G. Ng , Bo Pang , Piyush Sharma , Radu Soricut

Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision communities. In this paper, we present a novel image captioning architecture to better explore semantics…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Zhan Shi , Xu Zhou , Xipeng Qiu , Xiaodan Zhu

Automatically creating the description of an image as a natural language sentence, such as English, is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses…

Computer Vision and Pattern Recognition · Computer Science 2018-10-03 Parth Shah , Vishvajit Bakarola , Supriya Pati

Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance.…

Computer Vision and Pattern Recognition · Computer Science 2018-10-24 Bo Dai , Sanja Fidler , Dahua Lin

Multi-sentence summarization is a well studied problem in NLP, while generating image descriptions for a single image is a well studied problem in Computer Vision. However, for applications such as image cluster labeling or web page…

Computer Vision and Pattern Recognition · Computer Science 2020-06-17 Nicholas Trieu , Sebastian Goodman , Pradyumna Narayana , Kazoo Sone , Radu Soricut

Image captioning, which generates natural language descriptions of the visual information in an image, is a crucial task in vision-language research. Previous models have typically addressed this task by aligning the generative capabilities…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Qian Cao , Xu Chen , Ruihua Song , Xiting Wang , Xinting Huang , Yuchen Ren

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram…

Computer Vision and Pattern Recognition · Computer Science 2018-12-20 Annika Lindh , Robert J. Ross , Abhijit Mahalunkar , Giancarlo Salton , John D. Kelleher

Image captioning implies automatically generating textual descriptions of images based only on the visual input. Although this has been an extensively addressed research topic in recent years, not many contributions have been made in the…

Computer Vision and Pattern Recognition · Computer Science 2021-02-09 Eva Cetinic

Compositional generalization is a basic and essential intellective capability of human beings, which allows us to recombine known parts readily. However, existing neural network based models have been proven to be extremely deficient in…

Artificial Intelligence · Computer Science 2020-10-27 Qian Liu , Shengnan An , Jian-Guang Lou , Bei Chen , Zeqi Lin , Yan Gao , Bin Zhou , Nanning Zheng , Dongmei Zhang

This paper discusses and demonstrates the outcomes from our experimentation on Image Captioning. Image captioning is a much more involved task than image recognition or classification, because of the additional challenge of recognizing the…

Computer Vision and Pattern Recognition · Computer Science 2018-05-24 Vikram Mullachery , Vishal Motwani

Generating a description of an image is called image captioning. Image captioning requires recognizing the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically…

Computer Vision and Pattern Recognition · Computer Science 2018-10-16 Md. Zakir Hossain , Ferdous Sohel , Mohd Fairuz Shiratuddin , Hamid Laga

Image captioning models require the high-level generalization ability to describe the contents of various images in words. Most existing approaches treat the image-caption pairs equally in their training without considering the differences…

Computer Vision and Pattern Recognition · Computer Science 2022-12-15 Hongkuan Zhang , Saku Sugawara , Akiko Aizawa , Lei Zhou , Ryohei Sasano , Koichi Takeda

Image caption generation is one of the most challenging problems at the intersection of vision and language domains. In this work, we propose a realistic captioning task where the input scenes may incorporate visual objects with no…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Berkan Demirel , Ramazan Gokberk Cinbis

Recurrent neural networks have recently been used for learning to describe images using natural language. However, it has been observed that these models generalize poorly to scenes that were not observed during training, possibly depending…

Computer Vision and Pattern Recognition · Computer Science 2016-08-30 Yuval Atzmon , Jonathan Berant , Vahid Kezami , Amir Globerson , Gal Chechik

Visual imagery does not consist of solitary objects, but instead reflects the composition of a multitude of fluid concepts. While there have been great advances in visual representation learning, such advances have focused on building…

Computer Vision and Pattern Recognition · Computer Science 2025-04-07 Austin Stone , Hagen Soltau , Robert Geirhos , Xi Yi , Ye Xia , Bingyi Cao , Kaifeng Chen , Abhijit Ogale , Jonathon Shlens

Compositional generalization is the ability to generalize systematically to a new data distribution by combining known components. Although humans seem to have a great ability to generalize compositionally, state-of-the-art neural models…

Machine Learning · Computer Science 2021-06-22 Juyong Kim , Pradeep Ravikumar , Joshua Ainslie , Santiago Ontañón

Compositional generalization refers to the ability to generalize to novel combinations of previously observed words and syntactic structures. Since it is regarded as a desired property of neural models, recent work has assessed…

Computation and Language · Computer Science 2025-04-07 Ryoma Kumon , Daiki Matsuoka , Hitomi Yanaka